sagemathinc / cocalc

CoCalc: Collaborative Calculation in the Cloud
https://CoCalc.com

[test] opening md file: "blob is gone" #4361

Open haraldschilly opened 4 years ago

haraldschilly commented 4 years ago

While testing in the "test" namespace, I was unable to open a Markdown file. The project log says:

failed to connect -- blob is gone; will retry

The file appears in the file listing; it is 3 months old and 0 bytes.

I don't know how severe this is; I just want to file an issue. Other files open fine.

2020-02-04T12:23:23.569Z - debug: Client.get_hub_socket: there are 1 sockets -- ["dc8c71fd-95cd-485e-88a4-a47bed8282f6"]
2020-02-04T12:23:23.569Z - debug: Client.call(message={"id":"5f141526-446c-4d18-ae14-ef6857fcdc21","query":{"syncstrings":[{"string_id":"a48f758c9e65e1e0f3d6bacccc3a3afcbc6499bf","project_id":"c37fbd83-c4c3-4f92-b66c-37b8d2c8cdf1","path":"iframe-comm-test.md","deleted":null,"users":null,"last_snapshot":null,"snapshot_interval":null,"save":null,"last_active":null,"init":null,"read_only":null,"last_file_change":null,"doctype":null,"archived":null,"settings":null}]},"changes":false,"multi_response":false,"options":[],"event":"query"}):
2020-02-04T12:23:23.569Z - debug: Client.call(message={"id":"5f141526-446c-4d18-ae14-ef6857fcdc21","query":{"syncstrings":[{"string_id":"a48f758c9e65e1e0f3d6bacccc3a3afcbc6499bf","project_id":"c37fbd83-c4c3-4f92-b66c-37b8d2c8cdf1","path":"iframe-comm-test.md","deleted":null,"users":null,"last_snapshot":null,"snapshot_interval":null,"save":null,"last_active":null,"init":null,"read_only":null,"last_file_change":null,"doctype":null,"archived":null,"settings":null}]},"changes":false,"multi_response":false,"options":[],"event":"query"}): configure timeout
2020-02-04T12:23:23.615Z - debug: Client.handle_mesg({"event":"error","error":"blob is gone","id":"5f141526-446c-4d18-ae14-ef6857fcdc21"}): calling callback
syncstrings -- failed to connect -- blob is gone; will retry
2020-02-04T12:23:25.473Z - debug: Client.get_hub_socket: there are 1 sockets -- ["dc8c71fd-95cd-485e-88a4-a47bed8282f6"]
2020-02-04T12:23:25.473Z - debug: Client.call(message={"id":"d87901fd-a49d-4f1a-9417-5314e5a7aee2","query":{"syncstrings":[{"string_id":"a48f758c9e65e1e0f3d6bacccc3a3afcbc6499bf","project_id":"c37fbd83-c4c3-4f92-b66c-37b8d2c8cdf1","path":"iframe-comm-test.md","deleted":null,"users":null,"last_snapshot":null,"snapshot_interval":null,"save":null,"last_active":null,"init":null,"read_only":null,"last_file_change":null,"doctype":null,"archived":null,"settings":null}]},"changes":false,"multi_response":false,"options":[],"event":"query"}):
2020-02-04T12:23:25.473Z - debug: Client.call(message={"id":"d87901fd-a49d-4f1a-9417-5314e5a7aee2","query":{"syncstrings":[{"string_id":"a48f758c9e65e1e0f3d6bacccc3a3afcbc6499bf","project_id":"c37fbd83-c4c3-4f92-b66c-37b8d2c8cdf1","path":"iframe-comm-test.md","deleted":null,"users":null,"last_snapshot":null,"snapshot_interval":null,"save":null,"last_active":null,"init":null,"read_only":null,"last_file_change":null,"doctype":null,"archived":null,"settings":null}]},"changes":false,"multi_response":false,"options":[],"event":"query"}): configure timeout
2020-02-04T12:23:25.519Z - debug: Client.handle_mesg({"event":"error","error":"blob is gone","id":"d87901fd-a49d-4f1a-9417-5314e5a7aee2"}): calling callback
syncstrings -- failed to connect -- blob is gone; will retry
2020-02-04T12:23:26.774Z - debug: Client.MonitorPublicPaths.update_loop: successful update
2020-02-04T12:23:27.933Z - debug: Client.get_hub_socket: there are 1 sockets -- ["dc8c71fd-95cd-485e-88a4-a47bed8282f6"]
2020-02-04T12:23:27.934Z - debug: Client.call(message={"id":"59fecbb1-0059-4ffe-9364-1bb1cd6c4474","query":{"syncstrings":[{"string_id":"a48f758c9e65e1e0f3d6bacccc3a3afcbc6499bf","project_id":"c37fbd83-c4c3-4f92-b66c-37b8d2c8cdf1","path":"iframe-comm-test.md","deleted":null,"users":null,"last_snapshot":null,"snapshot_interval":null,"save":null,"last_active":null,"init":null,"read_only":null,"last_file_change":null,"doctype":null,"archived":null,"settings":null}]},"changes":false,"multi_response":false,"options":[],"event":"query"}):
2020-02-04T12:23:27.934Z - debug: Client.call(message={"id":"59fecbb1-0059-4ffe-9364-1bb1cd6c4474","query":{"syncstrings":[{"string_id":"a48f758c9e65e1e0f3d6bacccc3a3afcbc6499bf","project_id":"c37fbd83-c4c3-4f92-b66c-37b8d2c8cdf1","path":"iframe-comm-test.md","deleted":null,"users":null,"last_snapshot":null,"snapshot_interval":null,"save":null,"last_active":null,"init":null,"read_only":null,"last_file_change":null,"doctype":null,"archived":null,"settings":null}]},"changes":false,"multi_response":false,"options":[],"event":"query"}): configure timeout
2020-02-04T12:23:27.979Z - debug: Client.handle_mesg({"event":"error","error":"blob is gone","id":"59fecbb1-0059-4ffe-9364-1bb1cd6c4474"}): calling callback
syncstrings -- failed to connect -- blob is gone; will retry

williamstein commented 4 years ago

I wonder whether this is a side effect of having reset or wiped something on the test server.

That said, maybe the right fix is to implement fallback behavior if the blob is really missing (and it's not just an ephemeral issue with gcsfuse!). It would be reasonable to discard the history of the file rather than make it impossible to open.
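
A minimal sketch of that fallback, to make the idea concrete. This is not the actual CoCalc code: the `Patch`, `ArchiveResult`, and `OpenResult` types and the `openWithFallback` function are hypothetical names invented for illustration, and the only thing taken from the log above is the error string "blob is gone". The sketch retries a bounded number of times (to cover an ephemeral storage hiccup) and then opens the file with an empty history instead of failing forever.

```typescript
// Hypothetical types; none of these names come from the CoCalc codebase.
interface Patch {
  time: number;
  patch: string;
}

type ArchiveResult =
  | { ok: true; patches: Patch[] }
  | { ok: false; error: string };

interface OpenResult {
  patches: Patch[];
  historyDiscarded: boolean; // true if we gave up on the archived blob
}

// Error string as it appears in the project log.
const BLOB_GONE = "blob is gone";

// fetchArchive is an injected (assumed) function that tries to load the
// archived edit history. A real implementation would be async and would
// wait between attempts; that is omitted to keep the sketch short.
function openWithFallback(
  fetchArchive: () => ArchiveResult,
  maxAttempts: number = 3,
): OpenResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = fetchArchive();
    if (result.ok) {
      return { patches: result.patches, historyDiscarded: false };
    }
    if (result.error !== BLOB_GONE) {
      // Unrelated failures are not masked by the fallback.
      throw new Error(result.error);
    }
    // Otherwise: the blob may only be temporarily unavailable; try again.
  }
  // The blob still cannot be found: discard the history, but let the file open.
  return { patches: [], historyDiscarded: true };
}
```

The `historyDiscarded` flag is there so the caller could warn the user that the edit history was lost, rather than dropping it silently.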

I'm not aware of ever having hit this problem in production.

WORKAROUND: rename the file, then open the renamed file.

williamstein commented 3 years ago

Log from this happening again:

syncstrings -- failed to connect -- blob is gone; will retry
2021-02-04T10:38:14.184Z - debug: primus-api request object {"cmd":"synctable_channel","query":{"syncstrings":[{"string_id":"f9eaed66b1f2eba7f8fccfc871444d28a7c2ffba","project_id":"35542cec-cd3d-4fbc-bb48-4d2373dd5eab","path":".Deutschland.ipynb.sage-jupyter2","users":null,"last_snapshot":null,"snapshot_interval":null,"save":null,"last_active":null,"init":null,"read_only":null,"last_file_change":null,"doctype":null,"archived":null,"settings":null}]},"options":[{"persistent":true}]}
syncstrings -- failed to connect -- blob is gone; will retry
syncstrings -- failed to connect -- blob is gone; will retry
syncstrings -- failed to connect -- blob is gone; will retry
2021-02-04T10:38:25.878Z - debug: primus-api request object {"cmd":"synctable_channel","query":{"syncstrings":[{"string_id":"f9eaed66b1f2eba7f8fccfc871444d28a7c2ffba","project_id":"35542cec-cd3d-4fbc-bb48-4d2373dd5eab","path":".Deutschland.ipynb.sage-jupyter2","users":null,"last_snapshot":null,"snapshot_interval":null,"save":null,"last_active":null,"init":null,"read_only":null,"last_file_change":null,"doctype":null,"archived":null,"settings":null}]},"options":[{"persistent":true}]}
syncstrings -- failed to connect -- blob is gone; will retry

Obviously, the fix is to make opening files robust against blobs being gone for whatever reason. The blobs are essentially only needed for long-term backups.

williamstein commented 4 days ago

Still, it would be nice to change our code to allow opening a file even if the blob is gone; this would be especially useful in an emergency.