tomjridge opened 2 years ago
@tomjridge, any news on that?
I think @icristescu interacted with them. I don't think I saw anything further on the Slack channels I'm signed up to. I guess that if it is still a problem, they will raise an issue on the Tezos issue tracker, so perhaps we can close this for now.
Having checked the latest Tezos issues, I think this may be related: https://gitlab.com/tezos/tezos/-/issues/3249
There is an initial report by @vishakh on the Tezos Slack #general channel on June 2. It does not include the "Kind.of_magic" log line. In the thread, Alexander Eichhorn posts a similar report about a node "stuck for several hours", which does contain the "Kind.of_magic" error. That is the message I copied into the initial issue description above. User vishakh then opened Tezos issue 3249.
Just a quick comment: the Kind.of_magic error occurs because it reads a zero byte, which does not correspond to any of the object tags we use. One way this can happen is a crash whilst appending data. A weak model of appending to a file is that the file is first extended to the new length and filled with zero bytes, and then the actual data is written over those zero bytes. If there is a crash after the file has been extended but before the data actually reaches disk, zero bytes can be left at the end of the file.
This model is described here: https://tom-ridge.blogspot.com/2022/07/a-model-of-file-update.html
(This isn't just a made-up, ad hoc model; I believe it is the "standard" mental model people use when working with files.)
Of course, it could also have some other cause, e.g. buggy code writing a zero byte at the wrong offset.
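To make the crash scenario concrete, here is a minimal Python sketch of the weak append model described above. It is illustrative only and rests on several assumptions: the one-byte tags `B`/`C`/`I`/`N`, the `kind_of_magic` decoder, and the file name are all hypothetical and do not reflect irmin-pack's real on-disk encoding (where objects are longer than one byte). The "extend with zeros" step is simulated with `truncate`, which zero-fills the new region on most platforms.

```python
import os
import tempfile

# Hypothetical one-byte object tags, for illustration only.
# These are NOT the tags irmin-pack actually uses.
KNOWN_TAGS = frozenset(b"BCIN")  # iterating over bytes yields ints


def weak_append(path, data, crash_before_data):
    """Append `data` following the weak file-update model (assumed):
    1. extend the file to its new length; the new region reads back as zeros;
    2. overwrite those zeros with the real data.
    If `crash_before_data` is True, stop after step 1, as if the machine
    crashed before the data reached disk."""
    with open(path, "r+b") as f:
        start = f.seek(0, os.SEEK_END)
        f.truncate(start + len(data))  # step 1: zero-filled extension
        if crash_before_data:
            return  # simulated crash: step 2 never happens
        f.seek(start)
        f.write(data)  # step 2: real bytes replace the zeros


def kind_of_magic(byte):
    """Hypothetical decoder mimicking the reported failure mode:
    any byte that is not a known tag (e.g. 0) is rejected."""
    if byte not in KNOWN_TAGS:
        raise ValueError(f"Kind.of_magic: unknown tag {byte!r}")
    return chr(byte)


def decode(contents):
    """Decode a file whose objects are (simplistically) one tag byte each."""
    return [kind_of_magic(b) for b in contents]


def scenario(crash_on_second_append):
    """Two appends to a fresh file; optionally 'crash' during the second."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "store.pack")  # hypothetical file name
        open(path, "wb").close()
        weak_append(path, b"BC", crash_before_data=False)
        weak_append(path, b"IN", crash_before_data=crash_on_second_append)
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(decode(scenario(False)))  # ['B', 'C', 'I', 'N']
    contents = scenario(True)
    print(contents)                 # b'BC\x00\x00'
    try:
        decode(contents)
    except ValueError as e:
        print(e)                    # Kind.of_magic: unknown tag 0
```

Without a crash, every byte decodes; with a crash between the two steps, the file ends in zero bytes and decoding fails on the first one, which is the same shape of failure as the reported error (though, again, this says nothing about whether a crash is actually what happened here).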
https://app.slack.com/client/T59LZHQ11/C596FGNUR/thread/C596FGNUR-1654186374.376829
From that link:
Looks like we also have one v13.0 node with a strange error that I haven't seen before:
The node was stuck for several hours. The problem did not go away after a simple restart. I then deleted peers.json and even the identity file, but after reconnect to one of our other nodes the problem did not go away.