Open theelderbeever opened 1 year ago
It appears to have to do with the `{send,recv}.lock` files not being cleaned up. Do I need to implement SIGTERM handling to clean those up?
@theelderbeever I believe calling `recover` at application startup is the intended use. @tokahuke what do you think?
@barafael Okay, interesting... wouldn't that make it not robust against starting multiple instances of the same application simultaneously? If I call `recover` each time, I would assume that completely defeats the purpose of the lock files to begin with.
FWIW we pivoted away from that project so I didn't dig any further.
Yes, that is the intended purpose of recovery. No, it doesn't negate the lock files, because it also looks at the processes in the OS. Each lock file contains the pid of the process that has locked the queue. If that process is still alive, unlocking fails. So, if you spawn multiple applications, locking will not be violated. See [the code](https://github.com/tokahuke/yaque/blob/140d889a2f4c5f8b4c05a8f3a06a7f8a5cdb5adb/src/recovery.rs#L41) for more information.
Edit: this is perhaps something that I should mention in the docs.
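For anyone landing here from search, the pid-in-lock-file idea described above can be sketched roughly as follows. This is not yaque's actual code: `acquire_lock`, `try_unlock`, and the `is_alive` callback are hypothetical names made up for illustration, and the liveness check is passed in as a closure because how you ask the OS whether a pid is alive depends on the platform. It only shows why unlocking at startup is safe when another live process still holds the lock.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: create the lock file and record our pid in it.
/// Fails with `AlreadyExists` if a lock file is already present.
fn acquire_lock(lock_path: &Path) -> io::Result<()> {
    let mut file = fs::OpenOptions::new()
        .write(true)
        .create_new(true) // refuse to overwrite an existing lock
        .open(lock_path)?;
    write!(file, "{}", std::process::id())
}

/// Hypothetical helper: remove the lock file only if its owner is dead.
/// Returns `Ok(true)` if no lock remains afterwards, and `Ok(false)` if the
/// owning process is still alive and the lock was left untouched.
/// `is_alive` is an assumption of this sketch: the caller supplies the
/// platform-specific "is this pid running?" check.
fn try_unlock(lock_path: &Path, is_alive: impl Fn(u32) -> bool) -> io::Result<bool> {
    let contents = match fs::read_to_string(lock_path) {
        Ok(contents) => contents,
        // No lock file at all: nothing to recover.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(true),
        Err(e) => return Err(e),
    };

    // The lock file is expected to hold nothing but the owner's pid.
    let pid: u32 = contents
        .trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    if is_alive(pid) {
        // Another running process owns the queue: do not break its lock.
        return Ok(false);
    }

    // The owner is gone (e.g. killed with Ctrl+C): the lock is stale.
    fs::remove_file(lock_path)?;
    Ok(true)
}
```

With this shape, a second instance started while the first is still running gets `Ok(false)` and the lock stays in place, while a restart after a crash finds a dead pid and clears the stale file.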
@tokahuke Awesome! That is a great explainer. Something exactly like that in the docs would be super helpful!
In the meantime I think if we leave this issue open it has a better chance of showing up in search results.
So, I am drafting a new PR, which will lead to a new minor release. I will try to include it in there.
I am running a yaque channel with a gRPC server as a sender and a worker on the other end processing messages. If I start the application, send a few messages, then hit Ctrl+C and try to restart the application, I receive a "sender side already in use" error when creating the channel on the same directory. The only way to fix this is to delete the directory and create a fresh one. This makes the queue non-durable and prone to large amounts of data loss on crashes.
Any suggestions?