Recently, there have been a couple of reports of Min losing session restore data unexpectedly (#2519, #2527, and also #2503 and https://discord.com/channels/764269005195968512/764544014259060797/1305916870473547776, which may or may not be related).
I'm not sure yet what the cause(s) of these are, since nothing in this code has changed recently, but this PR updates session restore to write the file atomically. That should ensure we can always re-launch with a valid restore file, even if something goes wrong during a write. I also added metrics collection for write failures and restoration failures to help us understand whether this is a widespread issue.
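For reviewers unfamiliar with the technique, here's a minimal sketch of a common way to do an atomic write with Node's `fs` module: write to a temporary file in the same directory, fsync it, then rename it over the target. This is an illustration under those assumptions, not necessarily how this PR implements it; `writeFileAtomicSync` and the temp-file naming scheme are hypothetical. On POSIX filesystems `rename()` replaces the target in one step, so a reader sees either the old file or the new one, never a half-written file.

```typescript
import * as fs from 'fs'
import * as path from 'path'

// Hypothetical helper (not Min's actual API): replace targetPath with data
// so that a crash mid-write leaves the previous file intact.
function writeFileAtomicSync (targetPath: string, data: string): void {
  // Keep the temp file in the same directory as the target, so the rename
  // stays on one filesystem and doesn't degrade into a copy.
  const tempPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.${process.pid}.tmp`
  )
  let fd: number | undefined
  try {
    fd = fs.openSync(tempPath, 'w')
    // writeFileSync accepts a file descriptor and writes the full string
    fs.writeFileSync(fd, data)
    // Flush to disk before renaming, so the new name never points at
    // data that hasn't actually been written yet.
    fs.fsyncSync(fd)
    fs.closeSync(fd)
    fd = undefined
    // Swap the new file into place in a single step.
    fs.renameSync(tempPath, targetPath)
  } catch (e) {
    // On any failure, clean up the temp file; the old target is untouched.
    if (fd !== undefined) fs.closeSync(fd)
    fs.rmSync(tempPath, { force: true })
    throw e
  }
}
```

A write failure thrown from this helper is also a natural place to record the kind of write-failure metric mentioned above. The extra `fsync` is the main cost compared with a plain `writeFileSync`.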
It's possible that this is slower than a standard write, which mainly affects shutdown time, since we write the file synchronously before exiting. In my (very limited) testing, though, the slowdown doesn't seem too bad.
If this resolves the issue, we could extend the same approach to the settings file, for which we've had similar reports. That data is less critical, though, so the performance vs. reliability tradeoff may be different there.