Closed JackNewman12 closed 1 year ago
Very interesting! I guess this came up more often when I did the latest reboot optimizations for shutdown?
I'm obviously not fond of sprinkling fragile sleep(1);
statements on cases like this, but I really appreciate the comment you added. Is there perhaps a way to add a condition that checks for what filesystems you have mounted? E.g., if ("/" == (ubiffs|jffs|jffs2|ext3|ext4)) sleep(1);
Also, is there really no other kernel mechanism to know when such threads are done?
I need to discuss this with a colleague before merging, extending the shutdown/reboot with a second could affect some use-cases.
OK, I've discussed this now with my colleague. He agrees the sleep is very fragile, and we should also not try to maintain some list of filesystems, with respective recommended delays.
It's very frustrating we cannot trust sync, but that's a known entity for some time now. However, that we cannot even trust remounting as read-only to not return until it has completed. That's just nasty :disappointed:
Our proposal is to add a configure option for this, making it up to the user. (Default: off.) What do you think about that?
Would the user have to configure the sleep amount? Is it possible to require a different sleep amount based on the hardware?
Just as a test I added a shutdown hook script that checks and waits until the ubifs threads have exited via ps | grep ...
. Not sure if that is more or less nasty than throwing in an arbitrary sleep()
and praying they have finished. 🤒
Would the user have to configure the sleep amount? Is it possible to require a different sleep amount based on the hardware?
Hmm, good point. Perhaps best to make it a .conf setting instead of configure. Something like this:
reboot-delay <SEC>
Delay in seconds, after unmounting and before asking the kernel to reboot/poweroff the system. This to allow filesystem threads to complete, e.g., wear leveling in UbiFS.
Default: none (disabled)Just as a test I added a shutdown hook script that checks and waits until the ubifs threads have exited via
ps | grep ...
. Not sure if that is more or less nasty than throwing in an arbitrarysleep()
and praying they have finished. :face_with_thermometer:
None of this is ideal. Having it configurable, and thus a visible thing, in Finit is probably best. A user reading the documentation would then be made aware of this (kernel) issue.
Only thing I'm not yet sure of is what the best default is ... I've put in none
above just to restart the discussion. It's not at all clear to me if there is safe value, maybe like 10 seconds? But would that be sufficient on a system with reaaaalllly slow CPU and a big JFFS2 file system? I don't know, but maybe a very long delay by default could lead to more people reading the documentation?
This gives time for various kernel threads to finish any pending work. In particular this fixes a kernel panic in the ubifs wear-leveling that happens fairly often for me. It looks like after an umount ubifs is still performing some cleanup, or hasn't yet stopped all of its workers.
You could probably just add a hook script with this one second delay in it, but looking at the sync(8) manpage I think it is appropriate for this sleep to be finits responsibility: