puppylinux-woof-CE / woof-CE

woof - the Puppy builder
GNU General Public License v2.0

Puppy shutdown with savefile - is it reliable? #1175

Closed gyrog closed 5 years ago

gyrog commented 6 years ago

Challenge: Boot a recent woof-ce 'testing' Puppy on a Linux partition, and choose a savefile, not a savefolder. Then boot with "pfix=fsckp,fsck" and watch for warning messages from fsck during boot. You can also try this with a vfat partition. It's no use doing this with any partition fs type that is not supported by fsck in "initrd.gz".

Is it just me, or do we have an issue?

If you are using a Linux partition, you could then try the same with a savefolder.

gyrog commented 6 years ago

When I do this with a savefile, fsck reports issues with the partition.

wdlkmpx commented 6 years ago

Savefile. Booting from a small ext3 partition i see only a suspicious message: recovering journal.

Savefolder. pfsck doesn't seem to have effect

I remember i made a build with the preliminary overlayfs stuff (rationalise), it had a long sleep in rc.shutdown when something was not unmounted (properly) or something like that. But i don't see it in https://github.com/wdlkmpx/woof-CE/commit/d81384ea6a68af27668a94eff2f3203f2f172cbc

wdlkmpx commented 6 years ago

One dangerous thing that happened on first boot, is that i had the speakers on, and then there was a very loud sound when X started that shocked me, and my brother came to my room.

The story behind this is that one day one dog was chasing one of our cats, and he became mad, and this silly mistake i made almost caused a tragedy..

gyrog commented 6 years ago

You need to boot with "pfix=fsckp,fsck". "fsckp" does fsck on the partitions. "fsck" only does fsck of savefile.
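A minimal sketch of why the space matters: the kernel command line is split on whitespace, so "pfix=fsckp, fsck" leaves "fsck" as a separate word outside the pfix value. The helper names `pfix_from_cmdline` and `pfix_has` are hypothetical and are not taken from Puppy's "init".

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only (not Puppy's init code).

# Print the value of pfix= found in a kernel command line string.
# $1 = command line, e.g. the contents of /proc/cmdline
pfix_from_cmdline() {
    for word in $1; do            # intentionally unquoted: split on whitespace
        case "$word" in
            pfix=*) echo "${word#pfix=}"; return 0 ;;
        esac
    done
    return 1
}

# Succeed if a comma-separated pfix value contains the given option.
# $1 = pfix value, e.g. "fsckp,fsck"; $2 = option, e.g. "fsck"
pfix_has() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

With "pfix=fsckp,fsck" both options are found; with "pfix=fsckp, fsck" only "fsckp" ends up in the pfix value.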

It's the "recovering journal" that suggests to me that the previous "umount" was not successful.

This has nothing to do with overlayfs, the test is for a "normal" Puppy. Besides "overlay_init" does not support savefile.

wdlkmpx commented 6 years ago

I've seen a message before, saying that something went wrong and delaying the shutdown, but i don't see that code in testing

I've also read people discussing reboot/shutdown in the busybox mailing list but i don't recall the details, or was it the init?

wdlkmpx commented 6 years ago

something is certainly wrong. busybox umount -ar complains /etc/mtab is missing, but it's there, a symlink to /proc/mounts -> self/mounts.

busybox umount -arn reveals resources are busy, trying other variants i saw a weird message trying to unmount the partition the savefile is in..

i'll try some experiments

wdlkmpx commented 6 years ago

i'm a little confused.

hmm with a savefolder there are no error msgs after reboot, but i see the same errors when shutting down.

it would seem that stuff mounted by the initrd isn't unmountable. the savefile certainly isn't, i see rc.shutdown remounts it as ro, then unmounts it with -l opt = lazy unmount = fake unmount... without -l it doesn't unmount the savefile or any other file mounted by the initrd.

without the lazy unmount i see the same error in the savefile though.. recovering journal. if i apply the lazy unmount to everything the shutdown appears to be clean.. and it's not.

but why does it work with savefolders? does it work in old stuff?

searching in google i found this http://www.slax.org/es/blog/18438-initramfs-pivot-root-solution.html

wdlkmpx commented 6 years ago

After a few more tests this is my report:

savefolder: ok, unmounts pup_rw*, remounts partition as ro, tmpfs is busy

savefile: partially-ok, unmounts pup_rw*, partition and tmpfs are busy, cannot be remounted as ro**

*lazy umount, fake umount

**persistence is not affected, boots ok.. suspicious msg only visible after adding extra pfsck...

there are a few lines in rc.shutdown that i think are not triggered:

if [ "`mount | grep '^tmpfs on /tmp '`" != "" ];then #created by /init in initrd.gz
...

i changed them to:

if [ "`mount | grep '^tmpfs on /initrd/mnt/tmpfs '`" != "" ];then #created by /init in initrd.gz
...

but it doesn't make any difference

Besides that, fuser -m and lsof don't show any process using the savefile partition, so it remains a mystery why it is locked, it must be some kind of sorcery

mavrothal commented 6 years ago

In the past the source of this problem was aufs. It may be worth looking at different kernels. I do see this problem with the 4.9.XX kernels these days, both in recovering the journal and in fixing a couple of inodes on every boot.

wdlkmpx commented 6 years ago

With savefolders the partition is remounted read only. That is the last of all the steps i see in busybox umount -ar.

This last step does not succeed with a savefile, it cannot remount the partition.. and basically that's the only difference.

I wonder when that error started happening. As far as i know, pfsck in the old days always triggered a full scan or something. Can't remember.

wdlkmpx commented 6 years ago

The bug happens with kernels 3.4.103, 3.14.79, 4.1.48, 4.9.71

I still can't figure out what is it that causes the bug or a possible fix, but maybe these lines

if [ "`mount | grep '^tmpfs on /tmpfs '`" != "" ];then #created by /init in initrd.gz
  mkdir -p /tmp/unrootfs
  busybox mount -o remount,prepend:/tmp/unrootfs,xino=/tmp/unrootfs/xino -t $uniFS / /

should be reimplemented in some other way?

It's boring when something is not easy to fix

wdlkmpx commented 6 years ago

I believe the current init is better than the old one.

edit: I tested tahrpup 6.0.5 and it cannot remount the pupsave partition read-only. It fails with savefile and savefolder.

edit: In the current init there is also pfsck = partition fsck.

Reading the old init i see a partition fsck is only triggered after improper shutdown.. fsckme.flg.

I expect to see the same errors if i edit the old init to show a fsck on the partition.

So what does this mean? There's nothing to worry about..

I get a less noisy shutdown replacing this:

  #i think only work if prepended dir is a separate f.s...
  if [ "`mount | grep '^tmpfs on /tmp '`" != "" ];then #created by /init in initrd.gz
   mkdir -p /tmp/unrootfs
   busybox mount -o remount,prepend:/tmp/unrootfs,xino=/tmp/unrootfs/xino -t $uniFS / /
   sync
  fi

with this:

if [ "$uniFS" == "aufs" -a "$SAVE_LAYER" == "/pup_rw" ]; then
  busybox mount -o remount,noxino,noplink,clean_plink,ro -t $uniFS / / >/dev/console 2>&1
  sync
fi

Just random params i added, but that apparently causes the tmpfs to unmount cleanly, it probably removes the hidden /initrd/mnt/tmpfs/.aufs.xino or something

And:

busybox umount -ar > /dev/null

with this:

cut -f 2 -d ' ' /proc/mounts | grep '^/initrd/mnt'  | busybox xargs -n 1 busybox umount -vr > /dev/console 2>&1
sync
cut -f 2 -d ' ' /proc/mounts | grep '^/initrd/mnt'  | busybox xargs -n 1 umount-FULL -i -n -l -v > /dev/console 2>&1
sync
busybox umount -arn > /dev/console 2>&1

With a savefolder, you'll see it remounts the partition read only and then unmounts it (-l)

With a savefile, you'll see it cannot remount the partition read only and then unmounts it (-l) so you get a false sense of security...
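The filter in the replacement above reads /proc/mounts in mount order. A minimal sketch, assuming that nested mounts are better unmounted in reverse order (most recently mounted first); the function name `mounts_under_reversed` is hypothetical, and this only changes the ordering, it does not address the "busy" error.

```shell
#!/bin/sh
# Print mount points that start with a prefix, most recently mounted first.
# $1 = prefix, e.g. /initrd/mnt
# $2 = mounts table to read (defaults to /proc/mounts)
mounts_under_reversed() {
    awk -v p="$1" '
        index($2, p) == 1 { m[++n] = $2 }               # keep matching mount points
        END { for (i = n; i >= 1; i--) print m[i] }     # print them in reverse order
    ' "${2:-/proc/mounts}"
}
```

The output could be fed to the same `busybox xargs -n 1 busybox umount -vr` stage used in the post.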

wdlkmpx commented 6 years ago

I wrote some wrong info, in the old init: fsck = savefile fsck, in the current init it's the same pfix + fsckp = partition fsck.

I also tested different busybox versions in the initrd and the final fs, different kernel versions, the result is consistent. I think i'll also test precise to see whether the bug was always there.. so overall, i think i'll add the changes i posted above, but maybe without the '>/dev/console'

But the fact that the old inits let you do a partition fsck only after improper shutdown and basically did not show the savefile fsck output, just tells me that worse things happened but it was all hidden..

wdlkmpx commented 6 years ago

100% verified. the bug has always been there ahahaha...

tahrpup 6.0.5 does not shutdown cleanly

precise 5.7.1 does not shutdown cleanly

The only way to trigger fsck on partition is through the fsckme.flg file, and then reboots, it does not continue... old inits.

Precise seems to cause more "damage" after the first shutdown where i see 'Setting inode count' or something,

Both precise and the old tahrpup (savefile and savefolder) produce the same results after reboot... recovering journal.

That makes using savefolders with the current init the safest possible option.

Or maybe we should read the slax blog or something, i see that page when i search pivot_root, clean shutdown initrd, including this article https://www.slax.org/cs/blog/24229-Clean-shutdown-with-systemd.html

gyrog commented 6 years ago

I've been away. I think the significant difference between savefile and savefolder is the existence of an "rw" loop device that references a file on the partition when a savefile is used.
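One way to look for such a loop device is to read the backing file of each loop device from sysfs. This is a rough sketch: `loops_backed_by` is a hypothetical helper, and the path recorded in `backing_file` may reflect the root that was in place when the loop was set up, so it may not match the mount point seen after the switch to the final root.

```shell
#!/bin/sh
# List loop devices whose backing file lives under a given mount point.
# $1 = mount point of the partition, e.g. /mnt/home
# $2 = sysfs root (defaults to /sys; overridable for testing)
loops_backed_by() {
    mnt=${1%/}
    sysroot=${2:-/sys}
    for f in "$sysroot"/block/loop*/loop/backing_file; do
        [ -r "$f" ] || continue              # no loop devices, or not set up
        backing=$(cat "$f")
        case "$backing" in
            "$mnt"/*)
                dev=${f#"$sysroot"/block/}   # loopN/loop/backing_file
                echo "/dev/${dev%%/*} $backing"
                ;;
        esac
    done
}
```

An empty result for a savefolder setup and a non-empty one for a savefile setup would be consistent with the suspicion above.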

gyrog commented 6 years ago

I've just done some testing with tahr 6.0.6, tahr64 6.0.6, xenialpup 7.5, xenialpup64 7.5 and upupbb 18.05. On every boot after a shutdown with an active savefile, fsck has reported "recovering journal" on the partition containing the savefile. On every boot after a shutdown with an active savefolder, fsck has not reported any warnings on the partition containing the savefolder. I have not found any evidence of any actual data corruption.

We have a number of options:

  1. ignore it, as has been done in the past.
  2. fix rc.shutdown, could be difficult.
  3. fix the whole initrd.gz thing so that we switch root back to initrd at shutdown, probably a large project. Once the stack is no longer /, it could be dismantled.
  4. Make "pfix=fsckp" the default, so that at least users are aware of it, and hopefully the fsck will fix any issues, thus avoiding the possibility of data corruption.
  5. Suggest to users to use savefolder in preference to savefile.
  6. And the most popular of all, drop support for savefiles altogether.

wdlkmpx commented 6 years ago

I think having savefiles in a vfat partition is particularly dangerous, that filesystem is too primitive and prone to data corruption. I have my savefiles in NTFS partitions... (usb drvs), as i access the same drvs from Winthose.

Maybe i should disable the vfat fsck in the initrd, i think it still needs more work before it can be usable in a "production environment".

  1. Some people probably don't need to know about it now, but see 5 and 6.
  2. I don't think it's fixable
  3. This is the solution, i think i'll try to implement something like this in the current init.
  4. See 1
  5. I think shutdownconfig already suggests that, but there are stubborn people
  6. See 5, someone should show the savefile apologists the truth, in a known forum. This does not apply to the truly naive and innocent people who would really prefer the best option..

gyrog commented 6 years ago

Trying to fix rc.shutdown: With the current init we don't have to worry about the ".xino" file, it's already in tmpfs:

cat /sys/fs/aufs/si_*/xi_path

The most we can do is the following:

  1. prepend a directory in /tmp as an "rw" branch at the top of the stack
  2. remount "/", specifying the save_layer directory as an "rr" branch.
  3. remount the loop device for the save_layer as "ro".
  4. remount the partition containing the savefile as "ro"

If that doesn't work, then I'm all out of ideas. Note: I haven't coded or tested this idea yet.
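The four steps can be sketched as a dry run that only prints the commands. The function name `shutdown_plan` and its arguments are hypothetical, and the `prepend:DIR=rw` and `mod:DIR=rr` remount options are my reading of the aufs branch syntax, not something verified against rc.shutdown.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands for the four steps.
# $1 = union fs type (e.g. aufs)
# $2 = mount point of the save layer (the loop-mounted savefile)
# $3 = mount point of the partition that holds the savefile
shutdown_plan() {
    # 1. prepend a directory in /tmp as an "rw" branch at the top of the stack
    echo "mkdir -p /tmp/unrootfs"
    echo "busybox mount -o remount,prepend:/tmp/unrootfs=rw -t $1 / /"
    # 2. change the save layer branch to "rr" (assumed aufs mod: syntax)
    echo "busybox mount -o remount,mod:$2=rr -t $1 / /"
    # 3. remount the loop-mounted save layer read-only
    echo "busybox mount -o remount,ro $2"
    # 4. remount the partition containing the savefile read-only
    echo "busybox mount -o remount,ro $3"
}
```

Printing the commands first makes it easy to review them on the console before trying them for real near the end of rc.shutdown.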

@wdlkmpx, good luck with trying to implement the "real" solution. It would be really nice to put this to bed once and for all, but I've no idea as to how much effort this would take.

Making "pfix=fsckp" the default: We could make the output from fsck go to "/tmp/bootinit.log" and not the console. But I just realised, this only helps folk who are using a Linux partition, who could easily switch to using a savefolder. I don't think the fsck of vfat does much to actually fix any errors, but at least when it reports that the "dirty bit" is still set, I know there is a problem. As to ntfs, I only trust my ntfs partitions to Windows. I have more faith in the ext software for Windows being able to read my ext partitions, than having Puppy even write to ntfs let alone fsck them.

wdlkmpx commented 6 years ago

savefiles also support cryptoloop and cryptsetup (dmcrypt). i added cryptsetup support in one day and it did work, i really wasn't expecting it to work. but the static cryptsetup is big, i wasn't able to compile a smaller binary

i wonder if you have managed to implement encryption for savefolders. gocrypt comes to mind

gyrog commented 6 years ago

Encryption: When I last worked with gocrypt, aufs and gocrypt simply did not work together. To my mind "cryptoloop" is a dead technology, and using encrypted savefiles is very limiting. It would be easier to move away from the idea of encrypted savefiles and move to the concept of a "Puppy vault" which could be included in a savefile, savefolder or on "/mnt/home". I already implemented something using gocrypt, but it could also be a real Luks partition, or even a "partition-file" containing a Luks "partition" (similar to a savefile). Not being an actual branch in the stack, it should be able to be simply unmounted at shutdown. It would contain only the user's "private" files, and not any puppy software installed with .pet's. If stored outside the stack e.g. in "/mnt/home/..." or a separate Luks partition, it could be shared between Puppies.

gyrog commented 6 years ago

I did some testing of my shutdown procedure above, using xenialpup64 7.5.

Prepending a directory in /tmp as an "rw" branch, gives no errors.
Remounting of the save_layer as an "ro" branch, always fails, "busy".
Remounting of the loop device as "ro", gives no errors.
Remounting of the partition as "ro", always fails, "busy".
fsck always gives "recovering journal" warning after savefile shutdown.
fsck never gives "recovering journal" warning after savefolder shutdown.

I'm stumped.

gyrog commented 6 years ago

Did another test of a frugal install of xenialpup64 7.5 on an ntfs partition, including savefile. After each shutdown I re-booted into windows 10 to test the integrity of the ntfs partition. Windows always said it did not find any errors on the ntfs partition. As per normal, fsck in xenialpup64 never complained about the savefile itself. (Although after a few reboots my windows 10 desktop got mucked up, but that may be unrelated.)

Perhaps the message is, "don't use savefile on fat32 or Linux partitions". (I have had fsck.vfat complain about the "dirty bit" still being set when using a savefile.)

However I still would never store a Puppy savefile on any important windows partition, particularly my C: drive.

wdlkmpx commented 6 years ago

Hmm i think cryptoloop support will be removed from the init script one of these months..

Adding a couple of warnings to shutdownconfig is a TODO, that anyone can do..

By the way, this is an updated rationalise branch with selected commits that may or may not be mergeable, the last 4 commits: https://github.com/puppylinux-woof-CE/woof-CE/tree/rationalise

LateAdopter commented 6 years ago

I use multiple savefiles for reasons of compartmentation. They provide some protection against my incompetence or for different levels of security.

All Woof-CE puppies, that I have tried, have this shutdown problem. I first reported it in the Slacko64 Alpha threads in the forum.

Woof 2 puppies don't have the problem and Fatdog doesn't either.

My workaround, for Woof-CE puppies, is to put the savefiles in an F2FS partition that doesn't care about clean shutdowns.

Thanks for your work, but please don't remove the savefile option!

wdlkmpx commented 6 years ago

@LateAdopter precise pup was built with woof2 and it has this problem.

So you must specify what puppy built with woof2 doesnt have this problem, and i'll test it to see if it's true

gyrog commented 6 years ago

@LateAdopter, and why can't you have multiple savefolders? And is it that F2FS doesn't care about clean shutdowns, or is it that Puppy can't tell you about any problems with an F2FS partition?

wdlkmpx commented 6 years ago

In my old test machine: ( = after reboot)

woof2
lupu-525.iso + savefile = recovering journal
precise-5.7.1.iso + savefile = recovering journal
racy-5.5.iso + savefile = recovering journal
slacko-5.4-opera-4g.iso + savefile = recovering journal
wacy-5.5.iso + savefile = recovering journal

woofce pups + savefile = recovering journal

I haven't tested a puppy4 yet.

I need savefiles for NTFS partitions, so i think they're not going away... but it's not advised to use savefiles in FAT32/EXT2-4 partitions. People use savefiles with fat32 partitions and i read some people complaining that a savefile became corrupted or a 0-size file in a fat32 partition.. that will happen sooner or later, especially after 'improper' shutdowns.

However now that savefiles support dm-crypt (luks) encryption, i think i'll remove the old cryptoloop method, from the initrd and shutdownconfig... you will only be able to mount those old encrypted savefiles... that's my plan, but sometimes i change my mind when i'm faced with opposition... because i had already removed cryptoloop support before (and before i implemented dm-crypt support)...

I don't know about fatdog, but if it uses the same logic as puppy, then the bug is out there (most likely).

A fix that will probably work is to identify what old slax builds do.. is that the porteus boot? Never used it.

Well i think i read in a SLAX page that by cloning and somehow preserving the initrd and pivoting into it at shutdown, it's technically possible to unmount stuff that otherwise would not be possible..

wdlkmpx commented 6 years ago

Talking about opposition, i think we've been preventing gyro from implementing some nice features, such as automatic pupsave creation at bootup (or something), also suggested by mistfire i think.

Which also reminds me of pupsaveconfig, i never used it, but i saw that name in sfs_load, extrasfsfind..... that's a shinobar app, it's a pity ALL of his apps weren't adopted early as core scripts, replacing whatever they were meant to replace... before he added support for puppy4, puppy3, puppy 2, puppy 1, puppy 0 releases, and with heavy and quirky rox-specific stuff, making the apps unreadable and hard to fix without breaking many sfs's already created...

But i still think that, at least for linux partitions, there should be a quick option to create a savefolder at... shutdown.. if DEV1FS is a linux partition ... Do you want to save session? Yes (wizard), Yes (automatic).. hmm this extra button looks quite easy to implement actually ...

gyrog commented 6 years ago

@LateAdopter, I just did a test using upupbb and 4 savefolders, "upupbbsave", "upupbbsave_fred", "upupbbsave_mac", and "upupbbsave_mabel". It all worked as expected; "init" asked me to choose one and then used the one I chose.

@wdlkmpx, Most of my "nice features" are implemented in the "overlay_init" project.

Now there is an alternative to cryptoloop, it should be possible to drop support for it. Dropping it from shutdownconfig would be a good first step.

The "proper" way to use a stack as / seems to be to pivot_root in "init" and pivot_root in "rc.shutdown", rather than the Puppy way. It would be nice to locate a template of how to do this with an aufs stack. Being able to umount the stack would also open up some possibilities for saving mechanisms, i.e. being able to freely manipulate the rw layer, particularly with overlayfs.

gyrog commented 6 years ago

I ran upupbb with savefile in pupmode=13, got "recovering journal" even after selecting "nosave" on shutdown. I then hacked the "init" to use a "ro" loop device and mount the savefile "ro", (and suppress fsck of loop device). I got a clean shutdown, though of course I couldn't save anything.

This test increases my suspicion that the problem is the presence of the "rw" loop device in the stack, and I can't see how we can have a savefile without an "rw" loop device in the stack.

LateAdopter commented 6 years ago

Hello gyrog I wasn't trying to start a debate on the relative merits, just give a concise indication of my rationale. The simple analogy is that a container file is like a partition except that, you can copy it, move it, resize it, without the likelihood of corrupting the disk structure, and its contents are not visible in the root filesystem until you mount it, and you don't need a UNIX partition.

Back to the topic. My criteria are:

  1. Does the init complain by writing fsckme.flg to the partition?
  2. Does the filesystem driver for the partition complain in the messages file?
  3. Does the filesystem driver for the savefile filesystem complain in the messages file?
  4. Does Windows shout abuse at you and run chkdsk?

With Pemasu's Precise Puppy 372 and Fatdog64 710 the answer is NO to 1, 2 and 4, but 3 gets a warning "running unchecked filesystem" for ext2 in Precise and ext4 in Fatdog. I don't have journaling enabled, so I wouldn't get your error.

With Woof-CE puppies that I have tried: Slacko64 alpha and Tahrpup64 I got yes to all four questions. With Xenialpup64 I get yes to 1. (2 and 4 are N/A on that PC)

4 is the most important because I think it drove away several new puppy users in the Tahrpup thread:

  a) install puppy to Windows PC
  b) get abuse from Windows and have to wait while it runs chkdsk on a large partition
  c) post in the forum
  d) no response
  e) delete puppy

I would have done the same if that had happened when I first tried Lucid puppy 525

I don't know whether these issues have been fixed since I last tried.

As for F2FS, I have read somewhere that it doesn't care, but don't ask me for a reference, I may have dreamt it! F2FS DOESN'T give an error in messages, although init DOES write fsckme.flg with Xenialpup64.

gyrog commented 6 years ago

@LateAdopter, Your item 1. does not represent "init" complaining, it's a flag to detect an abrupt shutdown, e.g. loss of power. "init" writes it, and "rc.shutdown" deletes it. Of course it won't detect a simple failed umount. Your item 4. is the biggest concern, it's why I try to have complete separation of windows and Puppy. Puppies neither reside on, nor use, any of my windows partitions.
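The flag's life cycle as described here can be sketched as follows. The function names are hypothetical and the real "init" and "rc.shutdown" differ in detail; the point is that the flag is already gone by the time a late umount fails, so that failure goes unnoticed.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the fsckme.flg life cycle.
# $1 in every function = directory on the partition that holds the flag.
FLAG_NAME=fsckme.flg

boot_mark()      { : > "$1/$FLAG_NAME"; }    # "init" writes the flag at boot
shutdown_clear() { rm -f "$1/$FLAG_NAME"; }  # "rc.shutdown" deletes it on a normal shutdown
was_abrupt()     { [ -e "$1/$FLAG_NAME" ]; } # checked at the next boot, before boot_mark
```

A power loss between `boot_mark` and `shutdown_clear` leaves the flag behind, while a failed umount after `shutdown_clear` does not.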

LateAdopter commented 6 years ago

I'm no expert and I don't know linux scripting but it doesn't stop me reading through scripts to try and follow what they are doing.

As I observe it, it's not as simple as that. If I run Precise 372 or Fatdog there is no fsckme.flg written to the partition, that I can see. If I run Tahrpup64 without a savefile there is no fsckme.flg left. If I create a savefile then fsckme.flg is left in the partition after the first normal shutdown and the VFAT driver complains in the messages on the next boot.

The issue with Windows is not whether the fsckme.flg is there but something else.... Running Fatdog or Precise does not make Windows complain. I run Fatdog and Windows on this PC every day. Running Tahrpup64 makes Windows complain. If I run Fatdog or Precise after Tahrpup64 and before Windows, it does not clear the problem, and Windows still complains next time I run it.

The only thing I could think of, at the time the problem first appeared with Slacko64 alpha, was that there might be a lock left on the Slacko64 savefile because it was still open with write access at poweroff. I don't know if there is an improper shutdown flag in the filesystem that would not be cleared by running Fatdog/Precise on the same partition.

EDIT Since that was some time ago, and I have not put a Woof-CE puppy on my Windows PC since then, I have just tried Artfulpup 17.10.01.

Booted, then shutdown and created savefile. Windows did not complain.

Booted with savefile and shutdown. Windows complained.

There are no complaints from Windows after I boot Precise Puppy 372 with a savefile.

I did not find any complaint from the VFAT driver in messages, only the usual warning from EXT2 about mounting an unchecked filesystem.

Agreed fsckme.flg was not left on the partition.

gyrog commented 6 years ago

I've had a look at FatDog, and doing a pivot_root in "rc.shutdown" and then dismantling the stack, might be doable, I've established that at least pivot_root is present in xenialpup 7.5, so it may be available in others. Doing so will require significant changes to "init" and "rc.shutdown". And after all that, we still could have problems with something being "busy".

But it might be a while before I get around to giving it a go. So, anyone willing to give it a go soon?

wdlkmpx commented 6 years ago

I agree it's not recommended to boot windows and puppy from the same partition, or use savefiles (or even savefolders) in a windows partition, unless you don't care much about what's in there..

I use savefiles only in usb flash drives (ntfs partitions). Other than that, in ext3/4 partitions, only when i want to test if something is working fine with savefiles.

I had problems using puppy savefiles in vfat partitions in hard drives.. yes in the precise days, when i was new to linux in general. And i had more evident problems in usb drives, that's why i use ntfs partitions.. but that requires the latest ntfs-3g and stuff (not a problem). Some initrd apps should just be in the final rootfs to update some older pups. Maybe as pkgs in common32, common64.

Something must be happening, but it's not obvious here, i see the same errors and the same results in all puppies. I don't use vfat partitions...

I think i'll look into the pivot_root stuff. According to the slax page, a ramdisk is created, formatted ext2, and all the initramfs is copied to the ram disk. What exactly is done to make things work at bootup and shutdown.. i've no idea...

gyrog commented 6 years ago

In FatDog, they just mount a tmpfs and copy initramfs stuff to there. While the system is running, this tmpfs is mounted "ro". (Hmmm.. that would mean we wouldn't need to copy files to /pup_new/initrd.) During shutdown the tmpfs is remounted to "rw", a cleanup script is copied to it, and a pivot_root is done to it, then the cleanup script dismantles the aufs stack. Note: like Puppy they use a switch_root in init. I'm not sure there's anything magical about retaining the stuff from the original initrd.gz, or is it just a way of creating another small runnable file system in ram, that's independent of the stack. Maybe we could create the tmpfs during shutdown, and extract a tar file, containing an appropriate set of files to support "cleanup", into it. A bit like initrd.gz contains a set of files to support "init".

wdlkmpx commented 6 years ago

I was trying something like this, following the logic of what it used to be there

mkdir -p /mnt/unrootfs
busybox mount -t tmpfs tmpfs /mnt/unrootfs
busybox mount -o remount,prepend:/mnt/unrootfs,noxino,ro -t $uniFS / /

but it doesn't work either, but it might fix lateadopter's issue

pivot_root: hmm at least a static busybox is needed, hmm what else, i don't know. But i read that you can't pivot_root into a initramfs? old stuff. Needed stuff is not in the final fs.

I've never used fatdog, but i once extracted its contents to see what's in there. if there's something usable, then a copy & paste operation can save a lot of time.. i saw it was quite different

I'm inspecting a pkg, fatdog-scripts, apparently rc.cleanup, that must be somewhere, is the key script. If i create a random tmpfs in init, copy stuff there and try to use it at shutdown maybe it will work?

jamesbond3142 commented 6 years ago

> During shutdown the tmpfs is remounted to "rw", a cleanup script is copied to it

That "cleanup script" is rc.cleanup itself. I'm copying it from the top of the stack to give it flexibility, allowing an end-user version of it to be used instead of the original copy in initramfs. In reality this flexibility has never been used, rc.cleanup was never modified.

> Note: like Puppy they use a switch_root in init.

LOL :) We don't have a choice here. Anything that uses initramfs (as opposed to initrd), has to use switch_root.

> I'm not sure there's anything magical about retaining the stuff from the original initrd.gz, or is it just a way of creating another small runnable file system in ram, that's independent of the stack.

You're right about the independence. Nothing magical about the retainer stuff, but if you're going to carry a copy of initramfs for "shutdown purposes", may as well make it useful and tack it to the stack. Most of the binaries in Fatdog initramfs are upx-ed so carrying them in tar.xz won't save much space anyway.

> I'm inspecting a pkg, fatdog-scripts,, apparently rc.cleanup, that must be somewhere

It's inside Fatdog initramfs; it's not made available as a package. Get any Fatdog iso and look into its initramfs ("initrd").

> if there's something usable, then a copy & paste operation can saves a lot of time

You're welcome. That's the point of open source.

wdlkmpx commented 6 years ago

I actually found rc.cleanup in fatdog-scripts, for UML or something.

It creates a tmpfs, copies a static busybox, creates symlinks. rc.cleanup copies itself to the tmpfs and $BB pivot_root $SHUTDOWN_ROOT $SHUTDOWN_ROOT/$OLDROOT then moves dev, proc and sys to the new root.

As the first experiment, rc.cleanup is exec'd by /sbin/poweroff / reboot (puppy script). But this doesn't seem to work, i edited it a bit to make it compatible with my running system and static busybox. After the killing of processes (kill -9), something happens, a login is triggered or something, and it ends there. mingetty is missing. hard reboot.

I edited stuff and the inittab (to use getty instead of mingetty) and added an autologin script (login -f root), added a few more files to the tmpfs (inittab, shadow, passwd) before pivot_root... recompiled a more complete static busybox (1.28.3). It successfully pivots root and brutally kills processes ... and here is where it's starting again, autologin, but it's supposed to end. Hmm. Ignorance..

I've just downloaded a fatdog iso and am learning how it works

There is one thing that i saw in fatdog that i've done locally:

tty1::respawn:/sbin/getty -n -l /bin/autologin 38400 tty1

It looks exactly like the change i made to the puppy inittab.

the puppy inittab is different, and the logic in the shutdown process is different. the magic happens here:

# Stuff to do before rebooting
::shutdown:/etc/rc.d/rc.shutdown
::shutdown:/etc/rc.d/rc.cleanup

I'll be learning and experimenting.. slowly. Only time will tell if i'll be able to fix the issue..

wdlkmpx commented 6 years ago

Inspecting my savefile it's also evident that i messed up something while trying to get working code to populate the /bin directory in the new root

        $BB cp -a /bin/busybox bin
        for i in $(busybox --list) ; do
            ln -s busybox ./bin/$i
        done

I see a bunch of symlinks in the savefile and one of them is poweroff, which is a puppy script. The next boot will be something that will not work as expected.

well, that script, basically runs as usual and in the final line it execs rc.cleanup, which does work fine until it starts doing something 'weird'. i removed the last 20-30 lines from rc.shutdown. and edited stuff here and there. root2user and user2root now edit /bin/autologin instead of /etc/inittab.

I see the rc.cleanup in the fatdog iso is a bit different from the 'rc.cleanup - modified for Fatdog 710 UML' in the fatdog-scripts, the iso rc.cleanup uses an already created tmpfs or something, that was already there from beginning probably.

This is code not present in the UML rc.cleanup

if [ -z "$IS_REBOOT" ]; then # only if we're powering off
    echo -n Spin down disks ...
    for dev in $devs; do
        echo -n " $dev"
        $SDPARM -C sync /dev/$dev > /dev/null
        $SDPARM -C stop /dev/$dev > /dev/null
    done
    echo
fi

In puppy probably should be:

if [ -z "$IS_REBOOT" ]; then # only if we're powering off
    echo -n Spin down disks ...
    for dev in $devs; do
        echo -n " $dev"
        $BB hdparm -fy /dev/$dev > /dev/null
    done
    echo
fi

I also downloaded a porteus iso, unpacking the initrd i see a 'cleanup' script, with something totally different from what i've seen before, unreadable code that i have to properly format to read it... these lines look interesting

if [ ! "$RE_EXEC_CLEANUP" ]; then
    export RE_EXEC_CLEANUP=1
    pivot_root . union
    exec chroot . /cleanup "$@" <dev/console >dev/console 2>&1
    echo "Something was wrong because we should never be here!"
fi

gyrog commented 6 years ago

During this early testing phase we could focus just on rc.shutdown. Just after the xino fiddle, where it is attempted to remount some things "ro", if the tar file exists: Create a tmpfs, and uncompress the tar file into it. Do a pivot_root to the mount point of this tmpfs.

The tar file probably only needs to contain busybox and a cleanup script that pulls down the stack. (All the other cleanup stuff should have already been done by rc.shutdown, though obviously more could be included in the cleanup script, if required.)

I am concerned that we might have a real "busy" problem, and even after doing a lot of work to get the pivot_root to work, there will still be a "busy" problem that prevents the stack from being pulled down.

gyrog commented 6 years ago

I doubt that spinning down the disks is part of our problem.

gyrog commented 6 years ago

A suggestion for a possible implementation:

In "init", towards the end:
    mount new tmpfs as /pup_new/initrd/files
    copy /bin to /pup_new/initrd/files/bin
    copy /pup_new/etc/rc.d/rc.cleanup /pup_new/initrd/files/cleanup
    remount /pup_new/initrd/files "ro"

in "rc.shutdown", towards the end:
    remount /initrd/files "rw"
    mkdir /initrd/files/pup_old
    pivot_root to /initrd/files
    execute cleanup
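The two halves of this suggestion can be sketched roughly as below. The paths come from the suggestion itself; the `run`/`DRY_RUN` wrapper and the function names are hypothetical, added only so the steps can be printed without touching real mounts. The final `exec chroot` step follows the pattern in the Porteus snippet quoted earlier in the thread and is an assumption about how "execute cleanup" would be done.

```shell
# Sketch only. The run/DRY_RUN helper is hypothetical: with DRY_RUN=1 each
# command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Would be called towards the end of "init".
stage_cleanup_root() {
    files=/pup_new/initrd/files
    run mount -t tmpfs tmpfs "$files"
    run cp -a /bin "$files/bin"
    run cp -a /pup_new/etc/rc.d/rc.cleanup "$files/cleanup"
    run mount -o remount,ro "$files"
}

# Would be called towards the end of "rc.shutdown".
enter_cleanup_root() {
    files=/initrd/files
    run mount -o remount,rw "$files"
    run mkdir -p "$files/pup_old"
    run cd "$files"
    run pivot_root . pup_old
    run exec chroot . /cleanup
}
```

Running `DRY_RUN=1 stage_cleanup_root` just lists the four init-side commands, which makes it easy to compare against what init actually does before trying it for real.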

wdlkmpx commented 6 years ago

I pushed 20 commits to testing.. to address some reported and unreported issues. Hopefully i didn't introduce another significant bug, i spent a day testing..

I haven't had time to experiment more with the shutdown stuff, but i have to say that some of the changes i made might actually improve the situation for @mavrothal and @LateAdopter .. or maybe not.. i always see the same errors.

I removed some stuff from poweroff, rc.sysinit and usablefs that shouldn't be there.. stuff that was added either while testing a full install that won't shutdown, dealing with a misconfigured busybox or trying to boot an old puppy with a wce initrd.gz.. at least 2 of them should not make any difference, not in "normal cases".. but they might cause bugs in some cases, who knows...

I also have to say i prefer fatdog's approach for shutdown, it looks cleaner, so i'll probably try to adapt it to puppy in my next experiments... if it doesn't work then i'll keep everything intact and just copy some code.

I also want full installs to boot with an initramfs. I'm willing to produce an updated version of grub4dosconfig that supports generating a proper menu for full installs + initrd.gz.. a pet pkg that should be posted on the forum...

There are still many things to fix, but hopefully after all these changes, a new xenialpup will be produced to replace the current one @ puppylinux.com ...

I provide this .iso file containing all the recent changes (for testing purposes):
http://www.01micko.com/wdlkmpx/stretch-7.0.0b3-uefi.iso
http://www.01micko.com/wdlkmpx/stretch-7.0.0b3-uefi.iso.md5.txt

I had issues uploading the file with gftp.. had to continue with filezilla. i'll check later..

Ps: it's a "tainted" iso, it also includes all the changes from my huge pull request (currently open)..

wdlkmpx commented 6 years ago

Testing the iso i posted above, i notice a delay when shutting down, which means that it will probably work for lateadopter in a fat32 partition (using a savefile).. windows will not complain.. why do i say this.. because that's the only difference.. this delay means something.

I also tested the shutdown stuff in a full install + initrd.gz. As time passes by, uefi will be everywhere, so i think grub4dosconfig should be the preferred option to create bootable media in non-uefi installs... at least in woofce, puppyinstaller should rely on grub4dosconfig exclusively... everyone uses grub4dos.. to say otherwise is living in denial..

The shutdown procedure works fine in a full install, and pfix=fsck shows a clean filesystem. Forcing an improper shutdown also triggers the fsck in the initrd, which works just as expected, and the boot sequence continues. i haven't tested without initrd because i have bad memories. But i might try, just to see how broken stuff is without it. Especially the ramdisk stuff in /sbin/initNEW and the 1st shutdown..

Other than that, i did it again, i deleted the savefile containing all my changes and stuff related to my experiments with the shutdown procedure. I have to start over.

wdlkmpx commented 6 years ago

Following gyro's suggestions, i found that exec'ing cleanup in rc.shutdown does not work, i get the same results as in my initial tests.. the system 'starts again' after the script finishes

So i need some stuff from the running system (autologin, inittab, passwd, shadow), after that i get to test some stuff in the new root, i keep running exec /cleanup phase2 and i see it just can unmount stuff.

lsof shows process 1 using "deleted" files (/oldroot/dev/console, /oldroot/tmp/bootsysinit.log), and the current busybox process is using 3 ttys. It was clear the approach was wrong.
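Where lsof is not available in the cleanup root, the same information can be read from /proc. A minimal sketch, assuming Linux /proc and a `readlink` applet; `list_fds_under` is a hypothetical helper name:

```shell
# Hypothetical helper: print every open file descriptor of <pid> whose target
# lies under <prefix> (for example /oldroot). Relies on Linux /proc.
# Deleted files still match, because the kernel reports them as
# "<path> (deleted)".
list_fds_under() {
    pid=$1
    prefix=$2
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd") || continue
        case "$target" in
            "$prefix"/*) echo "${fd##*/} -> $target" ;;
        esac
    done
}
```

For the case described here it would be called as `list_fds_under 1 /oldroot`, which lists whatever process 1 still holds open inside the old root.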

This is the inittab from fatdog-scripts, it's different from the one in the fatdog iso

    ::sysinit:/etc/rc.d/rc.sysinit
    tty1::respawn:getty -n -l /bin/autologin 38400 tty1
    tty2::respawn:getty 38400 tty2
    tty3::respawn:getty 38400 tty3
    ::ctrlaltdel:/sbin/reboot

    # Stuff to do before rebooting
    ::shutdown:/etc/rc.d/rc.shutdown
    ::shutdown:/etc/rc.d/rc.cleanup

    # for testing clean unmounts (type "kill -3 1" to activate this)
    ::restart:/bin/sh 2>&1

By following the logic in this inittab and adapting the reboot/poweroff/rc.shutdown scripts, now rc.cleanup works as part of the shutdown sequence, but i see it still doesn't unmount the PDRV. I'll take a look...

wdlkmpx commented 6 years ago

After some more tests and adding some random lines here and there, this is what i plan to push to testing:

www.01micko.com/wdlkmpx/rc_cleanup.sfs

static busybox 1.28.3, inittab, poweroff, rc.shutdown, rc.cleanup.

One thing i noticed is that the pdrv cannot be unmounted (not even in a full install without initrd), but it's remounted read-only, i have a slightly modded version that also creates a tmpfs in a full install, and performs some more cleanups.

It does unmount all the sfs's, but there's some kind of demonic magic that is keeping the pdrv and savefile busy. It takes a more powerful spell to unlock this level. I have tested in only one pc. I can see this having different effects on different hardware, so this is an improvement that goes further, but it's not enough for savefiles.. in this machine.

It's easy to see what's going on. kernel params: debuginitrd to see what's happening in rc.cleanup, debugshutdown = debuginitrd + exit to shell to look around.
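How rc.cleanup actually reads these two params is not shown in this thread. As an illustration only, a script could test for them like this; both function names and the "trace"/"shell"/"none" labels are hypothetical:

```shell
# Hypothetical helper: succeed if <name> appears as a bare word on the kernel
# command line. A non-empty second argument replaces /proc/cmdline, which is
# handy for testing.
has_boot_param() {
    name=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        [ "$word" = "$name" ] && return 0
    done
    return 1
}

# Per the description above, debugshutdown means debuginitrd plus a shell.
debug_level() {
    if has_boot_param debugshutdown "$1"; then
        echo shell
    elif has_boot_param debuginitrd "$1"; then
        echo trace
    else
        echo none
    fi
}
```

Matching whole words keeps a param like `nodebuginitrd` from being mistaken for `debuginitrd`.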

I have seen different results, but i wasn't paying much attention, i started seeing more "successful" results when the tmpfs was created in rc.cleanup. But i'm not sure what else i did.

It's easy: add rc_cleanup.sfs as the adrv or ydrv, boot, and create a savefile or savefolder. Always keep pfix=fsck,fsckp and see what happens. If it doesn't cause regressions (worse results), then that's it.

gyrog commented 6 years ago

Sorry, I haven't tried it yet. I'm not sure we need to actually umount stuff, just remount it "ro". I don't think we need to worry about the sfs files, these are all "ro", and usually in ram anyway. I think the problem is "rw" stuff that is on disk. If we can get to a situation where all the disk partitions are mounted "ro" and the only "rw" stuff is in ram, then all should be well. So the challenge is always going to be the partition that contains the save layer. It would be interesting to set up a test with the savefile on a different partition from the frugal install directory, and don't bother with the sfs files.
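Whether that situation has been reached can be checked from the mount table just before poweroff. A minimal sketch, assuming the standard /proc/mounts layout (device, mount point, type, options); `list_rw_disk_mounts` is a hypothetical helper:

```shell
# Hypothetical helper: print "<device> <mountpoint>" for every mount that is
# backed by a /dev node (partitions and loop devices) and is still "rw".
# tmpfs and other RAM-only mounts are ignored.
list_rw_disk_mounts() {
    mounts=${1:-/proc/mounts}
    while read -r dev mnt _type opts _rest; do
        case "$dev" in
            /dev/*) ;;
            *) continue ;;
        esac
        case ",$opts," in
            *,rw,*) echo "$dev $mnt" ;;
        esac
    done < "$mounts"
}
```

An empty result at the end of the cleanup script would mean that only RAM-backed mounts are still writable.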

There may be another way; don't ever have any stuff that is "rw" in the stack, on disk. This would mean changing Puppy so that the "rw" layer of the stack is always in ram. So it's a bit like a pupmode=13, except that the save layer on disk is really "ro" while ever it is in the stack, so a savefile would be mounted with an "ro" loop device just like sfs files. There is no snapmergepuppy, and at shutdown we either:

a) Destroy the stack, umount the savefile, remount it "rw", update it, and umount it again.

b) Simply copy the "rw" directory at the top of the stack to an archive on disk, then in init before the save layer is added to the stack, restore the "rw" directory in ram from the archive, mount the savelayer "rw", move files from the "rw" directory in ram to the savelayer, umount the savelayer and then remount it as fully "ro" for the stack.

Option b) is also compatible with "overlayfs", and with "overlayfs" it is supposed to work on non Linux filesystems, so maybe the added complication of a savefile would not be needed at all.
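The archive half of option b) could look roughly like this. It is only a sketch under assumptions: the helper names, the use of a gzip-compressed tar, and the temporary-file rename are all hypothetical choices, and merging the restored files into the save layer is left out.

```shell
# Hypothetical helpers for option b): archive the in-RAM "rw" directory at
# shutdown and unpack it again early in init.
save_rw_layer() {
    rw_dir=$1
    archive=$2
    # Write under a temporary name first, so an interrupted write does not
    # replace a previous good archive.
    tar -czf "$archive.tmp" -C "$rw_dir" . && mv "$archive.tmp" "$archive"
}

restore_rw_layer() {
    archive=$1
    rw_dir=$2
    [ -f "$archive" ] || return 0   # nothing was saved last session
    mkdir -p "$rw_dir" && tar -xzf "$archive" -C "$rw_dir"
}
```

Archiving `.` relative to the "rw" directory keeps hidden entries as well, which matters if the union filesystem stores its deletion markers as dot-files there.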

wdlkmpx commented 6 years ago

Everything you mention can be implemented as an alternate method (or init).

Triggered by a boot param or build option.. basically everything i added to the puppy init/shutdown has this 'feature'... For example, rc.cleanup is completely optional.. if 'umount -ar' is run in rc.shutdown.. rc.cleanup is not called by busybox reboot/poweroff.. or so it seems. rc.cleanup is more a research project, that indeed does more stuff and it's also a debugging tool

rc.shutdown has some logic to determine if it should run 'umount -ar'... only if /initrd/files is mounted or /bin/busybox is a static binary.
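The first of those two conditions can be tested against the mount table. The sketch below is not the actual rc.shutdown code; `is_mounted` is a hypothetical helper, and the static-binary check is deliberately left out because the thread does not show how it is done.

```shell
# Hypothetical helper: succeed if <dir> is a mount point according to the
# given mounts table (defaults to /proc/mounts).
is_mounted() {
    dir=$1
    mounts=${2:-/proc/mounts}
    while read -r _dev mnt _rest; do
        [ "$mnt" = "$dir" ] && return 0
    done < "$mounts"
    return 1
}

# Possible use in a shutdown script (illustrative only):
#   if is_mounted /initrd/files; then umount -ar; fi
```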

Then rc.cleanup also reads /etc/rc.d/PUPSTATE so it also knows what to do if aufs is not the punionfs:

    if [ "$OLDROOT" ] && [ "$PUNIONFS" = "aufs" ]; then
        ...
    fi

One reason i usually just keep what's there and try to simplify/fix it (even if this takes years and is pointless due to some very awful design flaws..) is because i don't want to explain stuff, also don't want to deal with the toxic puppy community. Of course this doesn't include the desktop/filemanager where i need at least something usable. Last time i checked the forum i read people were being vocal against savefolders being the default pupsave option..

Following this logic many ideas can be implemented without having to worry about stuff, just focus on your projects and add code to testing without worries, making room for your stuff..

gyrog commented 6 years ago

Hmm..., maybe rc.cleanup needs to move all the mount points in $OLDROOT/initrd, as well as /dev, /proc and /sys.
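One reading of this is that every mount still hanging under $OLDROOT/initrd, plus $OLDROOT/dev, $OLDROOT/proc and $OLDROOT/sys, gets moved to the same path in the new root. A rough sketch under that assumption; `move_old_mounts` and the `run`/`DRY_RUN` wrapper are hypothetical, and nested mounts are not handled specially (they travel with their parent, so a later move of the child may simply fail).

```shell
# Sketch only. With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: move selected mounts from <oldroot> into the current
# root. The mount table is read once up front, so it does not change while
# the loop is running.
move_old_mounts() {
    oldroot=$1
    table=$(cat "${2:-/proc/mounts}")
    printf '%s\n' "$table" | while read -r _dev mnt _rest; do
        case "$mnt" in
            "$oldroot"/initrd/*|"$oldroot"/dev|"$oldroot"/proc|"$oldroot"/sys)
                new=${mnt#"$oldroot"}
                run mkdir -p "$new"
                run mount --move "$mnt" "$new"
                ;;
        esac
    done
}
```

Running `DRY_RUN=1 move_old_mounts "$OLDROOT"` from the debugshutdown shell would show which moves it would attempt, without changing anything.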