procount / pinn

An enhanced Operating System installer for the Raspberry Pi

Loss of read/write activity for mounted devices within PINN #447

Open annaclets opened 3 years ago

annaclets commented 3 years ago

When a PINN-created backup is used to install/restore an OS, e.g. RaspiOS, it loses the ability to read any mounted device, i.e. USB flash drives and other partitions on the same SD card. Likewise, Twister OS runs perfectly well when installed by itself on an SD card, but when installed through PINN it can no longer read any mounted device.

To be fair to PINN, which in my opinion is currently the best multiboot tool for RPi, this problem also occurs in BerryBoot - Twister OS does not have i/o on any mounted device. What gives?

lurch commented 3 years ago

Weird... I wonder if some file / directory is getting created with the wrong permissions? Can you still see the files on the mounted device, just can't open any of them for reading?

annaclets commented 3 years ago

Complete blank - no file visible. The devices appear to be dismountable / mountable, though, but no files appear.

procount commented 3 years ago

Not seen that before. I shall have to experiment with it

lurch commented 3 years ago

@annaclets Have you done any customisation of RaspiOS (e.g. adding new users), or is it still a vanilla install?

annaclets commented 3 years ago

RaspiOS was customized, which is why it was backed up so as not to lose the customization. When restored, it lost i/o on mounted devices. Same result when following the instructions from this: https://github.com/procount/pinn/wiki/How-to-Create-a-Multi-Boot-SD-card-out-of-2-existing-OSes-using-PINN

TwisterOS, however, is vanilla install and never had i/o on mounted devices in both PINN and BerryBoot.

lurch commented 3 years ago

I'm sorry for the stupid question, but are you sure there's actually still files on the drive, and the Pi isn't just (correctly) showing an empty drive? :man_shrugging:

annaclets commented 3 years ago

I have found the solution, but it leads to some complications. Simply put, it turns out that the tar archiving system does not produce a fully accurate image of an ext4 rootfs partition. I switched from rootfs.tar.xz to rootfs.img.xz and the problem with mounted devices got solved for both TwisterOS and backup RaspiOS.

The complication is it's a tedious process involving shrinking and expanding partition sizes, as stated in the abovementioned wiki.

procount commented 3 years ago

Yes I think you are right. The archiving looks to be a bit lacking. I shall look at improving it. Thanks for reporting it.

procount commented 3 years ago

I hope to have an update to fix this shortly. Would you be willing to test it out?

lurch commented 3 years ago

@procount I'm intrigued to hear what the actual underlying problem was, that caused such seemingly bizarre behaviour?

annaclets commented 3 years ago

@procount, yes, I'm looking forward to testing the fix.

procount commented 3 years ago

Please download https://sourceforge.net/projects/pinn/files/testing/pinn-353a.zip/download You should unzip it over an existing PINN installation (one that doesn't matter, since this is beta software!) (password is backup) It does not include recovery.cmdline or config.txt, so if you don't have an existing installation to test, just copy those 2 files across.

You will need to install a new OS (Twister/RasPiOs etc) to ensure the file attributes are all correct, then back them up and restore them.

(I have not had time to even run this to see if this boots, so use with caution! - but it should be ok)

annaclets commented 3 years ago

I have, so far, tried to back up ubuntu, lineage17, raspiOS and recalbox. Success with ubuntu and lineage17, but cannot proceed with raspiOS and recalbox because of detected incompatibility (with tar?). I noticed that you have now shifted to xxx.img.gz, i.e. raw format. Shouldn't you lift the restrictions on raspiOS (unconventional ext4 format?) and recalbox (exfat?) now that you're into raw imaging?

Initial restore ran into errors. The partition.json still specified the img partitions' filesystem_type as "ext4" rather than "raw". Upon correction, restore proceeded.

The lineage17 restored successfully. However, the ubuntu restore did not complete due to an error in the partition_setup.sh, which I checked and confirmed is identical to the old one.

Tried booting lineage17, won't boot.

procount commented 3 years ago

Backup should use the same method of archiving as was used to create the OS install files originally, with some minor exceptions.

Images were used for some partitions in lineage because it was the only way to convert them and they are fixed size partitions. It is much more complicated to restore an image to a different sized partition, which is normally required.

I shall investigate the problem OSes.

annaclets commented 3 years ago

Oops, I'm sensing a misunderstanding here. I now understand that you have not actually switched to raw imaging? The install files I used were mine, which have raw images.

Ok, I will now try again with original base installations which used tarballs.

annaclets commented 3 years ago

At this point, a backup trial of TwisterOS is moot because it never had i/o on mounted devices due to the original install files being tar based.

Just tried raspiOS, ubuntu and retropie, all from original base installations. Oddly, all that was generated from raspiOS and ubuntu were 1KB xxx.tar.gz, so I would consider them failures. Backup of retropie could not proceed because of incompatibility.

procount commented 3 years ago

I'm afraid pinn 353a is not going to work. See https://github.com/raspberrypi/noobs/issues/500.

Currently v3.5.2 uses plain tar, but I've verified it has started to cause this problem. I don't remember it being an issue before. Perhaps it's due to a bump in the file system versions that tar does not understand?

It looks like the file manager gets half way through mounting a USB drive, because the drive labels are displayed in the folder list, but they are not really mounted. Clicking on the drive label produces an error about not being able to access the drive via its UUID.

@Lurch - another opportunity for you to try PINN perhaps? I'm a bit stumped. Might have to revert to backing up to an image!

lurch commented 3 years ago

I've done some investigation this afternoon, and it looks like it's an ACL issue. The USB disk does get mounted (which is why you can see it in the file explorer), but the missing ACL means that only the root user is able to see the contents of the disk! @annaclets The way to fix this is sudo setfacl -m user:pi:r-x /media/pi
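As a minimal sketch of the check-and-repair step described here: the helper below only adds the ACL entry if `getfacl` does not already list it. The function name `ensure_user_acl` is hypothetical (it is not part of PINN or Raspberry Pi OS), and the user `pi` and directory `/media/pi` are the values from this thread, so other OSes may need different ones.

```shell
#!/bin/sh
# Hypothetical helper (not part of PINN): give a user read/traverse access to
# a directory through an ACL entry, but only when that entry is missing.
ensure_user_acl() {
    acl_user=$1
    acl_dir=$2
    # getfacl prints one "user:NAME:PERMS" line for each named-user entry.
    if getfacl "$acl_dir" 2>/dev/null | grep -q "^user:${acl_user}:r-x"; then
        echo "ACL already present for ${acl_user} on ${acl_dir}"
        return 0
    fi
    echo "Adding ACL for ${acl_user} on ${acl_dir}"
    setfacl -m "user:${acl_user}:r-x" "$acl_dir"
}

# Example, run as root inside the restored OS:
#   ensure_user_acl pi /media/pi
```

Checking first keeps the helper safe to run repeatedly, for instance after every restore.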

I don't remember it being an issue before. Perhaps it's due to a bump in the file system versions that tar does not understand?

It seems to me that it's entirely possible that you just never tried mounting a USB disk in a running OS that you'd restored from a PINN-created backup before? :shrug:

@procount WRT the bsdtar problem in https://github.com/raspberrypi/noobs/issues/500 it looks like libarchive has a HAVE_STATFS option, so maybe forcing that to 0 would fix the statfs failed error? In the meantime / as an alternative: a temporary fix is to not add /media/pi into the rootfs backup tarball (or delete it from disk after restoring the backup), and Raspberry Pi OS will then create it automatically (when needed) with the correct ACL.
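The "leave it out of the tarball" workaround could look roughly like this. It is only a sketch: the function name is hypothetical, PINN's real backup command is not shown in this thread, and it assumes the tar in use accepts `--exclude` (GNU tar and bsdtar do; a given busybox build may not).

```shell
#!/bin/sh
# Hypothetical sketch: archive a mounted rootfs but skip one directory, so the
# restored OS can recreate it on demand with the correct ACL.
backup_rootfs_skipping() {
    rootfs_dir=$1   # where the root partition is mounted, e.g. /tmp/2
    archive=$2      # output tarball
    skip_path=$3    # path relative to the rootfs, e.g. media/pi
    # With "-C DIR ." member names are stored as ./path, so the exclude
    # pattern uses the same "./" prefix.
    tar -cf "$archive" --exclude="./${skip_path}" -C "$rootfs_dir" .
}

# Example:
#   backup_rootfs_skipping /tmp/2 /tmp/rootfs.tar media/pi
```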

procount commented 3 years ago

Cheers! TBH, I can't remember specifically mounting a USB drive. It has not been part of my usual test suite, but perhaps it should be now... I'll give the HAVE_STATFS a try and see if it works. The 2nd temporary fix should allow previous backups to start working again if they are re-installed.

lurch commented 3 years ago

The 2nd temporary fix should allow previous backups to start working again if they are re-installed.

Good point, I hadn't considered that :slightly_smiling_face: Also means that a fresh TwisterOS PINN-install would work for @annaclets , once you've implemented it.

procount commented 3 years ago

Although using your setfacl fix would be a lot quicker!

procount commented 3 years ago

This can be fixed in partition_setup.sh by adding: rm -rf /tmp/2/media/pi

I wonder if we should use rm -rf /tmp/2/media/* instead, so that the username would not matter...? Or could that erase something important?

lurch commented 3 years ago

This can be fixed in partition_setup.sh by adding: rm -rf /tmp/2/media/pi

Probably better to do rmdir /tmp/2/media/pi so that it won't nuke the directory if it's non-empty?

Or could that erase something important?

Possibly... you could guard against that by only deleting empty directories :wink:

for d in /tmp/2/media/*; do
    if [[ -d "$d" ]] && [[ -z "$(ls -A "$d")" ]]; then
        echo "Deleting \"$d\""
        rmdir "$d"
    fi
done

(based on code at https://superuser.com/questions/352289/bash-scripting-test-for-empty-directory )

Although I guess this "solution" only works for OSes where you're able to modify partition_setup.sh and wouldn't work for e.g. backups of Raspberry Pi OS?

annaclets commented 3 years ago

@annaclets The way to fix this is sudo setfacl -m user:pi:r-x /media/pi

Yes, @lurch, I confirm that this indeed fixes the problem, tested with TwisterOS. Thanks!

Now I'm looking forward to testing pinn v3.5.3b, which I expect would integrate the fix in its generated backups.

procount commented 3 years ago

I need to think about how best to implement this. Is it OS dependent? I don't think every OS will have a media/pi folder. I'm sure some won't. Building it into PINN will hardcode it too much I think, causing a maintenance headache. I think adding it to partition_setup.sh is better, but as @lurch says, this would be difficult to add into OSes that I have no/little control over such as Raspios, unless XECDesign will accommodate us. But there are also other OSes.

But really the above is just a sticking plaster over the fact that the tar I use cannot save ACL, and I'm not able to use BSDTAR because of the statfs issue (I tried compiling with HAVE_STATFS=0 and got a kernel panic 😞)

lurch commented 3 years ago

(I tried compiling with HAVE_STATFS=0 and got a kernel panic :disappointed: )

Oh, that's annoying :confused: I wonder if pulling in a newer version from upstream might help? :shrug: https://git.busybox.net/buildroot/tree/package/libarchive

https://www.libarchive.de/ has a newer version than even the latest version of buildroot includes!

I don't think every OS will have a media/pi folder. I'm sure some won't.

As I pointed out above, deleting any empty directories in /media/ might be a broader catch-all? (e.g. would account for different usernames)

procount commented 3 years ago

As I pointed out above, deleting any empty directories in /media/ might be a broader catch-all?

Sorry, I had already taken your point, but my typing was a bit too specific and didn't know when to stop. 😄 Is '/media' always included, or might it be missing e.g. in Android, or some small music box buildroot distro for example? (I have too many OSes to check). Perhaps there are other OSes that use ACL on other folders/files that we are just not aware of. How should I cater for those?

Getting a BSDTAR version that supports ACL without STATFS would be ideal, so I may try those later versions. BUT I recall some conversation with MAXNET about BSDTAR/libarchive and I think later versions omitted something else (xattr maybe - I think it was to do with xbian's btrfs or similar), hence why I use 3.3.1. Got to be careful we don't break something else by fixing this issue.

lurch commented 3 years ago

Is '/media' always included, or might it be missing

Well, obviously you'd check for the existence of the /media directory first! :laughing:

if [[ -d /tmp/2/media ]]; then
    for d in /tmp/2/media/*; do
        if [[ -d "$d" ]] && [[ -z "$(ls -A "$d")" ]]; then
            echo "Deleting \"$d\""
            rmdir "$d"
        fi
    done
fi

Perhaps there are other OSes that use ACL on other folders/files that we are just not aware of. How should I cater for those?

Indeed, as you noted this is just a sticking plaster - the proper fix is to backup and restore ACLs too...

Getting a BSDTAR version that supports ACL without STATFS would be ideal

...agreed!

BUT I recall some conversation with MAXNET about BSDTAR/libarchive and I think later versions omitted something else

If you can dig up the details I'd be happy to do a bit of investigation - seems like it'd be weird for bsdtar to actually be dropping features?

Got to be careful we don't break something else by fixing this issue.

Indeed, that's the trouble with supporting so many different OSes :stuck_out_tongue_winking_eye: :rofl:

procount commented 3 years ago

Thinking of adding the above code (copied here):

if [[ -d /tmp/2/media ]]; then
    for d in /tmp/2/media/*; do
        if [[ -d "$d" ]] && [[ -z "$(ls -A "$d")" ]]; then
            echo "Deleting \"$d\""
            rmdir "$d"
        fi
    done
fi

as a permanent script to be executed on OS restoration, separate from partition_setup.sh

lurch commented 3 years ago

Perhaps it's worth doing that for each of the partitions you restore (in case some OS uses a weird layout), rather than only for the 2nd partition? Although maybe it's also worth adding an override flag for that (to prevent this empty-dir-deletion) to the os.json files, just in case there's some OS where it causes problems? (always best to be prepared for the worst :laughing: )
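A rough sketch of that idea, written in plain POSIX sh: the same empty-directory cleanup is applied to every restored partition, with an opt-out switch. `SKIP_MEDIA_PRUNE` is a hypothetical stand-in for whatever flag os.json might carry, and both function names are made up for illustration.

```shell
#!/bin/sh
# Remove empty directories under <mount_root>/media; leave everything else.
prune_empty_media_dirs() {
    mount_root=$1
    [ -d "$mount_root/media" ] || return 0
    for d in "$mount_root"/media/*; do
        # Skip non-directories, symlinks and anything that still has content.
        if [ -d "$d" ] && [ ! -L "$d" ] && [ -z "$(ls -A "$d")" ]; then
            echo "Deleting \"$d\""
            rmdir "$d"
        fi
    done
}

# Apply the cleanup to each mount point given as an argument.
prune_all_partitions() {
    # SKIP_MEDIA_PRUNE=1 is a hypothetical per-OS opt-out.
    if [ "${SKIP_MEDIA_PRUNE:-0}" = "1" ]; then
        echo "Skipping empty-directory cleanup"
        return 0
    fi
    for part in "$@"; do
        prune_empty_media_dirs "$part"
    done
}

# Example:
#   prune_all_partitions /tmp/1 /tmp/2
```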

procount commented 3 years ago

Yes, I've been trying to think how best to do this. Maybe it needs a flag option to enable it in the first place, as it is destructive, rather than disabling it if it causes problems, which may be too late. Maybe I should add options to my "Fixup" menu so the user can run the script manually first, but that's not so convenient. Thinking of something similar for #442 as well.

lurch commented 3 years ago

Yes, it's "destructive", but all it's doing is deleting some empty directories (i.e. no actual data loss), so it's fairly easy to 'repair' by just recreating the empty directories :slightly_smiling_face:

Also worth noting that in the majority (?) of cases where the user hasn't plugged in any USB storage devices (within RaspiOS) before making a backup (with PINN) then the /media directory will remain empty, and this becomes a non-issue.

jharris1993 commented 3 years ago

Weird... I wonder if some file / directory is getting created with the wrong permissions? Can you still see the files on the mounted device, just can't open any of them for reading?

I have also encountered this issue experimenting with Raspberry Pi O/S 64 bit version 8/20/2020.

I see the exact same behavior and investigation showed that /media/pi is being created with 744 permissions, (rwx r-- r--). Normally the user directory is created with 755 permissions. (Maybe a umask issue?). Changing the permissions on ./pi to 755 allows the directory to be entered.

Likewise, the permissions on the individually mounted partitions may need to be changed.

Additionally, you may need to either close and restart all file manager windows and/or reboot to make the contents of these partitions visible.
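The permission change described in this comment can be sketched as below. The helper name is hypothetical and `stat -c` assumes GNU coreutils or busybox. Note that adding the execute bit for group and others opens the directory to every user, which is broader than the single-user ACL entry discussed earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: show a directory's octal mode, then add the execute
# (traverse) bit for group and others, e.g. 744 -> 755.
make_dir_traversable() {
    dir=$1
    echo "before: $(stat -c '%a' "$dir")"
    chmod go+x "$dir"
    echo "after: $(stat -c '%a' "$dir")"
}

# Example, run as root:
#   make_dir_traversable /media/pi
```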

procount commented 3 years ago

Should be fixed in v3.5.5

annaclets commented 3 years ago

FYI, I tried both a base install following @procount's updated wiki and a PINN-backup install of Twister OS and I confirm that PINN v3.5.5 has indeed solved this issue. Thanks!

jharris1993 commented 3 years ago

Given that I have an older version of PINN installed, a library of backups and installer images, and I want to take advantage of this new version:

Note that this may not be a complete solution, as I have experienced issues with various versions of Raspbian where a restored image had multiple issues, including an inability to save preferences across a reboot after restoring a backup image. The original PINN install worked, but the backup failed in subtle and bizarre ways. Likewise, a plain-vanilla install to separate (non-PINN) media always works.

Will this fixup solve the ACL/TAR issue in general, or is this a fix specific to the /media directory? (It looks like it's specific to the /media directory to me.)

I am going to humbly request that this issue remain open until a more generalized fix is found.

procount commented 3 years ago

Re-opening - I had hoped that the simple change of attributes would fix this, but it seems there are some exceptions, so I'll have to implement the setfacl fix too. I have restricted this to the /media folder as this was the only reported problem area.

1. If you boot a PINN formatted drive and enter the recovery menu when connected to the internet, it will automatically detect the new version and prompt to self-update.
2. I don't know how you store your base image. If you can boot it, it will self-update. Or just replace with the latest version.
3. The fix is applied after a backup is restored, so depending on which OSes they are, it may or may not work.

jharris1993 commented 3 years ago

Somehow or other, "we" need to come up with a more generalized fix since the ACL issue can affect directories and files other than just the /media directory.

Does "image" create a bit-equivalent image that accurately maintains all the attributes and characteristics of the imaged filesystem?

procount commented 3 years ago

Generally, yes it should. Better than tar. But the downside is that it keeps the fixed image size too which would require shrinking and compressing to fit different partition sizes. tar files are much easier in that respect. Have you found an example where files outside of /media are affected?

jharris1993 commented 3 years ago

This was a while back working with the Raspbian 64 bit image, and then working with Dexter Industries/Modular Robotics "Raspbian for Robots". Unfortunately I have been too ill recently to re-examine this.

Especially in the case of the 64 bit image, I posted about it on the Raspbian 64 bit sticky, and was immediately assailed by a whole host of others who had never experienced that problem. They suggested that the problem might be:

Now perhaps the correct answer is (c) All of the above?

I agree, shrinking and re-expanding image files is a ROYAL pain up the tush, and I would not recommend it as a sane alternative.

My concern is that the present solution assumes a relatively limited choice of operating system, all of which conform to certain very specific rules - and that's not an assumption I'd be willing to make. The two of us have already hit edge cases that you thought, (assumed), were untouchable.

Neither you nor I can imagine every possible use case or edge condition, so (IMHO), the fix should be as expansive as possible maintaining the correct attributes and characteristics for every file and directory in the backup image.

One other thought just occurred to me: Does PINN strip sockets from the created image before saving it? Isn't there some bizarre issue that happens if sockets are not stripped?

procount commented 3 years ago

maintaining the correct attributes and characteristics for every file and directory in the backup image.

Well, that's exactly what tar should be doing, but isn't in this particular edge case. But the /media folder is the only report I've had of it not working totally so unless I get news to the contrary I'm going to assume most other OSes are ok. Saving/restoring the attributes, ownership and ACL for every file seems like a lot of unnecessary work.

Yes, 3.5.5 strips sockets now, see #442. Another tar limitation.

procount commented 3 years ago

Enhancement: Leaving open in case there is a future way to save/restore ACL info.

jharris1993 commented 2 years ago

maintaining the correct attributes and characteristics for every file and directory in the backup image.

Well, that's exactly what tar should be doing, but isn't in this particular edge case. But the /media folder is the only report I've had of it not working totally so unless I get news to the contrary I'm going to assume most other OSes are ok. Saving/restoring the attributes, ownership and ACL for every file seems like a lot of unnecessary work.

My lack of Unix/Linux knowledge is showing here.

My understanding was that a "tar" backup (of whatever flavor) should create, and restore, what is essentially a bit-equivalent image of the files copied. Especially in a Unix-like system, saving files without attributes and/or permissions seems both an exercise in futility and a disaster waiting to happen. (Unless that is requested by a flag or option.)

I've been away for a while, (both wretchedly ill and unbelievably busy), but I would like to re-join this effort and help find a solution.

Any new developments?

procount commented 2 years ago

AIUI, ACL is an enhanced set of attributes over and above the usual owner, group & respective rwx attributes and tar is not aware of them. bsdtar should be able to manage these, but unfortunately I cannot use the bsdtar archiver in buildroot as it uses the uClibc libraries that do not seem to support the dependencies of bsdtar (specifically statfs).

I have recently come across a possible solution which involves the following small scripts:

# to backup:
getfacl -s -R . >permissions.facl

# To restore permissions:
setfacl --restore=permissions.facl

I was thinking of creating the permissions.facl file inside the OS partition before tarring it, so that PINN does not have to manage an extra file per OS partition.
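Wrapped up as two small functions, the idea might look like the sketch below. The function names are hypothetical, and this is not necessarily how PINN implements it. The file name is the one quoted above, and the listing is written into the root of the partition so that it travels inside the tarball.

```shell
#!/bin/sh
# Hypothetical wrappers around the getfacl/setfacl commands quoted above.
ACL_FILE=permissions.facl

# Record all non-trivial ACLs under a mounted partition before it is tarred.
save_acls() {
    # The subshell keeps the caller's working directory unchanged; ">"
    # truncates any stale listing left over from an earlier backup.
    ( cd "$1" && getfacl -s -R . > "$ACL_FILE" )
}

# Re-apply the recorded ACLs after the tarball has been extracted.
restore_acls() {
    # The listing holds paths relative to the partition root, so setfacl must
    # run from that directory. Returns non-zero if no listing is present.
    ( cd "$1" && [ -f "$ACL_FILE" ] && setfacl --restore="$ACL_FILE" )
}

# Example, with the root partition mounted at /tmp/2:
#   save_acls /tmp/2      # before backup
#   restore_acls /tmp/2   # after restore
```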

@lurch - any comments?

jharris1993 commented 2 years ago

If possible, (and assuming that this is a good fix), can it be set to:

If you do decide to do this, where would the file be stored? In the tarball for the OS or as a separate file?

Perhaps it might be worthwhile for me to capture the state of the system prior to a backup and restore it after a restore, just to see what happens? Can this be done from the PINN command shell?

One last thought.

The file would have to be destroyed and re-created every backup, just in case it already exists from a previous backup attempt.

jharris1993 commented 2 years ago

Is there a way I can apply this to my own situation?

I would like to try some additional experiments that absolutely require PINN, and I would like to incorporate this idea.

How would I modify the backup/restore scripts to include this?

I am assuming that these commands can be run from within the PINN command shell, right?

procount commented 2 years ago

Yes. There isn't really a backup/restore script within pinn, but there is a process. So before backing up, just run the getfacl command on the appropriate partitions to create the file. Then backup.

After restoration, run the setfacl command. You can run these from pinn after mounting the appropriate partitions, or probably run them from within the os itself. Not sure if you need to use sudo if the latter.

procount commented 2 years ago

PINN v3.7.2 has now been released and includes backup/restore of the additional ACL permissions. I also included the additional command in the wiki tutorial on how to create your own custom OSes.

Lightwel commented 1 year ago

Hello - hope you can help with a question on the wiki : at

https://github.com/procount/pinn/wiki/How-to-Create-a-Multi-Boot-SD-card-out-of-2-existing-OSes-using-PINN

which includes at step 2 Mount uSD card [etc]

$ sudo mkdir /media//boot
$ sudo mkdir /media//root
$ sudo mount /dev/sdb1 /media//boot
$ sudo mount /dev/sdb2 /media//root

I get the following:

$ sudo mkdir /media//richardh/boot
[sudo] password for richardh:
mkdir: cannot create directory ‘/media/richardh/boot’: No such file or directory

This on a Manjaro installation, not that that seems relevant.

I can create a different directory ( ~/PIROOT), mount the Root partition to it, and then bsdtar the files to /os//root.tar, but if I do that the command

sudo getfacl -s -R . >acl_permissions.pinn

returns

   ~/PIROOT  sudo getfacl -s -R . >acl_permissions.pinn  zsh: permission denied: acl_permissions.pinn

It was on looking for any information on that error that I found this post.

I now understand the need to create /media/pi. I don't understand the reason for it not working.

Any help please?

Tks R

Lightwel commented 1 year ago

OK this was Manjaro related - I can create /run/media/richardh/root, but I get the original error again

run/media/richardh/root $ sudo getfacl -s -R . >acl_permissions.pinn
zsh: permission denied: acl_permissions.pinn

If this depends on the OS, I can use RaspberryOS64 on the pi.

Is that the preferred route?

Tks R
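A likely explanation for the `permission denied` message, offered as an assumption rather than a confirmed diagnosis: in `sudo getfacl -s -R . >acl_permissions.pinn` the `>` redirection is carried out by the calling shell (zsh, running as the ordinary user) before `sudo` starts, so the file cannot be created inside a root-owned directory. The sketch below moves the redirection into a privileged shell. The function name and the `ROOT_CMD` override are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: dump ACLs of a root-owned tree into a file inside it.
# ROOT_CMD defaults to sudo; setting it to an empty string runs unprivileged.
dump_acls_as_root() {
    target_dir=$1
    out_name=${2:-acl_permissions.pinn}
    # Both getfacl and the redirection run inside one privileged shell, so
    # the output file is opened by root rather than by the calling user.
    ${ROOT_CMD-sudo} sh -c 'cd "$1" && getfacl -s -R . > "$2"' sh "$target_dir" "$out_name"
}

# Example:
#   dump_acls_as_root /run/media/richardh/root
```

This should behave the same on Manjaro and on Raspberry Pi OS, since the redirection rule belongs to the shell rather than to the distribution.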