kisslinux / kiss

KISS Linux - Package Manager
https://kisslinux.github.io
MIT License
464 stars 62 forks source link

Installing dependencies of running software crashes the software #226

Closed jedahan closed 3 years ago

jedahan commented 3 years ago

As an example, if I kiss build cairo && kiss install cairo, sway will hard-lock.

I was able to replace cp -fP with (busybox) install in pkg_install_files on https://github.com/kisslinux/kiss/blob/master/kiss#L1088 which fixed the issue.

Not sure if there is a way to fix not using install, but was nice to be able to kiss update without having to hard reboot my laptop.

dylanaraps commented 3 years ago

I cannot reproduce this issue at all (with any libraries sway links to nor with other packages in a similar scenario).

Now, I had a similar issue a long while back (updating mesa would kill sway) and fixed it by enabling CONFIG_CHECKPOINT_RESTORE=y in my kernel. Not sure if this is related to this present issue, just thought I'd mention it. (NOTE: this is unique to amdgpu).

Whatever the fix ends up being for this issue - it must remain portable and if at all possible POSIX compliant. I'm not opposed to using install, I'd just like to do some research and see what our options are as to other possible remedies.

dylanaraps commented 3 years ago

Let me know if this fixes the issue, if not I'll try other approaches I have in mind.

dylanaraps commented 3 years ago

Somewhat relevant: In my blog post I mentioned replacing components of the package manager with C (in an opt-in, unintrusive manner of course). One of the first items on my list is a pkg_install_files equivalent in C as installation is one of the slowest parts of the package manager (for medium to large packages).

dilyn-corner commented 3 years ago

Small change on my own end with the latest master. Instead of my computer just completely locking up and requiring a forced restart, wayfire merely segfaults when it or its dependencies are installed. So... Progress...

dylanaraps commented 3 years ago

Interesting. I still wonder why I can't reproduce this at all.

Will work on this some more. Thanks for testing.

jedahan commented 3 years ago

Trying to come up with a minimal testcase is hard. Maybe writing a program that calls a library function once every 5 seconds, then changing the name of the function and replacing the library while the program is running? I wonder if the install package (busybox or coreutils) has a test like this.

dylanaraps commented 3 years ago

I've been playing around with using install and it's still not perfect as we have to fallback to cp -f as install aggressively resolves symbolic links. Here's a patch you can try.

diff --git a/kiss b/kiss
index 11c6b83..7bd4f23 100755
--- a/kiss
+++ b/kiss
@@ -1085,27 +1085,11 @@ pkg_install_files() {
                 # Skip directories as they're likely symlinks in this case.
                 # Pure directories in manifests have a suffix of '/'.
                 [ -d "$_file" ] || test "$1" "$_file" || {
-                    _file_tmp=$2/${file#/}
-
-                    # Copy the file to a temporary location so that later
-                    # verification (after pkg_remove_files()) is possible.
-                    #
-                    # This duplication only occurs during the first
-                    # invocation of pkg_install_files() - the second call
-                    # will simply do the 'mv' call.
-                    case $1 in -z)
-                        cp -fP "$2/${file#/}" "$2/__f-$pid"
-                        _file_tmp=$2/__f-$pid
-                    esac
-
-                    # Atomically move the temporary file to the root
-                    # filesystem. The running processes will either get
-                    # the old file or the new one.
-                    mv -f "$_file_tmp" "$_file"
-
-                    # Skip changing permissions of symlinks. This prevents
-                    # errors when the symlink exists prior to the target.
-                    [ -h "$_file" ] || chmod "$b$oct" "$_file"
+                    if [ -h "$2$file" ]; then
+                        cp -fP "$2$file" "$_file"
+                    else
+                        install -m "$b$oct" "$2$file" "$_file"
+                    fi
                 }
         esac
     done || :
dylanaraps commented 3 years ago

Please try this patch. The original fix is wrong as the atomic operation can't occur across filesystem boundaries. This instead copies to the destination directory and moves it directly from there.

diff --git a/kiss b/kiss
index 11c6b83..394f613 100755
--- a/kiss
+++ b/kiss
@@ -1085,7 +1085,8 @@ pkg_install_files() {
                 # Skip directories as they're likely symlinks in this case.
                 # Pure directories in manifests have a suffix of '/'.
                 [ -d "$_file" ] || test "$1" "$_file" || {
-                    _file_tmp=$2/${file#/}
+                    _file_tmp=$2$file
+                    _parent_dest=${_file%/*}

                     # Copy the file to a temporary location so that later
                     # verification (after pkg_remove_files()) is possible.
@@ -1094,8 +1095,8 @@ pkg_install_files() {
                     # invocation of pkg_install_files() - the second call
                     # will simply do the 'mv' call.
                     case $1 in -z)
-                        cp -fP "$2/${file#/}" "$2/__f-$pid"
-                        _file_tmp=$2/__f-$pid
+                        _file_tmp=$_parent_dest/__tmp_${file##*/}-$pid
+                        cp -fP "$2$file" "$_file_tmp"
                     esac

                     # Atomically move the temporary file to the root

NOTE: Regardless of whatever the final fix is, this entire process needs to be made more robust.

jedahan commented 3 years ago

Both solutions worked for me, tested by upgrading sway and it’s dependencies while running. Thank you!

dylanaraps commented 3 years ago

Pushed the fix. Please try and it let me know. :)

dylanaraps commented 3 years ago

If we want to fix the error case (dirty filesystem). What we want is something transactional. This commit just adds atomicity to each file transfer. To make this transactional we'd be pushing the limits of the shell further (and slowing down install even more in the process).

Transactional install would require making copies of all the previous version's files in $KISS_ROOT (this can happen before or during each transfer), doing the install and then if there are errors, copying all the previous version's files back to the filesystem (while also deleting any files installed from the new version).

This is better suited for the C implementation of the component although it should be trivial to print out the state of the filesystem alongside the error message for the shell implementation.

jedahan commented 3 years ago

I was playing around with writing a similar idea out

pre-merge: tar up backup manifest (+ checksums?) merge: as is trap err in merge: untar backup manifest (+ report checksums)

dylanaraps commented 3 years ago

Can I get confirmation that master still works? I had to fix #232 and this may have regressed the fix for this issue.

dilyn-corner commented 3 years ago

Latest git master appears to have resolved this problem!

jedahan commented 3 years ago

Tested latest master by removing many firefox dependencies while writing this comment. Seems solid.