topjohnwu / Magisk

The Magic Mask for Android
GNU General Public License v3.0
46.95k stars 11.86k forks source link

Magisk installer.sh doesn't check for full partitions #2461

Closed enetec1 closed 4 years ago

enetec1 commented 4 years ago

I've reported this to EdXposed GitHub:

Starting with v.0.4.6.x EdXposed zip needs to add a file called sepolicy.rule in /persist (/persist/magisk/riru_edxposed in my case).

But there is no check if this operation goes well or not and no error is reported to user if not (e.g. if /persist is disk full since of TWRP logs as it happened to me...).

User could think all has been installed proper way (since of no error reported), but without sepolicy set, phone will bootloop at reboot.

Since of no error reported is very difficult for the user to find for what reason it doesn't work fine too...

and I received this reply...

I use Magisk's original installer.sh, the code for copying the sepolocy file is in this script, u should report it to @topjohnwu in https://github.com/topjohnwu/Magisk/issues

...so, here you are.;)

I think this lack in installer.sh could generate A LOT of unpredictable behaviours with Magisk modules installation...

skittles9823 commented 4 years ago

sepolicy.rule isn't copied directly to /persist/magisk. It's copied to /sbin/.magisk/mirror/persist/magisk.

Magisk also tends to use some hacks when it comes to it's mounting of mirror dirs so I'm pretty certain it should always be able to create files in its mirror and overlay them just fine. I don't know for sure but @topjohnwu should be able to give a better explanation.

The main issue with sepolicy.rule at the moment is that the sepolicy files don't get removed if the module is uninstalled in any other way besides with magisk manager or creating a remove file in the module dir however I've brought that issue up with John and a fix will be added if it's not already been fixed in source (I'll check after posting this).

kam821 commented 4 years ago

@enetec1 @skittles9823 1) Everything, including SELinux policies path depends on system properties like Android version, A/B partitions, System as Root. 2) It's not easy to tell user if sepolicy.rule will work correctly. Module installation itself is not very important, because most stuff happens after rebooting, the sepolicy.rule file is also then copied to the persist directory and removed, e.g. after disable module. 3) I don't think module that requires policies and unloaded sepolicy.rule could cause bootloop, ofc if module author didn't decide to use old live policy loading technique as backup method.

skittles9823 commented 4 years ago

@kam821 if you look here and here you can see magisk always uses the same path to place the sepolicy.rule file.

You can also see the code which copies sepolicy.rule to the mirror dir every boot here

enetec1 commented 4 years ago

Yes, I know there is a mechanism which should copy sepolicy rules file at every boot (or @ first reboot... I'm not sure...) since it's from its log in Magisk that at first I found the disk full error (reported as not capable to create required dir at all...).

BUT when I solved the disk (/persist...) full issue on my device, I saw the file appeared in right place before rebooting, just after installing module in Magisk, so there should be a way to check if it goes well or not during install phase...

About the "old policy" backup, author of the module wrote to me this...

Btw, to avoid bootloop, u can delete '/data/adb/modules(_update)/riru_edxposed(_sandhook)/sepolicy.rule', EdXposed will execute sepolicy script when rule file does not exist in boot.

...and yes, this would avoid bootloops probably, BUT user have to do it manually, because if the module found this, it doesn't switch back to old live policy loading method at all... (and bootloops WILL happen, at least on my Pie, A/B, System as root device... Moto Z2 Force to be sharp).

In fact the only way I've found at first to exit bootloops was to reflash version under v.0.4.6.x of EdXposed that were able to work since of the use of old sepolicy script method that was able to work fine even if /persist was full....

(The FACTS that /persist was full since of not trimmed TWRP logs and because of Moto Flashall doesn't reset it properly as expected has been reported elsewhere by me too...).

kam821 commented 4 years ago

The clue is - Module developer must be sure which phones can support live patching policies without causing bootloop. One method is to detect the Magisk version and the combination of AB_UPDATE, SAR and Android version, just like Taichi does. If live patching could causing bootloop - just disable enforcing completely or simply dont do anything :) Ofc there are some exceptions and this method wont always work. This is a fragment of my Taichi fork. I rewritten it for readability, but the way it works doesn't change. SEPOLICY_FILE variable is "$MODDIR/sepolicy.file"

post-fs-data.sh:

#!/system/bin/sh
# Do NOT assume where your module will be located.
# ALWAYS use $MODDIR if you need to know where this script
# and module is placed.
# This will make sure your module will still work
# if Magisk change its mount point in the future
MODDIR=${0%/*}

# This script will be executed in post-fs-data mode

# ...

# Load utility functions
[ -f "/data/adb/magisk/util_functions.sh" ] && . /data/adb/magisk/util_functions.sh

# Detect if sepolicy file exists & create variable
[ ! -f "$SEPOLICY_FILE" ] && DISABLE_ENFORCE=true || DISABLE_ENFORCE=false

# SELinux boot policy loading is supported and sepolicy.rule exists, exiting.
[ "$MAGISK_VER_CODE" -ge 20110 ] && ! $DISABLE_ENFORCE && exit 0

# Check is disabling SELinux enforcing necessary.
if [ "$(getprop ro.build.version.sdk)" -ge 29 ]; then
  [ "$(getprop ro.build.ab_update)" = "true" ] && AB_UPDATE=true || AB_UPDATE=false
  grep ' / ' /proc/mounts | grep -qv 'rootfs' && SAR=true || SAR=false
  ! $AB_UPDATE || { $AB_UPDATE && ! $SAR; } && DISABLE_ENFORCE=true
fi

if $DISABLE_ENFORCE; then
  # ...
  setenforce 0
else
  grep -v '^#' < "$SEPOLICY_FILE" | while read RULE; do
    magiskpolicy --live "$RULE" 2>/dev/null
  done
fi
enetec1 commented 4 years ago

@kam821 I agree with you... Module developers should be sure critical parts (read: ones which could lead to bootloops at least...) would go inplace correctly during setup.

BUT if a developer says "I use standard Magisk installer.sh" (and obviously he couldn't be the only one...), it's in my opinion an "ambient precheck" included in installer.sh could be useful in a LOT of situation and reduce user risks...

I'm not a developer but I'm not a rookie too, I have some programming skills and large experience on computer systems BUT I can say to you that it takes some days to me to understand why two identical systems, programmed same way with identical files, one worked fine and the other bootlooped...

Let's imagine a less trained user....

MlgmXyysd commented 4 years ago

The clue is - Module developer must be sure which phones can support live patching policies without causing bootloop. One method is to detect the Magisk version and the combination of AB_UPDATE, SAR and Android version, just like Taichi does. If live patching could causing bootloop - just disable enforcing completely or simply dont do anything :) Ofc there are some exceptions and this method wont always work. This is a fragment of my Taichi fork. I rewritten it for readability, but the way it works doesn't change. SEPOLICY_FILE variable is "$MODDIR/sepolicy.file"

post-fs-data.sh:

#!/system/bin/sh
# Do NOT assume where your module will be located.
# ALWAYS use $MODDIR if you need to know where this script
# and module is placed.
# This will make sure your module will still work
# if Magisk change its mount point in the future
MODDIR=${0%/*}

# This script will be executed in post-fs-data mode

# ...

# Load utility functions
[ -f "/data/adb/magisk/util_functions.sh" ] && . /data/adb/magisk/util_functions.sh

# Detect if sepolicy file exists & create variable
[ ! -f "$SEPOLICY_FILE" ] && DISABLE_ENFORCE=true || DISABLE_ENFORCE=false

# SELinux boot policy loading is supported and sepolicy.rule exists, exiting.
[ "$MAGISK_VER_CODE" -ge 20110 ] && ! $DISABLE_ENFORCE && exit 0

# Check is disabling SELinux enforcing necessary.
if [ "$(getprop ro.build.version.sdk)" -ge 29 ]; then
  [ "$(getprop ro.build.ab_update)" = "true" ] && AB_UPDATE=true || AB_UPDATE=false
  grep ' / ' /proc/mounts | grep -qv 'rootfs' && SAR=true || SAR=false
  ! $AB_UPDATE || { $AB_UPDATE && ! $SAR; } && DISABLE_ENFORCE=true
fi

if $DISABLE_ENFORCE; then
  # ...
  setenforce 0
else
  grep -v '^#' < "$SEPOLICY_FILE" | while read RULE; do
    magiskpolicy --live "$RULE" 2>/dev/null
  done
fi

I don't think this is a good solution. Setting SELinux to premissive may cause serious security problems (even temporary settings). SELinux should not be turned premissive at any time in the production environment. So I hate this closed source privacy tracker(TaiChi).

In addition, this combination of SAR / Android / AB may not work on some devices or ROM (for example, one of my devices), resulting in bootloop.

I think this hidden danger of bootloop should be fixed by the installer provided by magisk or magisk, not the module author, because the module author is unlikely (but should try his best) to collect different situations of all devices.

Btw, the script you gave didn't solve the problem of the subject (full partitions).

MlgmXyysd commented 4 years ago

@kam821 I agree with you... Module developers should be sure critical parts (read: ones which could lead to bootloops at least...) would go inplace correctly during setup.

BUT if a developer says "I use standard Magisk installer.sh" (and obviously he couldn't be the only one...), it's in my opinion an "ambient precheck" included in installer.sh could be useful in a LOT of situation and reduce user risks...

I'm not a developer but I'm not a rookie too, I have some programming skills and large experience on computer systems BUT I can say to you that it takes some days to me to understand why two identical systems, programmed same way with identical files, one worked fine and the other bootlooped...

Let's imagine a less trained user....

It should be noted that the standard Magisk installer.sh must be used unless you do not want to publish it to the official repo. Because when using the Magisk Manager to download the module, it will be forced to replace it with the standard Magisk installer.sh.

https://topjohnwu.github.io/Magisk/guides.html#notes

When your module is downloaded with Magisk Manager, update-binary will be forcefully replaced with the latest module_installer.sh to ensure all installer uses up-to-date scripts. DO NOT try to add any custom logic in update-binary as it is pointless.

kam821 commented 4 years ago

It is also possible to add checking result of copying sepolicy.rule in Magisk module installer script and print it. From:

# Copy over custom sepolicy rules
if [ -f $MODPATH/sepolicy.rule -a -e $PERSISTDIR ]; then
  ui_print "- Installing custom sepolicy patch"
  PERSISTMOD=$PERSISTDIR/magisk/$MODID
  mkdir -p $PERSISTMOD
  cp -af $MODPATH/sepolicy.rule $PERSISTMOD/sepolicy.rule
fi

to:

# Copy over custom sepolicy rules
if [ -f $MODPATH/sepolicy.rule -a -e $PERSISTDIR ]; then
  ui_print "- Installing custom sepolicy patch"
  PERSISTMOD=$PERSISTDIR/magisk/$MODID
  mkdir -p $PERSISTMOD
  cp -af $MODPATH/sepolicy.rule $PERSISTMOD/sepolicy.rule
  [ -f "$PERSISTMOD/sepolicy.rule" ] && ui_print " - Custom sepolicy patch has been installed" || ui_print " - Custom sepolicy patch installing failed"
fi

or

# Copy over custom sepolicy rules
if [ -f $MODPATH/sepolicy.rule -a -e $PERSISTDIR ]; then
  ui_print "- Installing custom sepolicy patch"
  PERSISTMOD=$PERSISTDIR/magisk/$MODID
  mkdir -p $PERSISTMOD
  if cp -af $MODPATH/sepolicy.rule $PERSISTMOD/sepolicy.rule 2>/dev/null; then
    ui_print " - Custom sepolicy patch has been installed"
  else
    ui_print " - Custom sepolicy patch installing failed"
  fi
fi

Im not sure that checking cp result directly is the best method, so the first one would be more safety.

But the problem is - Magisk module installer copying sepolicy.rule at the end of installing process, after customize.sh/install.sh executing, so its not possible to check it in current state. Its would be necessary to move this process earlier, before executing customize.sh/install.sh

MlgmXyysd commented 4 years ago

It is also possible to add checking result of copying sepolicy.rule in Magisk module installer script and print it. From:

# Copy over custom sepolicy rules
if [ -f $MODPATH/sepolicy.rule -a -e $PERSISTDIR ]; then
  ui_print "- Installing custom sepolicy patch"
  PERSISTMOD=$PERSISTDIR/magisk/$MODID
  mkdir -p $PERSISTMOD
  cp -af $MODPATH/sepolicy.rule $PERSISTMOD/sepolicy.rule
fi

to:

# Copy over custom sepolicy rules
if [ -f $MODPATH/sepolicy.rule -a -e $PERSISTDIR ]; then
  ui_print "- Installing custom sepolicy patch"
  PERSISTMOD=$PERSISTDIR/magisk/$MODID
  mkdir -p $PERSISTMOD
  cp -af $MODPATH/sepolicy.rule $PERSISTMOD/sepolicy.rule
  [ -f "$PERSISTMOD/sepolicy.rule" ] && ui_print " - Custom sepolicy patch has been installed" || ui_print " - Custom sepolicy patch installing failed"**
fi

or

# Copy over custom sepolicy rules
if [ -f $MODPATH/sepolicy.rule -a -e $PERSISTDIR ]; then
  ui_print "- Installing custom sepolicy patch"
  PERSISTMOD=$PERSISTDIR/magisk/$MODID
  mkdir -p $PERSISTMOD
  if cp -af $MODPATH/sepolicy.rule $PERSISTMOD/sepolicy.rule 2>/dev/null; fi
    ui_print " - Custom sepolicy patch has been installed"
  else
    ui_print " - Custom sepolicy patch installing failed"
  fi
fi

Im not sure that checking cp result directly is the best method, so the first one would be more safety.

I asked @topjohnwu and he said that sepolicy.rule would be copied every boot.

In this way, it can't be correctly judged that the persist partition exists but doesn't mount. The persist partition of the device I mentioned above exists, but it doesn't mount (persist in sbin is an empty folder, which can be copied into the file normally, but still can't be loaded at boot)

kam821 commented 4 years ago

@MlgmXyysd I know the necessary files are copied every boot. I just wanted to tell user that sepolicy.rule copying failed, because for example there is no disk space, that was the only purpose of this modification ;) And give developer the opportunity to do something with it.

enetec1 commented 4 years ago

It should be noted that the standard Magisk installer.sh must be used unless you do not want to publish it to the official repo. Because when using the Magisk Manager to download the module, it will be forced to replace it with the standard Magisk installer.sh.

https://topjohnwu.github.io/Magisk/guides.html#notes

When your module is downloaded with Magisk Manager, update-binary will be forcefully replaced with the latest module_installer.sh to ensure all installer uses up-to-date scripts. DO NOT try to add any custom logic in update-binary as it is pointless.

@MlgmXyysd I didn't know this guideline... I agree with you that if you have to use official installer.sh, probably this should perform more prechecks... (or, at least, report errors during install, while this doesn't happens now!).

Anyway (but probably I should report this on the other ticket...) I still think that a module @ boot time should check if essential parts are in place and if not, not to start or solve different way (and not end in a bootloop...).

Switch selinux to permissive is a solution I use temporary too when needed, BUT I think this couldn't be a solution to be imposed to users if not strictly needed (and without proper warnings...).

More: I tried to use "The selinux switch" apk/zip to solve the issue but it didn't work since it have effect too late during boot... (at the contrary of what reported by some...).

More more: I even tried "Magisk custom sepolicy installer for edxposed" zip but it neither solved bootloops (I've not checked for what reason to be honest, but I guess it had same problem with full /persist partition...)

MlgmXyysd commented 4 years ago

It should be noted that the standard Magisk installer.sh must be used unless you do not want to publish it to the official repo. Because when using the Magisk Manager to download the module, it will be forced to replace it with the standard Magisk installer.sh. https://topjohnwu.github.io/Magisk/guides.html#notes

When your module is downloaded with Magisk Manager, update-binary will be forcefully replaced with the latest module_installer.sh to ensure all installer uses up-to-date scripts. DO NOT try to add any custom logic in update-binary as it is pointless.

@MlgmXyysd I didn't know this guideline... I agree with you that if you have to use official installer.sh, probably this should perform more prechecks... (or, at least, report errors during install, while this doesn't happens now!).

Anyway (but probably I should report this on the other ticket...) I still think that a module @ boot time should check if essential parts are in place and if not, not to start or solve different way (and not end in a bootloop...).

Switch selinux to permissive is a solution I use temporary too when needed, BUT I think this couldn't be a solution to be imposed to users if not strictly needed (and without proper warnings...).

More: I tried to use "The selinux switch" apk/zip to solve the issue but it didn't work since it have effect too late during boot... (at the contrary of what reported by some...).

More more: I even tried "Magisk custom sepolicy installer for edxposed" zip but it neither solved bootloops (I've not checked for what reason to be honest, but I guess it had same problem with full /persist partition...)

SELinux aims to enhance the security of the device. If you want to shut down it must be in the test / development environment.

I don't want to curl closed source privacy tracker(TaiChi)'s hair or comment on his behavior. Maybe this is what he always wanted to do.

The "Magisk custom sepolicy installer for edxposed" you said i guess just put sepolicy.rule to module folder, same as EdXposed's.

If only I had a good interface to detect a SELinux policy (I don't think it's possible)

kam821 commented 4 years ago

@MlgmXyysd I am sick and I just realized that maybe Im writing stupid things. Once again: the problem isnt in copying sepolicy.rule but in fact, that persist mirror isnt mounted?

So maybe its possible to detect it. grep ' /sbin/.magisk/mirror/persist ' /proc/mounts ?

MlgmXyysd commented 4 years ago

@MlgmXyysd I am sick and I just realized that maybe Im writing stupid things. Once again: the problem isnt in copying sepolicy.rule but in fact, that persist mirror isnt mounted?

So maybe its possible to detect it. grep ' /sbin/.magisk/mirror/persist ' /proc/mounts ?

empty if add " | grep -qv 'rootfs'"

the result is "/sbin/.magisk/block/persist /sbin/.magisk/mirror/persist ext4 rw,seclabel,relatime,data=ordered 0 0"

hope this is a feasible way

enetec1 commented 4 years ago

@MlgmXyysd I am sick and I just realized that maybe Im writing stupid things. Once again: the problem isnt in copying sepolicy.rule but in fact, that persist mirror isnt mounted?

So maybe its possible to detect it. grep ' /sbin/.magisk/mirror/persist ' /proc/mounts ?

No, from what I saw on my system, persist mirror was mounted correctly, even with /persist (sdb2) in full disk state...

It was folders of sepolicy rules file that couldn't be created... (nor on installing phase, neither in successive boot... bootlooping).

This is my experience at least...

Following is a short interesting part of Magisk log on bootloops...

01-01 00:14:05.180 989 990 I Magisk : riru_edxposed: loading [system.prop] 01-01 00:14:05.182 989 990 E Magisk : mkdirs /sbin/.magisk/mirror/persist/magisk/riru_edxposed failed with 28: No space left on device 01-01 00:14:05.182 989 990 E Magisk : open: /sbin/.magisk/mirror/persist/magisk/riru_edxposed/sepolicy.rule failed with 2: No such file or directory 01-01 00:14:05.182 989 990 E Magisk : sendfile failed with 9: Bad file descriptor 01-01 00:14:05.182 989 990 I Magisk : riru_edxposed: constructing magic mount structure

skittles9823 commented 4 years ago

@enetec1

01-01 00 14 05.180 989 990 I Magisk : riru_edxposed: loading [system.prop] 01-01 00 14 05.182 989 990 E Magisk : mkdirs /sbin/.magisk/mirror/persist/magisk/riru_edxposed failed with 28: No space left on device 01-01 00 14 05.182 989 990 E Magisk : open: /sbin/.magisk/mirror/persist/magisk/riru_edxposed/sepolicy.rule failed with 2: No such file or directory 01-01 00 14 05.182 989 990 E Magisk : sendfile failed with 9: Bad file descriptor 01-01 00 14 05.182 989 990 I Magisk : riru_edxposed: constructing magic mount structure

I'm not certain, but it think with that error it should be possible for Magisk to disable the module then force another reboot.

enetec1 commented 4 years ago

I'm not certain, but it think with that error it should be possible for Magisk to disable the module then force another reboot.

This could be interesting... I tried some more reboots with no success.

Using Liveboot it's easy to see it goes in a clear bootloop state... (undetected apparently...)

Obviously I could solve bootloop using MM to disable module or even dirty reflashing an EdXposed v.0.4.5.x. which doesn't need policy file to work...

androidacy-user commented 4 years ago

I do think /persist is not a super great choice as on most devices it's 40MB or less and TWRP also likes to store things there. A better place to access pre post-fs-data, short of the ramdisk, I'm not sure.

But really this could be mitigated by: A) checks to ensure /persist has enough space left, and B) logic to remove TWRP related stuffs as necessary (as this seems to be the primary issue here)

enetec1 commented 4 years ago

I do think /persist is not a super great choice as on most devices it's 40MB or less and TWRP also likes to store things there. A better place to access pre post-fs-data, short of the ramdisk, I'm not sure.

I can confirm... 27.5 MB on my Z2 Force, all fully used by TWRP logs... :/

But really this could be mitigated by: A) checks to ensure /persist has enough space left,

Exactly...

and B) logic to remove TWRP related stuffs as necessary (as this seems to be the primary issue here)

Yes. I opened this for TWRP too... https://github.com/TeamWin/android_device_motorola_nash/issues/1 (no reply for now...)

androidacy-user commented 4 years ago

Another issue: deleting a module doesn't seem to remove the file in /persist, whether by deleting the modules folder or via manager. Perhaps magisk should ensure a module is actually installed and if not remove the sepolicy.rule?

skittles9823 commented 4 years ago

@linuxandria I've already raised that issue with John and yea a check will be added.

osm0sis commented 4 years ago

So I'm thinking we should delete the TWRP logs from /persist or /cache to make room during a module install with sepolicy.rule, and then check for space when copying.

Would most want the module install to fail if the sepolicy.rule copy failed? I would think yes, since the module is likely dependent on the changes.

MlgmXyysd commented 4 years ago

So I'm thinking we should delete the TWRP logs from /persist or /cache to make room during a module install with sepolicy.rule, and then check for space when copying.

Would most want the module install to fail if the sepolicy.rule copy failed? I would think yes, since the module is likely dependent on the changes.

That's the first question, and there's another problem behind it: there is no persist partition. @topjohnwu says will add persist fallback behavior: persist is preferred. If it fails, cache will be used. If it still fails, livepatch will be performed

Edit: I suggest that if the copy fails, the sepolicy.rule file under MODDIR should be deleted so that the module developer can judge whether the sepolicy.rule file exists. If not (And it's not a special device like Huawei/Honor/OnePlus7T,7TP), the livepatch script should be executed

MlgmXyysd commented 4 years ago

@MlgmXyysd I am sick and I just realized that maybe Im writing stupid things. Once again: the problem isnt in copying sepolicy.rule but in fact, that persist mirror isnt mounted?

So maybe its possible to detect it. grep ' /sbin/.magisk/mirror/persist ' /proc/mounts ?

Judging from the test returns over a period of time, this is not a good method. Some devices are not loaded correctly (especially Samsung)