topjohnwu / Magisk

The Magic Mask for Android
GNU General Public License v3.0
47.24k stars 11.96k forks source link

VaultKeeper Crashing #8335

Closed diabl0w closed 2 weeks ago

diabl0w commented 3 weeks ago

Device: SM-F926B Android version: 14 Magisk version name: Canary Magisk version code: 27007

Phone is running fine, visually, but in the background Vaultkeeper is just constantly crashing and never stops (will continue to try and increase trycount)

according to this VaultKeeper should be getting disabled, but I am not sure if that means entirety of VaultKeeper or what

08-24 15:06:57.220  5171  5171 I vaultkeeper: There is no property(ro.factory.factory_binary)
08-24 15:06:57.220  5171  5171 I VaultKeeper_COMMON: VK 4.4.1 [Client] /vendor/bin/vaultkeeperd. => [Vault] VK. [CMD NO.] 2040
08-24 15:06:57.220  5171  5171 I VaultKeeper_COMMON: [CMD:INITIALIZE]
08-24 15:06:57.291  5171  5171 E VaultKeeper_COMMON: Occured Error into TA side(-10032//vendor/bin/vaultkeeperd/VK)
08-24 15:06:57.292  5171  5171 E VaultKeeper_COMMON: Failed cmdProcessing(-10032)
08-24 15:06:57.292  5171  5171 E vaultkeeper: runCommand error.(-10032)
08-24 15:06:57.292  5171  5171 E vaultkeeper: There is no VK ID
08-24 15:06:57.292  5171  5171 E vaultkeeper: VaultKeeper service preparation is failed
08-24 15:06:57.672  5216  5216 E CASS_DEBUG: VaultKeeper is not ready. try again Retry cnt(10).
diabl0w commented 3 weeks ago

from ps -A, the parent PID is indeed init that is trying to launch it (PPID = 1 below): 5811 1 1000 5.9 0 20 vaultkeeperd /vendor/bin/vaultkeeperd

and this is the earliest I can find any reference to it in logcat logs (this 912 process starts here):


08-24 22:43:21.120   912   912 I vaultkeeper: There is no property(ro.factory.factory_binary)
08-24 22:43:21.120   912   912 I VaultKeeper_COMMON: VK 4.4.1 [Client] /vendor/bin/vaultkeeperd. => [Vault] VK. [CMD NO.] 8088
08-24 22:43:21.120   912   912 I VaultKeeper_COMMON: [CMD:INITIALIZE]
08-24 22:43:21.121   912   912 D QSEECOMAPI: QSEECom_get_handle sb_length = 0x15c00
08-24 22:43:21.121   912   912 D QSEECOMAPI: App is not loaded in QSEE
08-24 22:43:21.122   912   912 E QSEECOMAPI: Error::Cannot open the file /vendor/firmware_mnt/image/vaultkee.mdt errno = 2
08-24 22:43:21.122   912   912 E QSEECOMAPI: Error::Loading image failed with ret = -1
08-24 22:43:21.122   912   912 D QSEECOMAPI: QSEECom_get_handle sb_length = 0x15c00
08-24 22:43:21.122   912   912 D QSEECOMAPI: App is not loaded in QSEE
08-24 22:43:21.122   912   912 D QSEECOMAPI: app_arch = 1, total_files = 9```
vvb2060 commented 3 weeks ago

https://dumps.tadiphone.dev/dumps/samsung/dm3q/-/blob/dm3qxxx-user-14-UP1A.231005.007-S918BXXS6CXGA-release-keys/vendor/etc/init/vaultkeeper_common.rc VaultKeeper has been moved to the vendor partition for a long time. The code in magisk is out of date and should be removed.

vvb2060 commented 3 weeks ago

/data/adb/service.d/stop_vaultkeeper.sh

stop vaultkeeper
stop vaultkeeper_hal

Try this script, don't forget chmod +x

diabl0w commented 3 weeks ago

/data/adb/service.d/stop_vaultkeeper.sh

stop vaultkeeper
stop vaultkeeper_hal

Try this script, don't forget chmod +x

thanks! although this sort of makes things "worse" in a way, unfortunately... it results in this being spammed every 10 seconds or so:

08-25 00:50:27.092   794  6380 W libc    : Unable to set property "ctl.interface_start" to "vendor.samsung.hardware.security.vaultkeeper@2.0::ISehVaultKeeper/default": error code: 0x20, try count : 1
08-25 00:50:27.093   794  6380 W libc    : Unable to set property "ctl.interface_start" to "vendor.samsung.hardware.security.vaultkeeper@2.0::ISehVaultKeeper/default": error code: 0x20, try count : 2
08-25 00:50:27.095   794  6380 W libc    : Unable to set property "ctl.interface_start" to "vendor.samsung.hardware.security.vaultkeeper@2.0::ISehVaultKeeper/default": error code: 0x20, try count : 3
08-25 00:50:27.096   794  6380 I hwservicemanager: Tried to start vendor.samsung.hardware.security.vaultkeeper@2.0::ISehVaultKeeper/default as a lazy service, but was unable to. Usually this happens when a service is not installed, but if the service is intended to be used as a lazy service, then it may be configured incorrectly.
08-25 00:50:28.089  6309  6339 W HidlServiceManagement: Waited one second for vendor.samsung.hardware.security.vaultkeeper@2.0::ISehVaultKeeper/default
08-25 00:50:28.090   794   794 I hwservicemanager: Since vendor.samsung.hardware.security.vaultkeeper@2.0::ISehVaultKeeper/default is not registered, trying to start it as a lazy HAL (if it's not configured to be a lazy HAL, it may be stuck starting or still starting).
08-25 00:50:28.091  6309  6339 I HidlServiceManagement: getService: Trying again for vendor.samsung.hardware.security.vaultkeeper@2.0::ISehVaultKeeper/default...
vvb2060 commented 3 weeks ago

keep vaultkeeper_hal, only stop vaultkeeper?

diabl0w commented 3 weeks ago

keep vaultkeeper_hal, only stop vaultkeeper?

Something is still trying to call the service

08-25 12:18:57.554 17157 17157 E CASS_DEBUG: VaultKeeper is not ready. try again Retry cnt(24).
08-25 12:18:58.554 17157 17157 E CASS_DEBUG: VaultKeeper is not ready. try again Retry cnt(26).
08-25 12:18:59.555 17157 17157 E CASS_DEBUG: VaultKeeper is not ready. try again Retry cnt(28).
08-25 12:19:00.555 17157 17157 E CASS_DEBUG: Fatal error : VaultKeeper is not running. Exit
08-25 12:19:00.602 17277 17277 E CASS_DEBUG: VaultKeeper is not ready. try again Retry cnt(0).
08-25 12:19:01.603 17277 17277 E CASS_DEBUG: VaultKeeper is not ready. try again Retry cnt(2).
08-25 12:19:02.605 17277 17277 E CASS_DEBUG: VaultKeeper is not ready. try again Retry cnt(4).
08-25 12:19:03.605 17277 17277 E CASS_DEBUG: VaultKeeper is not ready. try again Retry cnt(6).
vvb2060 commented 3 weeks ago

Can you see who is calling VaultKeeper? ps pid

diabl0w commented 3 weeks ago

Can you see who is calling VaultKeeper? ps pid

Those messages are from CASS_DEBUG (which I guess is the Cass init service) . Issue was resolved by issuing stop cass

diabl0w commented 3 weeks ago

Sorry for the spam, but the last thing to add to keep logcat errors completely free is stop proca ... this stops the log from being filled with PA_DAEMON errors.

In summary,

stop vaultkeeper
stop cass
stop proca

These also happen to align with the excellent work by Ian MacDonald seen here

aviraxp commented 3 weeks ago

Are the crashes caused by unlocked bootloader or Magisk?

diabl0w commented 3 weeks ago

Are the crashes caused by unlocked bootloader or Magisk?

I don't know for a fact, but I'm guessing it is due to vbmeta change... and bootloader unlock alone wouldn't cause this. That is only a guess. I suppose you are questioning if this is magisk's responsibility?

aviraxp commented 3 weeks ago

Are the crashes caused by unlocked bootloader or Magisk?

I don't know for a fact, but I'm guessing it is due to vbmeta change... and bootloader unlock alone wouldn't cause this. That is only a guess. I suppose you are questioning if this is magisk's responsibility?

Ye, i am thinking about it

diabl0w commented 3 weeks ago

Are the crashes caused by unlocked bootloader or Magisk?

I don't know for a fact, but I'm guessing it is due to vbmeta change... and bootloader unlock alone wouldn't cause this. That is only a guess. I suppose you are questioning if this is magisk's responsibility?

Ye, i am thinking about it

Not sure, at some point it was decided to get involved https://github.com/topjohnwu/Magisk/commit/7a133eaf03d3ee0acad24c84ff988c22c99caf6d

aviraxp commented 3 weeks ago

I think this can be solved via overlay.d (it provides support for injecting rc files, and you can override the service), but I am not sure how we can handle it in magisk without interfering with other devices.

diabl0w commented 3 weeks ago

I think this can be solved via overlay.d (it provides support for injecting rc files, and you can override the service), but I am not sure how we can handle it in magisk without interfering with other devices.

That's a good idea, thanks. I wonder if handling vaultkeeper this way would change whether cass and proca need to be handled or not. I'll have to check

salvogiangri commented 3 weeks ago

FYI the issue happens in the TZ side of things and it is caused by PROCA whenever a custom image is installed in the device (change of ICD status flag):

[   10.202406] [2:tz_iwlog_thread:  244] SW> VaultKeeper :: VaultKeeper Processing... v4.4.1 (cmd_no : 3655)
[   10.202432] [2:tz_iwlog_thread:  244] SW> [PA_TZ_LIB] [INF] [PrintVersion:286] Version: PA_TZ_lib,teegris,version_2.5.4,release,teamcity@nightly:57
[   10.202440] [2:tz_iwlog_thread:  244] SW> [PA_TZ_LIB] [ERR] [PlatformCallDriver:26] ERROR in driver open![PA_TZ_LIB] [ERR] [PaTzAuthenticateWithRules:189] Communication with PA is failed.
[   10.203607] [5:tz_iwlog_thread:  244] SW> VaultKeeper :: I'm sorry. VaultKeeper is closed today. So pack it up, and get out of here(00100000).
[   10.203620] [5:tz_iwlog_thread:  244] SW> VaultKeeper :: Returning [0xffff8ab0]

A custom kernel with disabled CONFIG_PROCA is able to get around this as the PROCA TA requires and reads its policy in kernel image itself (https://github.com/Mesa-Labs-Archive/android_kernel_samsung_sm7325/blob/e76fd1fdabdca7554e5d7df0dbd10dcbabdfb3fd/arch/arm64/kernel/head.S#L83-L87, https://github.com/Mesa-Labs-Archive/android_kernel_samsung_sm7325/blob/e76fd1fdabdca7554e5d7df0dbd10dcbabdfb3fd/arch/arm64/kernel/image.h#L74), I actually tried to patch a stock image to replace the offset with the special magic but it doesn't seem to work, so something else might be required. Another issue would be how to patch Samsung GKI devices since boot.img isn't touched by Magisk on those...

salvogiangri commented 2 weeks ago

Finally found a way to get rid of this on stock images: https://github.com/Mesa-Labs-Archive/android_kernel_samsung_sm7325/blob/e76fd1fdabdca7554e5d7df0dbd10dcbabdfb3fd/security/samsung/proca/s_os/proca_lsm.c#L33 If /dev/proca_config can't be found then PROCA/FIVE will be disabled in VaultKeeper TA and it will work normally:

08-30 14:49:16.470  1405  4315 I Vaultkeeper_MGR_system: Got service of ISehVaultKeeper
08-30 14:49:16.470   723   723 I VaultKeeper_HIDL_Service: System client (//1000/1405)
08-30 14:49:16.471   722   722 I VaultKeeper_COMMON: VK 4.4.1 [Client] com.samsung.android.rampart. => [Vault] RAMPART. [CMD NO.] 6108
08-30 14:49:16.471   722   722 I VaultKeeper_COMMON: [CMD:CHECK_CLIENT_REGISTERED]
08-30 14:49:16.471   722   722 I vaultkeeper: runCommand Success.(TA ret = 0)
08-30 14:49:16.471   722   722 I VaultKeeper_COMMON: VK 4.4.1 [Client] com.samsung.android.rampart. => [Vault] RAMPART. [CMD NO.] 5905
08-30 14:49:16.471   722   722 I VaultKeeper_COMMON: [CMD:READ_UNSHELTERED]
08-30 14:49:16.568   722   722 I vaultkeeper: runCommand Success.(TA ret = 0)
08-30 14:49:16.569  1405  4315 I Vaultkeeper_MGR_system: ~VaultKeeperManager

I already tried changing the permissions of the dev file itself from 440 to 000 but it wasn't enough, so patching the kernel image to rename "proca_config" is enough to get rid of this.

salvogiangri commented 2 weeks ago

@diabl0w could you please confirm the hex patch works just fine in your end? Install this then patch your stock boot.img with it. app-release.zip

diabl0w commented 2 weeks ago

@diabl0w could you please confirm the hex patch works just fine in your end? Install this then patch your stock boot.img with it. app-release.zip

Trying to install the app says that it is corrupted, I end up getting a placeholder magisk app on my launcher which tries to update itself from the internet when opening

salvogiangri commented 2 weeks ago

Trying to install the app says that it is corrupted, I end up getting a placeholder magisk app on my launcher which tries to update itself from the internet when opening

You need to restore the stock boot.img as this is signed with a different key than the ones the official builds use.

diabl0w commented 2 weeks ago

Trying to install the app says that it is corrupted, I end up getting a placeholder magisk app on my launcher which tries to update itself from the internet when opening

You need to restore the stock boot.img as this is signed with a different key than the ones the official builds use.

proca_magisk works great ;) nice job!

diabl0w commented 2 weeks ago

and just to be more clear, yes this completely resolves the issue, i no longer get any of the logcat errors from my original post nor do i get any other related errors

aviraxp commented 2 weeks ago

Finally found a way to get rid of this on stock images: https://github.com/Mesa-Labs-Archive/android_kernel_samsung_sm7325/blob/e76fd1fdabdca7554e5d7df0dbd10dcbabdfb3fd/security/samsung/proca/s_os/proca_lsm.c#L33 If /dev/proca_config can't be found then PROCA/FIVE will be disabled in VaultKeeper TA and it will work normally:

08-30 14:49:16.470  1405  4315 I Vaultkeeper_MGR_system: Got service of ISehVaultKeeper
08-30 14:49:16.470   723   723 I VaultKeeper_HIDL_Service: System client (//1000/1405)
08-30 14:49:16.471   722   722 I VaultKeeper_COMMON: VK 4.4.1 [Client] com.samsung.android.rampart. => [Vault] RAMPART. [CMD NO.] 6108
08-30 14:49:16.471   722   722 I VaultKeeper_COMMON: [CMD:CHECK_CLIENT_REGISTERED]
08-30 14:49:16.471   722   722 I vaultkeeper: runCommand Success.(TA ret = 0)
08-30 14:49:16.471   722   722 I VaultKeeper_COMMON: VK 4.4.1 [Client] com.samsung.android.rampart. => [Vault] RAMPART. [CMD NO.] 5905
08-30 14:49:16.471   722   722 I VaultKeeper_COMMON: [CMD:READ_UNSHELTERED]
08-30 14:49:16.568   722   722 I vaultkeeper: runCommand Success.(TA ret = 0)
08-30 14:49:16.569  1405  4315 I Vaultkeeper_MGR_system: ~VaultKeeperManager

I already tried changing the permissions of the dev file itself from 440 to 000 but it wasn't enough, so patching the kernel image to rename "proca_config" is enough to get rid of this.

When did you chmod the node? I think we need to do it on fs stage. You can change it here https://github.com/topjohnwu/Magisk/blob/8e82113bce1486c0dce7d8d1fdc08a1372ab26a6/native/src/init/rootdir.rs and see if it works.

salvogiangri commented 2 weeks ago

When did you chmod the node? I think we need to do it on fs stage. You can change it here https://github.com/topjohnwu/Magisk/blob/8e82113bce1486c0dce7d8d1fdc08a1372ab26a6/native/src/init/rootdir.rs and see if it works.

Doesn't seem to work regardless of which stage is used to change the file permissions:

08-31 17:49:25.366  3562  3562 F PA_DAEMON: [PrintVersion:605] Version: PA_Daemon,qsee,version_2.6.5,release,dpi
08-31 17:49:25.366  3562  3562 E PA_DAEMON: [ReadConfig:335] Cannot open config file.
08-31 17:49:25.366  3562  3562 E PA_DAEMON: [main:225] PA daemon has not sent the config to the driver.

I've also tried to change file owner to root but it had no effects.