sailfishos-open / waydroid

Waydroid packaging for Sailfish OS
34 stars 8 forks source link

Waydroid doesn't start at all on Sony XA2 #20

Open DIzFer opened 1 year ago

DIzFer commented 1 year ago

First of all, I'm likely duplicating https://github.com/sailfishos-open/waydroid/issues/15, https://github.com/sailfishos-open/waydroid/issues/7, and https://github.com/sailfishos-open/waydroid/issues/14 at least, but those aren't exactly drowning with information so I can't really be sure.

I'm using (or trying to use) the official Sailfish X port for the Xperia XA2. I've installed waydroid-runner and waydroid-gbinder-config-hybris, which I believe to be the right version.

This, as expected, puts a Waydroid icon into the application list.

Symptoms

Similarly, and as suggested in one of the other issues, using fingerterm to run waydroid session start and waydroid show-full-ui spams the same message in the terminal where show-full-ui is run.

The terminal where waydroid session start is run pops the XDG Session is not "wayland" message. I assume it's benign, but is still worthy of a raised eyebrow. I would appreciate clarification. It would be great to fix the clipboard too, but I assume it's not as simple as installing pyclip and unrelated in any case.

What I tried

Logs

I'm particularly suspicious of this, but I don't know what to do about it:

New background process: pid=13565, output=background
lxc-start: waydroid: utils.c: lxc_can_use_pidfd: 1772 Kernel does not support pidfds
lxc-start: waydroid: utils.c: safe_mount: 1198 Not a directory - Failed to mount "/dev/video" onto "/usr/lib/lxc/rootfs     /dev/video"

In any case, the full log from /var/lib/waydroid/waydroid.log. Includes a bit of noise at the start, didn't want to remove anything just in case.

Also uncompressed /proc/config.gz, renamed to config.txt because github is being silly.

rinigus commented 1 year ago

Not sure how up to date porters tasks are for Waydroid. That /dev/video was in #17 as well, not sure how relevant it is. Maybe @piggz has some ideas?

piggz commented 1 year ago

I would use "waydroid logcat" to look at the logcat logs from within the container to see where it is stuck. I helped a case recently where permissions were incorrect for the binder devices, and that was evident in the logs.

DIzFer commented 1 year ago

Ah, nice, github notified me of rinigus' reply but not piggz.

Anyways, of course I was missing a logging source :P

logcat

Highlights (although I don't know why I bother because I have a hard time interpreting these...):

E lowmemorykiller: kernel lmk event file could not be opened; errno=2
I lowmemorykiller: Process polling is not supported
E ProcessState: Binder ioctl to become context manager failed: Device or resource busy
E ProcessState: Binder ioctl to become context manager failed: Device or resource busy
E hw-ProcessState: Binder ioctl to become context manager failed: Device or resource busy
E vold    : Failed to open device-mapper: No such file or directory
E vold    : DM_LIST_DEVICES failed: Bad file descriptor
E vold    : Failed to get dm devices
E vold    : Failed to opendir: No such file or directory
E vold    : Unable to start VoldNativeService
W init    : Couldn't load property file '/vendor_extra/build.prop': open() failed: No such file or directory: No such file or direct
W init    : Couldn't load property file '/factory/factory.prop': open() failed: No such file or directory: No such file or directory
I init    : Unable to read config file '/init.pioneer.rc': open() failed: No such file or directory
I init    : Parsing file /vendor/etc/init/hw/init.pioneer.rc...
I init    : Unable to read config file '/vendor/etc/init/hw/init.pioneer.rc': open() failed: No such file or directory

There's a random scattering of "reasonable" enofiles throughout. Reasonable because they really do not exist...

This is the only eperm I find:

I init    : Command 'write /dev/blkio/blkio.weight 1000' action=init (/system/etc/init/hw/init.rc:164) took 1ms and failed: Unable to write to file '/dev/blkio/blkio.weight': open() failed: Permission denied
I init    : Command 'write /dev/blkio/background/blkio.weight 200' action=init (/system/etc/init/hw/init.rc:165) took 0ms and failed: Unable to write to file '/dev/blkio/background/blkio.weight': open() failed: Permission denied
I init    : Command 'write /dev/blkio/blkio.group_idle 0' action=init (/system/etc/init/hw/init.rc:166) took 0ms and failed: Unable to write to file '/dev/blkio/blkio.group_idle': open() failed: Permission denied
I init    : Command 'write /dev/blkio/background/blkio.group_idle 0' action=init (/system/etc/init/hw/init.rc:167) took 0ms and failed: Unable to write to file '/dev/blkio/background/blkio.group_idle': open() failed: Permission denied

Then E ## mmc1 : gpio_set failure: ctx=fffffff730e2e118, uim2_gpio=-22 appears periodically.

I believe I didn't mention: I'm running an official port, downloaded directly from Jolla. However, I've not installed Jolla's android support. I've never inspected that to know if that's where the missing files are.

piggz commented 1 year ago

I cant really tell from the logs.

Other things to try/be aware of. Make sure the correct binder nodes are being used (/dev/puddlejumper maybe?) and make sure those are configured correctly in the waydroid nodes file (/var/lib/waydroid/lcx/config_nodes ?)

Also, are you using waydroid 17 or 18? For 18, there were some commits which broke waydroid on our older wayland version. Those have now been fixed, so any new vendor.img files as of today should be ok and may be worth a try.

DIzFer commented 1 year ago

Everything in config_nodes exists, and I don't see /dev/puddlejumper neither configured not under /dev.

Not sure if that's what I should have checked there :/

Anyways, I ran waydroid init -f and it seems the image it downloaded was indeed newer.

I'm noticing I might have been too impatient with the previous logcat? I'm now noticing more/different errors:

E libprocessgroup: Failed to make and chown /acct/uid_1000: Read-only file system

11-28 16:45:07.447 0 0 E libprocessgroup: Failed to make and chown /acct/uid_1000: Read-only file system 11-28 16:45:07.447 0 0 E init : createProcessGroup(1000, 1050) failed for service 'vendor.health-hal-2-0': Read-only file system 11-28 16:45:07.457 1045 1045 E LegacySupport: Could not register service android.hardware.drm@1.0::IDrmFactory/default (-2147483648). 11-28 16:45:07.458 1045 1045 F android.hardware.drm@1.0-service: Error while registering drm service: -2147483648 11-28 16:45:07.459 1048 1048 D vndksupport: Loading /vendor/lib64/hw/gatekeeper.waydroid.so from current namespace instead of sphal namespace. 11-28 16:45:07.459 124 124 I tombstoned: received crash request for pid 1045 11-28 16:45:07.459 124 124 I tombstoned: enqueueing crash request for pid 1045



[new logcat](https://github.com/sailfishos-open/waydroid/files/10105858/logcat.txt)

I still have difficulty getting anything useful out of that though.
piggz commented 1 year ago

can you provide output of zcat /proc/config.gz | grep BINDER

DIzFer commented 1 year ago
$ zcat /proc/config.gz | grep BINDER
CONFIG_ANDROID_BINDER_IPC=y
CONFIG_ANDROID_BINDER_DEVICES="binder,vndbinder,hwbinder,puddlejumper,vndpuddlejumper,hwpuddlejumper"
# CONFIG_ANDROID_BINDER_IPC_SELFTEST is not set

Everything exists in /dev too

Full config_nodes attached, where no puddlejumper can be seen.

config_nodes.txt

piggz commented 1 year ago

These lines are wrong lxc.mount.entry = /dev/binder dev/binder none bind,create=file,optional 0 0 lxc.mount.entry = /dev/vndbinder dev/vndbinder none bind,create=file,optional 0 0 lxc.mount.entry = /dev/hwbinder dev/hwbinder none bind,create=file,optional 0 0

should be

lxc.mount.entry = /dev/puddlejumper dev/binder none bind,create=file,optional 0 0 lxc.mount.entry = /dev/vndpuddlejumper dev/vndbinder none bind,create=file,optional 0 0 lxc.mount.entry = /dev/hwpuddlejumper dev/hwbinder none bind,create=file,optional 0 0

DIzFer commented 1 year ago

It's not quite a success, but the failure state has definitely changed, which is a win in my debugging experience :P

I init : Reboot ending, jumping to kernel, then the logcat ends, and "Waydroid container is STOPPED".

I seem to be getting better at running the logcat command sooner in the boot process, because this is the first time I catch "beginning of main", and, 13 lines later, "beginning of crash"

--------- beginning of main
11-28 19:11:24.693    20    20 I lowmemorykiller: Using in-kernel low memory killer interface
11-28 19:11:24.693    20    20 E lowmemorykiller: kernel lmk event file could not be opened; errno=2
11-28 19:11:24.693    20    20 I lowmemorykiller: Process polling is not supported
11-28 19:11:24.704    23    23 E ProcessState: Binder driver /dev/vndbinder is unavailable. Using /dev/binder instead.
11-28 19:11:24.706    21    21 E ProcessState: Binder driver /dev/binder is unavailable. Using /dev/binder instead.
11-28 19:11:24.715    23    23 W ProcessState: Opening '/dev/binder' failed: Permission denied
11-28 19:11:24.715    21    21 W ProcessState: Opening '/dev/binder' failed: Permission denied
11-28 19:11:24.715    23    23 F ProcessState: Binder driver '/dev/binder' could not be opened.  Terminating.
11-28 19:11:24.715    21    21 F ProcessState: Binder driver '/dev/binder' could not be opened.  Terminating.
11-28 19:11:24.716    23    23 E libc    : failed to connect to tombstoned: No such file or directory
11-28 19:11:24.716    21    21 E libc    : failed to connect to tombstoned: No such file or directory

/dev/binder is root:root 666, and the three puddlejumpers are root:root 600

As always, logcat

DIzFer commented 1 year ago

Since I'm a brute anyway I chmod 666'd the three puddlejumper devices and it seems the loop now mentions ServiceManager: Waiting for service 'SurfaceFlinger' on '/dev/binder'.

piggz commented 1 year ago

So, the puddlejumper nodes are mounted to the binder nodes inside the lxc container.

What release of waydroid is this? 17 or 18? can you send the full logcat, there may be surfaceflinger crashes somewhere in there.

Also, you would need to stop/start the container (or reboot) but i guess you did that already.

DIzFer commented 1 year ago

ffs leave it to me to forget immediately after saying always -.-

logcat

I fully rebooted after modifying config_nodes, and I tested the chmods by restarting waydroid-container.service

piggz commented 1 year ago

Ok, now I think you are onto the problem I had on the pro1-x hwcomposer.waydroid.zip

Download the attached Loop mount the vendor.img rw to a temp dir Rename lib64/hw/hwcomposer.waydroid.so to something like .old Copy the attached file into that folder Unmount the image Start the container.

DIzFer commented 1 year ago

Welp that's a tight fit, but a resize2fs later the logcat no longer exceeds legal speed limits :)

It still doesn't quite boot, but I can now more easily identify at least two big log-hogs.

First, ueventd keeps complaining it cannot chmod nor chown /sys/devices/system/cpu/cpu?/cpufreq/scaling_{min,max}_freq. Not sure that should necessarily mean a crash but I've seen android do weirder stuff.

11-28 22:16:56.464     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:56.464     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq, 436) failed: Read-only file system
11-28 22:16:56.465     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:56.466     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq, 436) failed: Read-only file system
11-28 22:16:56.587     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:56.587     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq, 436) failed: Read-only file system
11-28 22:16:56.587     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:56.587     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq, 436) failed: Read-only file system
11-28 22:16:57.340     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:57.341     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq, 436) failed: Read-only file system
11-28 22:16:57.342     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:57.342     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq, 436) failed: Read-only file system
11-28 22:16:58.217     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:58.217     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq, 436) failed: Read-only file system
11-28 22:16:58.217     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:58.217     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq, 436) failed: Read-only file system
11-28 22:16:58.341     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:58.341     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq, 436) failed: Read-only file system
11-28 22:16:58.341     0     0 E ueventd : chown(/sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq, 1000, 1000) failed: Read-only file system
11-28 22:16:58.341     0     0 E ueventd : chmod(/sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq, 436) failed: Read-only file system

Second, right after it finishes failing to set cpu frequencies, it fails to start vold:

11-28 22:16:53.953     0     0 I init    : starting service 'vold'...
11-28 22:16:53.957     0     0 E libprocessgroup: Failed to make and chown /acct/uid_0: Read-only file system
11-28 22:16:53.958     0     0 E init    : createProcessGroup(0, 79) failed for service 'vold': Read-only file system
11-28 22:16:53.958     0     0 E init    : failed to write pid to files: couldn't write 79 to /dev/cpuset/foreground/tasks: No space left on device
11-28 22:16:54.044    79    79 I vold    : Vold 3.0 (the awakening) firing up
11-28 22:16:54.045    79    79 D vold    : Detected support for: ext4 f2fs ntfs vfat
11-28 22:16:54.049    79    79 E vold    : Failed to open device-mapper: No such file or directory
11-28 22:16:54.050    79    79 E vold    : DM_LIST_DEVICES failed: Bad file descriptor
11-28 22:16:54.050    79    79 E vold    : Failed to get dm devices
11-28 22:16:54.050    79    79 E vold    : Failed to opendir: No such file or directory
11-28 22:16:54.052    79    79 E vold    : Unable to start VoldNativeService
11-28 22:16:54.016     0     0 I init    : Service 'vold' (pid 79) exited with status 1

I want to say there's something else intertwined in there but beyond the debug info dump I cannot really get anything clearer. For example, it seems tombstoned fails to start (but I only find references to it failing to start, not the failure itself), and HidlServiceManagement: Service android.hardware.keymaster@4.0::IKeymasterDevice/default must be in VINTF manifest in order to register/get. keeps appearing too.

logcat.txt

DIzFer commented 1 year ago

I just realised I only joked about the resize2fs without fully explaining:

I couldn't even rewrite the file in-place without a "no space left on device" error. I resized the image to 250M, which also forced me to run an e2fsck -f.

I'D HOPE that would not break anything, but I don't trust computers that much.