google / agi

Android GPU Inspector
https://gpuinspector.dev
Apache License 2.0
954 stars 140 forks source link

Samsung devices w/ Qualcomm GPU fails validation (needs to be rooted to access GPU counters) #1113

Open EricAtPlanGrid opened 2 years ago

EricAtPlanGrid commented 2 years ago

Environment information:

Bug description

Device validation fails: Check failed for counter: {1 Clocks / Second 0x46e6a20}

Reproduction steps Steps to reproduce the behavior: Attempt to use AGI w/ these devices

Additional debugging information

ttanatb commented 2 years ago

Hello, thanks for filing this issue!

This part of the validation process can be non-deterministic. Please retry while making sure that, the screen remains unlocked and the sample application remains in focus. If this continues to persists, please upload the trace file (from the 'View trace file' link)

Screenshot of validation window with 'View trace file'

alusch commented 2 years ago

Hi @ttanatb, I can consistently repro this as well (S21+) with the screen unlocked and the sample app visible until the validation fails. Here's my trace:

validation4229315264.perfetto.zip

ttanatb commented 2 years ago

Looking through the logcat it looks like there's an issue with getting counter values from the driver as everything is set to 0 in the perfetto trace.

Are you by chance using a custom build or a preview build of the OS? Or was the validation working correctly earlier and only recently started breaking?

alusch commented 2 years ago

Thanks @ttanatb. It's just the normal official OS build 🤔 I also tried AGI 2.1.0, which failed, too. That definitely worked a month or two ago on this device, so I suppose the only thing that could have changed is an OS update, the last of which I installed on May 17. Is there a way to get in touch with Samsung to see if something changed with the driver?

ttanatb commented 2 years ago

It looks like there was a breaking change in the driver recently (not sure if it was included in the May 17 update or the one before that). Unfortunately, reverting the AGI version will not fix this issue, and I don't have an easy workaround for this at the moment. We're working on getting this sorted out and will update once we have something.

ttanatb commented 2 years ago

Apologies for the delay, here's a workaround to get around issues with getting GPU counters on Adreno GPUs:

adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter"

It's worth noting that you may want to set the contents of /sys/class/kgsl/kgsl-3d0/perfcounter back to 0 as a precaution after profiling, so that other apps/processes cannot access GPU counters.

I'm working on a change that runs these commands before profiling, will notify once it's in the new release.

Mikayex commented 2 years ago

I have also the same problem on a Galaxy Tab S8 (SM-X700, Android 12, OneUI 4.1, build SP1A.210812.016.X700XXU2AVF5). I can't apply the workaround since the device is not rooted (and I don't want to) and /sys/class/kgsl/kgsl-3d0/perfcounter is only writable by root. So I guess I'll have to wait for driver update...

ttanatb commented 2 years ago

I see, I've verified that /sys/class/kgsl/kgsl-3d0/perfcounter does not require root access on pixel devices, will try to contact folks on Samsung to try to resolve this, but cannot comment on the timeline or feasibility of this. Apologies for the breakage and lack of convenient workaround, but we're working to ensure that we don't end up with a breaking change like this in the future.

JakeDaynes commented 2 years ago

Adding in here, ASUS devices (ROG 6 and Zenfone 9) are also affected.

ttanatb commented 2 years ago

Hi @JakeDaynes have you tried this with the latest dev release https://github.com/google/agi-dev-releases/releases/tag/v3.1.0-dev-20220830 and just to confirm, are you able to run adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter" or is the access to the file gated by root (instead of shell)?

If the file access is set to root, we'll have to ask ASUS to release a patch to fix this

JakeDaynes commented 2 years ago

@ttanatb Sorry for any confusion - I was more commenting on the fact that the ASUS devices are affected by the kgsl-3d0/perfcounter file being gated by root. I've gotten in contact with ASUS over the issue, but having Google reach out and push for a patch would help.

ttanatb commented 2 years ago

Thanks for the clarification -- just wanted to make sure because I couldn't get a hold of an ASUS device to repro. I'm working with the rest of the team to push for a resolution on this

The team is also working on efforts to prevent this in the future by tightening our test suites to ensure that a patch will not break our product like this in the future. Apologies for the inconvenience

ttanatb commented 2 years ago

@JakeDaynes Could you provide your trace file (the *.perfetto file that is linked in the trace dialog) for me to double-check as well? Would like to avoid case in which it happens to be something else -- Thanks!

JakeDaynes commented 2 years ago

All good - kgsl drivers/counters have been a nightmare since A7, so wasn't really surprised :)

I'm actually not using AGI (I've got my own IOCTL communication to the driver), but I've confirmed that Snapdragon Profiler is also blocked by this root gate issue as well

Mikayex commented 2 years ago

Just got an update on my Samsung tablet (SM-X700, Android 12, OneUI 4.1.1, build SP2A.220305.013.X700XXU2AVH4) and /sys/class/kgsl/kgsl-3d0/perfcounter is now writable by shell group! Device validation now passes after enabling counters :smiley:

Cliwo commented 2 years ago

Apologies for the delay, here's a workaround to get around issues with getting GPU counters on Adreno GPUs:

adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter"

It's worth noting that you may want to set the contents of /sys/class/kgsl/kgsl-3d0/perfcounter back to 0 as a precaution after profiling, so that other apps/processes cannot access GPU counters.

I'm working on a change that runs these commands before profiling, will notify once it's in the new release.

Working like a charm. My Note S20 (Qualcomm Adreno (TM) 650, Android 12) did not pass the validation test before using this command.

But another error comes out after this. (Image below)

image

Can this be related to this problem?

ttanatb commented 2 years ago

@Cliwo that error looks like your adb connection might've been severed (i.e. faulty cable or locked phone screen), if the issue continues to persist (i.e. not a flakey issue) please file a separate issue.

I'm closing this for now because the update seems to have landed and we can finally look at wiggly little GPU counter numbers again 🥳 Future versions of AGI will run the adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter" command for you and I'm working to hopefully make sure that future versions of Android will not have this same issue again.

Roman-Skabin commented 1 year ago

Same problem with S21 FE 5G (SM-G990B2)

adb shell "echo 1 > /sys/class/kgsl/kgsl-3d0/perfcounter" fails with /system/bin/sh: can't create /sys/class/kgsl/kgsl-3d0/perfcounter: Permission denied

CPU: Qualcomm Snapdragon 888 5G GPU: Adreno 660 Android: 13 Build: TP1A.220624.014.G990B2XXS1DVL1 UI: OneUI 5.0

Host OS: Windows 10 Enterprise 21H2 AGI version: 2022-3.2.1:d808a5ebcdca788a1bd50d105cc21f9888074d4d

ttanatb commented 1 year ago

Apologize for the delay in getting this resolved, I've communicated the issue with folks at Samsung and am hoping to get this resolved as soon as possible.

In the future, we've added tests in Android 14 to ensure that we don't get this kind of regression again.

JakeDaynes commented 1 year ago

Honestly a standardised interface with an available counter map would be a godsend in future versions of the OS. One of Android's basic tenants is transparency, yet over the last 10 major releases I've been doing performance profiling development, we've seen access to certain information killed or modified heavily time and time again.

Really appreciate you working towards keeping this open at least.

phx-ddevoe commented 1 year ago

Thank you soo much. Is there any way to follow along with Samsung's side of this bug?

ttanatb commented 1 year ago

I'm not a certain of a way to track this issue publicly, but we've communicated this problem to them via internal channels previously. I can try to ping this issue again to try to understand the timeline on getting this fixed.

EricAtPlanGrid commented 1 year ago

I don't want to spam this thread (as I spam this thread) but I do want to add my voice here to say that many of us would benefit from this being resolved! Thank you for your hard work @ttanatb et al!

ttanatb commented 1 year ago

Pinged the folks at Samsung -- will update this thread once I receive a response

chakibchemso commented 1 year ago

and still, they didn't fix this!

EricAtPlanGrid commented 1 year ago

Dare I ask, has there been any further contact with Samsung?

joshuawilde commented 1 year ago

Any updates here @ttanatb ?

ppamorim commented 7 months ago

Any updates?

EricAtPlanGrid commented 2 months ago

🍾 (1 year bump-aversary) … any news?