sonyxperiadev / bug_tracker

Empty repository that is used as a bugtracker for Open Devices project
52 stars 13 forks source link

[Yoshino] Fingerprint build broken #503

Closed stefanhh0 closed 4 years ago

stefanhh0 commented 4 years ago

When building yoshino, build runs into following error:

[ 40% 49512/121165] target  C++: android.hardware.biometrics.fingerprint@2.1-service.sony <= vendor/oss/fingerprint/BiometricsFingerprint.cpp
FAILED: out/target/product/lilac/obj/EXECUTABLES/android.hardware.biometrics.fingerprint@2.1-service.sony_intermediates/BiometricsFingerprint.o
/bin/bash -c "PWD=/proc/self/cwd /usr/bin/ccache prebuilts/clang/host/linux-x86/clang-r353983c/bin/clang++      -I vendor/oss/fingerprint -I out/target/product/lilac/obj/EXECUTABLES/android.hardware.biometrics.fingerprint@2.1-service.sony_intermediates -I out/target/product/lilac/gen/EXECUTABLES/android.hardware.biometrics.fingerprint@2.1-service.sony_intermediates \$(cat out/target/product/lilac/obj/EXECUTABLES/android.hardware.biometrics.fingerprint@2.1-service.sony_intermediates/import_includes)   -isystem out/target/product/lilac/obj/include -isystem kernel/sony/msm-4.9/common-headers/kernel-headers -c  -Werror=implicit-function-declaration -DANDROID -fmessage-length=0 -W -Wall -Wno-unused -Winit-self -Wpointer-arith -no-canonical-prefixes -DNDEBUG -UDEBUG -fno-exceptions -Wno-multichar -O2 -g -fno-strict-aliasing -fdebug-prefix-map=/proc/self/cwd= -D__compiler_offsetof=__builtin_offsetof -faddrsig -Wimplicit-fallthrough -Werror=int-conversion -Wno-reserved-id-macro -Wno-format-pedantic -Wno-unused-command-line-argument -fcolor-diagnostics -Wno-zero-as-null-pointer-constant -Wno-sign-compare -Wno-defaulted-function-deleted -Wno-inconsistent-missing-override -ffunction-sections -fdata-sections -fno-short-enums -funwind-tables -fstack-protector-strong -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -Wstrict-aliasing=2 -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Werror=date-time -Werror=format-security -nostdlibinc -march=armv8-a -mcpu=cortex-a53 -target aarch64-linux-android -Bprebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/aarch64-linux-android/bin  -Wsign-promo -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS -Wno-thread-safety-negative -Wno-gnu-include-next -fvisibility-inlines-hidden  -std=gnu++17   -fno-rtti -Wall -DUSE_FPC_YOSHINO -DPLATFORM_SDK_VERSION=29 -fexceptions -std=c++1z -D__ANDROID_API__=29 -D__ANDROID_VNDK__ -fPIE -D_USING_LIBCXX -DANDROID_STRICT   -Werror=int-to-pointer-cast -Werror=pointer-to-int-cast -Werror=address-of-temporary -Werror=return-type -Wno-tautological-constant-compare -Wno-tautological-type-limit-compare -Wno-tautological-unsigned-enum-zero-compare -Wno-tautological-unsigned-zero-compare -Wno-c++98-compat-extra-semi -Wno-return-std-move-in-c++11  -MD -MF out/target/product/lilac/obj/EXECUTABLES/android.hardware.biometrics.fingerprint@2.1-service.sony_intermediates/BiometricsFingerprint.d -o out/target/product/lilac/obj/EXECUTABLES/android.hardware.biometrics.fingerprint@2.1-service.sony_intermediates/BiometricsFingerprint.o vendor/oss/fingerprint/BiometricsFingerprint.cpp"
vendor/oss/fingerprint/BiometricsFingerprint.cpp:534:57: error: too many arguments to function call, expected single argument 'gid', have 2 arguments
                int grp_err = __setActiveGroup(mDevice, gid);
                              ~~~~~~~~~~~~~~~~          ^~~
vendor/oss/fingerprint/BiometricsFingerprint.cpp:216:1: note: '__setActiveGroup' declared here
int BiometricsFingerprint::__setActiveGroup(uint32_t gid) {
^
1 error generated.
09:06:11 ninja failed with: exit status 1

Removing the mDevice parameter should fix the build error. Problem is caused by https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/75d6aa806f1ebf33252a87f30060411c8a1360b1

MarijnS95 commented 4 years ago

Ahhh, a codepath I have never tested.

But really interesting nevertheless. How did you manage to build and test this before?

Feel free to PR the change and claim the fame/contribution for finding and fixing it :)

MarijnS95 commented 4 years ago

This path is up for discussion anyway. Tama ran into a similar issue, and it turned out to be a missing function call: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/8ad3ab99439ffac75167c560d808cbd0393fd8f1

Now that that's fixed, I assume this bad behaviour is resolved on Yoshino as well. Would you mind messing about with the sensor and see if you can get it to reinit the tzapp? fpc_close and fpc_init should log something, but you can also add a log message (an obvious one like ALOGE("REINITIALIZING TZAPP!!")) to this particular section.

I don't know the reproduction steps as I wasn't involved with the project when this happened, but it would definitely be interesting to revert the fix (or the PR containing the fix entirely) and see if it's still an issue.

stefanhh0 commented 4 years ago

But really interesting nevertheless. How did you manage to build and test this before?

Since you have seen fingerptint-gestures related stuff in the log, I can only guess that I have synced while the merge of https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/pull/50 was in progress. It is not an isolated transaction when merging PRs directly on github, is it? Or https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/pull/50 wasn't even there in the binaries from the unstable build and something different caused the instability.

Anyway that means I have to retest with activated fingerprint-gestures.

MarijnS95 commented 4 years ago

It is not an isolated transaction when merging PRs directly on github, is it?

It definitely is, otherwise everything would go corrupt. It can't really be anything else: Given all the referenced objects are available in the object store of a git repository, all a merge does is create a new commit and replace the branch pointer to point to this new commit. If you sync, you either see the old hash and do nothing (assume that object and thus everything it points to are on your system), or the new hash and download all the objects associated with it.

I just checked the logcat from #502 again, and there are definitely gesture logs. Could this possibly be from an earlier fetch, back when you were experimenting with Android 10 and pulled all my PRs?

stefanhh0 commented 4 years ago

This path is up for discussion anyway. Tama ran into a similar issue, and it turned out to be a missing function call: sonyxperiadev/vendor-sony-oss-fingerprint@8ad3ab9

Now that that's fixed, I assume this bad behaviour is resolved on Yoshino as well. Would you mind messing about with the sensor and see if you can get it to reinit the tzapp? fpc_close and fpc_init should log something, but you can also add a log message (an obvious one like ALOGE("REINITIALIZING TZAPP!!")) to this particular section.

I don't know the reproduction steps as I wasn't involved with the project when this happened, but it would definitely be interesting to revert the fix (or the PR containing the fix entirely) and see if it's still an issue.

I guess I have to register a fingerprint for that, correct? Currently I have none registered because of #442. However I can do so for this to test.

From looking at the code https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/blob/e8602743dead6296028c70e6d679731b53124053/BiometricsFingerprint.cpp#L533

#ifdef USE_FPC_YOSHINO
                int grp_err = __setActiveGroup(mDevice, gid);
                if (grp_err)
                    ALOGE("%s : Cannot reinitialize database", __func__);
#else
                // Break out of the loop, and make sure ERROR_HW_UNAVAILABLE
                // is raised afterwards, similar to the stock hal:
                status = -1;
                break;
#endif

I ask myself if the USE_FPC_YOSHINO branch is correct, shouldn't there be a break and maybe even status = -1 as well, or will the loop naturally end even without a break. Not knowing enough from the code to judge + not so familiar with cpp... It just looks a asymmetric to me.

stefanhh0 commented 4 years ago

I just checked the logcat from #502 again, and there are definitely gesture logs. Could this possibly be from an earlier fetch, back when you were experimenting with Android 10 and pulled all my PRs?

I haven't fetched anything special from fingerprint on the build from 03.12. I am always using https://github.com/stefanhh0/aosp-10/blob/master/build-4.9.sh and didn't change much on that script.

I have no explanation and can't anyway judge what is in the log. I will retest and post the logs again.

stefanhh0 commented 4 years ago

Now I have an explanation: https://github.com/MarijnS95/vendor-sony-oss-fingerprint/blob/9cf2f139bf2b5febda523fe384ccd7a61970580b/Android.mk#L24

It has been added with commit: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/9cf2f139bf2b5febda523fe384ccd7a61970580b which was integrated after the unstable build from 03.12.

That also means that the #ifdef USE_FPC_YOSHINO branch wasn't active for quite some time when doing yoshino builds...

MarijnS95 commented 4 years ago

Awesome! This code has never even been used... What do you know :)

MarijnS95 commented 4 years ago

Don't know how I didn't realise this when you originally posted the bug, I changed that yesterday...

stefanhh0 commented 4 years ago

No big deal, happens ;-)

Question is then, should that code path be activated now? I will test it with #442 in mind to see if that improves the stability when fingerprint authentication is used.

Afterwards I could retest with activated fingerprint gestures again.

stefanhh0 commented 4 years ago

I don't seem to manage triggering the else branch "Reinitialize the TZ app and parameters" any idea what I need to do?

Have done following:

    while((status = fpc_capture_image(mDevice->fpc)) >= 0 ) {
        ALOGE("DEBUG: DEBUG WHILE!!");
[...]
            } else {
                /*
                 * Reinitialize the TZ app and parameters
                 * to clear the TZ error generated by flooding it
                 */
                ALOGE("DEBUG: REINITIALIZING TZAPP!!");
                result = fpc_close(&mDevice->fpc);
                LOG_ALWAYS_FATAL_IF(result < 0, "REINITIALIZE: Failed to close fpc: %d", result);
                sleep(1);
                result = fpc_init(&mDevice->fpc, mWt.getEventFd());
                LOG_ALWAYS_FATAL_IF(result < 0, "REINITIALIZE: Failed to init fpc: %d", result);
#ifdef USE_FPC_YOSHINO
                int grp_err = __setActiveGroup(gid);
                if (grp_err)
                    ALOGE("%s : Cannot reinitialize database", __func__);
                ALOGE("DEBUG: FINISHED REINITIALIZING TZAPP!!");
#else
                // Break out of the loop, and make sure ERROR_HW_UNAVAILABLE
                // is raised afterwards, similar to the stock hal:
                status = -1;
                break;
#endif
            }

The "DEBUG: DEBUG WHILE" comes bot not the other log debugs...

MarijnS95 commented 4 years ago

@stefanhh0 Have you tried the debug logs on the FPC HAL checked out before my fingerprint PR (from the looks of it, this piece of code comes from the gesture one)? If the issue on Tama is the same or at least similar to what crashed Yoshino, it should be gone now.

https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/2e9cc28cc0175aa2a4abd35bff404655a2d595b0 Apparently the issue is caused by "TZ app gets too many authentication failures" - in case of Tama that is the second authentication after an invalid one, even if the current one is valid. Try a wrong finger repeatedly, that should provoke it.

stefanhh0 commented 4 years ago

I don't have any logs before your fingerprint PR.

I have done many invalid authentication attempts with a different finger and also having first a false finger and afterwards correct finger didn't trigger the log entries. Trying it many times with the wrong finger leads to a message saying that I tried too often unsuccessfully and that further attempts are ignored resp. that I have to wait (not exactly the words but in that sense).

So I can't trigger the else branch, don't know if the root-cause is gone though maybe I wasn't trying hard enough. But maybe it is safe now to say that for yoshino it is fixed as well.

For your reference a log that I've just recorded once more: logcat.log

MarijnS95 commented 4 years ago

I don't have any logs before your fingerprint PR.

If you have time, would it be possible to gather those logs? With the added log messages, outside of the ifdef?

If you get it to trigger there, especially after doing this rigorous testing on the current version, we can safely say this has been the fix. If not, this is still enough for me to declare it fixed... Means the code can be cleaned up a bit more.

stefanhh0 commented 4 years ago

I have tried to but failed to trigger that branch even without the PR. But maybe I have done something wrong?

What I did, rebasing to before the PR:

stefan@mars:~/android/source/vendor/oss/fingerprint$ git log --name-status 
commit 2fe7f7200929256d6cd73f1a8fbab02495b8c4a1 (HEAD)
Author: Stefan Rücker <s.ruecker@gmx.de>
Date:   Sun Dec 8 01:06:57 2019 +0100

    add debug output

M       BiometricsFingerprint.cpp

commit ab0d04393c95bb1436f61a65b5a8f9829bd5ad68
Merge: 22af248 5a6abf8
Author: Alin Jerpelea <alin.jerpelea@sony.com>
Date:   Tue Sep 24 07:43:26 2019 +0200

    Merge pull request #58 from krnlyng/authenticate_5

    Egistec: Ganges: don't fail to authenticate if 5 fingerprints are enrolled.

commit 5a6abf86e272c962a50da5b9f93a8bf45d3f79a4
Author: Franz-Josef Haider <f_haider@gmx.at>
Date:   Mon Sep 23 13:22:41 2019 +0000

    Egistec: Ganges: don't fail to authenticate if 5 fingerprints are enrolled.

M       egistec/ganges/BiometricsFingerprint.cpp

Touched then all files in that source-dir and subdirs to force rebuild. Rebuild and flashed it.

Diff output of my commit:

stefan@mars:~/android/source/vendor/oss/fingerprint$ git diff HEAD^1 HEAD
diff --git a/BiometricsFingerprint.cpp b/BiometricsFingerprint.cpp
index 73bd53c..153892a 100644
--- a/BiometricsFingerprint.cpp
+++ b/BiometricsFingerprint.cpp
@@ -536,6 +536,7 @@ void BiometricsFingerprint::process_auth(sony_fingerprint_device_t *sdev) {
     fpc_auth_start(sdev->fpc);

     while((status = fpc_capture_image(sdev->fpc)) >= 0 ) {
+        ALOGE("DEBUG: DEBUG WHILE!!");
         ALOGV("%s : Got Input with status %d", __func__, status);

         if (isCanceled(sdev)) {
@@ -606,16 +607,19 @@ void BiometricsFingerprint::process_auth(sony_fingerprint_device_t *sdev) {
                  * Reinitialize the TZ app and parameters
                  * to clear the TZ error generated by flooding it
                  */
+                ALOGE("DEBUG: REINITIALIZING TZAPP!!");
                 fpc_close(&sdev->fpc);
                 fpc_init(&sdev->fpc, sdev->worker.event_fd);
 #ifdef USE_FPC_YOSHINO
                 int grp_err = __setActiveGroup(sdev, gid);
                 if (grp_err)
                     ALOGE("%s : Cannot reinitialize database", __func__);
+                ALOGE("DEBUG: FINISHED REINITIALIZING TZAPP!!");
 #else
                 // Break out of the loop, and make sure ERROR_HW_UNAVAILABLE
                 // is raised afterwards, similar to the stock hal:
                 status = -1;
+                ALOGE("DEBUG: FINISHED REINITIALIZING TZAPP!!");
                 break;
 #endif
             }

Logcat from that session: logcat.log

MarijnS95 commented 4 years ago

The issue could have been fixed "accidentally" over the past year since there have been a ton of changes and improvements.

We have not deprecated any of the unused ioctl interfaces in the kernel yet, so the old HAL should function fine. If you have time and want to continue debugging this - especially to get to the bottom of the device crashing - would you mind doing a git bisect run on the HAL? Start with the commit that originally addressed the crash: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/2e9cc28cc0175aa2a4abd35bff404655a2d595b0 Be sure to add those logging lines, and try to replicate the crash-on-resume as well as that alternate codepath.

I find the crash on resume curious anyway. You seem to be the only one reporting it (unless there's another issue sitting around somewhere? Afaik only for Tone) but there must be other users out there using FPC. Is there a possibility your device firmware is lagging behind? I don't know the versions nor whether the TZ-app for FPC resides on that firmware partition or odm.

Anyway, for now I'll assume that FPC at the commit from Angelo is "fine". If your device still crashes with that one, we must look elsewhere.

stefanhh0 commented 4 years ago

We have not deprecated any of the unused ioctl interfaces in the kernel yet, so the old HAL should function fine. If you have time and want to continue debugging this - especially to get to the bottom of the device crashing - would you mind doing a git bisect run on the HAL? Start with the commit that originally addressed the crash: sonyxperiadev/vendor-sony-oss-fingerprint@2e9cc28 Be sure to add those logging lines, and try to replicate the crash-on-resume as well as that alternate codepath.

That will take some time... not sure if I manage to do it the coming week.

I find the crash on resume curious anyway. You seem to be the only one reporting it (unless there's another issue sitting around somewhere? Afaik only for Tone) but there must be other users out there using FPC. Is there a possibility your device firmware is lagging behind? I don't know the versions nor whether the TZ-app for FPC resides on that firmware partition or odm.

E.g. @wyzco had those issues as well. Yes, we are on an older firmware using version baseband: 1308-8921_47.1.A.16.20 that is the version recommended by @jerpelea. I have also tried already: 47.2.A.11.228 which is the latest version, but that one introduces an energy consumption problem, in the sense that the battery is emptied really fast. I don't know anyway if past 47.1.A.16.20 something firmware related to the FPC touch sensor has been updated. Well I am not even sure if the problem may have been introduced with a changed software binaries version (currently I am using: 9.0_2.3.2_v9). @jerpelea is the software binaries version relevant, the version baseband or even both are relevant for the FPC touch sensor?

Anyway, for now I'll assume that FPC at the commit from Angelo is "fine". If your device still crashes with that one, we must look elsewhere.

I will check that one first and verify if on the commit https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/2e9cc28cc0175aa2a4abd35bff404655a2d595b0 the FPC works without a crash when resuming and that I get the else branch triggered.

stefanhh0 commented 4 years ago

Switching back to https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/2e9cc28cc0175aa2a4abd35bff404655a2d595b0 and adding the log output.

Rebuilding, then flashing vendor.img (only vendor not the others)

I then can't register a fingerprint, it remains asking to touch the sensor, however when I touch it nothing happens. Find attached the logs: dmesg.log logcat.log

Any suggestion why or can you identify another first commit with that it should work?

Or is there an easy "fix" available for that commit?

MarijnS95 commented 4 years ago

That will take some time... not sure if I manage to do it the coming week.

No hurry, I don't have much if any time left over either. I'm asking you (or someone else) because I don't own a Yoshino, and can't test (nor feel the pain of not having a working fingerprint sensor...). Though, if it is a bug introduced by my adaptations for Tama, it still is my fault.

Don't bother with the baseband/firmware anymore. I checked the Yoshino odm, and FPC firmware is in there. If the bug is reproducible on older commits, the change is likely in there (or kernel, which I doubt).

Rebuilding, then flashing vendor.img (only vendor not the others)

I then can't register a fingerprint, it remains asking to touch the sensor, however when I touch it nothing happens. Find attached the logs:

Nothing seems to be happening. It is only generating a preEnroll challenge, and Android is not even continuing with the enroll. The framework is most likely still waiting on a result from the enumerate() call, which never signaled that there are 0 fingers registered. I only fixed that a bit later, when I joined this project: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/4c94430 You can try cherry-picking that one and another improvement directly after, from the same PR: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/0417464 Though it is probably easier and safer to just checkout after that has been merged, which includes my initial round of cleanups and improved warnings: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/22fead3

After that your steps depend on what happens:

stefanhh0 commented 4 years ago

With odm you are referring to the software binaries (e.g. SW_binaries_for_Xperia_Android_9.0_2.3.2_v9_yoshino.img)? I don't have any odm.img in the out directory...

I was able to register a fingerprint with checking out https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/22fead3 however then everything is crashing a lot not only on resume, so I can only assume it is a different crash...

At least I could trigger the else branch, but that one is the else branch that also includes recoverable errors, as the other branch for recoverable errors was introduced later... logcat.log

MarijnS95 commented 4 years ago

With odm you are referring to the software binaries (e.g. SW_binaries_for_Xperia_Android_9.0_2.3.2_v9_yoshino.img)? I don't have any odm.img in the out directory...

Yes, the software binaries.

I was able to register a fingerprint with checking out sonyxperiadev/vendor-sony-oss-fingerprint@22fead3 however then everything is crashing a lot not only on resume, so I can only assume it is a different crash...

Crashing a lot? The device? The service has never died and restarted, if the log is to be believed.

At least I could trigger the else branch, but that one is the else branch that also includes recoverable errors, as the other branch for recoverable errors was introduced later... logcat.log

12-08 15:46:37.847   809  1053 D FPC IMP : Print identified as 0
12-08 15:46:37.848   809  1053 I AOSP FPC HAL (Binder): process_auth : Auth step = -11
12-08 15:46:37.848   809  1053 E AOSP FPC HAL (Binder): DEBUG: REINITIALIZING TZAPP!!

I guess you mean that one? That is indeed -EAGAIN, for which a separate codepath was introduced in https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/fc10243279b3ff60f0e610216af906b66668e08d That is unfortunately not the case we are looking for.

stefanhh0 commented 4 years ago

I was able to register a fingerprint with checking out sonyxperiadev/vendor-sony-oss-fingerprint@22fead3 however then everything is crashing a lot not only on resume, so I can only assume it is a different crash...

Crashing a lot? The device? The service has never died and restarted, if the log is to be believed.

The device crashes, it reboots easily. I can't do exhaustive tests on the FPC, because the phone reboots after some attempts and I was only able to grab the logcat once when I did not try to flood the FPC touch sensor but had only a single or very few failures... and okay then recoverable error which is -11.

Trying the cherry-pick approach:

stefan@mars:~/android/source/vendor/oss/fingerprint$ git reset --hard 2e9cc28cc0175aa2a4abd35bff404655a2d595b0
HEAD ist jetzt bei 2e9cc28 Properly reinitialize the TZ app in case of auth flood
stefan@mars:~/android/source/vendor/oss/fingerprint$ git cherry-pick  4c94430051655dace418c25a5bc9c05c2f13b983
automatischer Merge von BiometricsFingerprint.cpp
[losgelöster HEAD 020b44d] Signal onEnumerate end when no fingerprints are available.
 Author: MarijnS95 <marijns95@gmail.com>
 Date: Thu Nov 22 20:25:36 2018 +0100
 1 file changed, 12 insertions(+), 8 deletions(-)
stefan@mars:~/android/source/vendor/oss/fingerprint$ git cherry-pick -Xtheirs 0417464711f15272a57d1f287547ffaa62bd60bb
automatischer Merge von fpc_imp_yoshino_nile.c
automatischer Merge von fpc_imp_loire_tone.c
automatischer Merge von BiometricsFingerprint.cpp
[losgelöster HEAD dbf6ab9] get_print_index: Set initial length to MAX, return actual length to caller.
 Author: MarijnS95 <marijns95@gmail.com>
 Date: Sun Nov 25 19:49:40 2018 +0100
 4 files changed, 25 insertions(+), 44 deletions(-)
stefan@mars:~/android/source/vendor/oss/fingerprint$ find . -name "*.[ch]*" -exec touch "{}" \;

Has basically the same problem. Device crashes a lot, any idea if there is a better commit? Since both attempts lead to an easily crashing system they are no good basis for doing a bisect search...

If we don't find a good basis commit this form of debugging won't help but only someone with a yoshino and sufficient programming skills and insights into the project is able to find the root cause.

Regarding fingerprint sensor, I am not in urgent need for that neither would just have been nice if we would get it working. Personally I don't think biometrical security is in anyway a good approach. It can be compromised but you can't change your biometical properties easily like changing a passowr or PIN. Additionally biometrical sensor have a lot more attack vectors and due to the complexity tend to grant false positive authentications.

MarijnS95 commented 4 years ago

That's really odd and unexpected at those commits. Which leads me to believe that it either had to be extremely unstable one year ago, or there has been a significant change in the kernel or TZ-app.

If possible, would you mind going the other route and try the tzfingerprint TZ-app from an older sofware-binaries release (preferably those from a year ago)? You can try this on both master and the older commits (where it doesn't matter if you cherry-pick or checkout on the commit with cleanups and fixes).

I personally only use the fingerprint to unlock the device, for convenience. The actual data used to match fingers is never supposed to leave the secure processor in your device unencrypted... never supposed to...

stefanhh0 commented 4 years ago

Well now it is a little more stable with the cherry-pick:

00d040f (HEAD) get_print_index: Set initial length to MAX, return actual length to caller.
d9376a8 Signal onEnumerate end when no fingerprints are available.
2e9cc28 Properly reinitialize the TZ app in case of auth flood

The code looks as follows:

        while((status = fpc_capture_image(sdev->fpc)) >= 0 ) {
            ALOGI("DEBUG: DEBUG WHILE!!");
            ALOGV("%s : Got Input with status %d", __func__, status);

            if (getState(sdev) != STATE_AUTH ) {
                break;
            }

BiometricsFingerprint.cpp

So the break leaves early and the following code is never triggered logcat.log:

12-08 16:46:27.218  1125  1125 W FingerprintService: client com.android.systemui is authenticating...
12-08 16:46:28.432   759  1041 D FPC IMP : Finger lost as expected
12-08 16:46:28.952   759  1041 D FPC IMP : fpc_wait_finger_down = 0xFFFFFFFF
12-08 16:46:28.954   759  1041 I AOSP FPC HAL (Binder): DEBUG: DEBUG WHILE!!
12-08 16:46:28.955   759  1041 D AOSP FPC HAL (Binder): getState
12-08 16:46:28.969   759  1041 D FPC IMP : fpc_wait_finger_lost = 0x00000000
12-08 16:46:28.969   759  1041 D FPC IMP : Finger lost as expected
12-08 16:46:29.480   759  1041 D FPC IMP : fpc_wait_finger_down = 0xFFFFFFFF
12-08 16:46:29.482   759  1041 I AOSP FPC HAL (Binder): DEBUG: DEBUG WHILE!!
12-08 16:46:29.483   759  1041 D AOSP FPC HAL (Binder): getState

Resuming worked not so nice even then... so yes I am trying using the tz binaries from a different software binaries. It should go then here:

lilac:/odm/firmware # ls -la tzfi
tzfingerprint.b00  tzfingerprint.b01  tzfingerprint.b02  tzfingerprint.b03  tzfingerprint.b04  tzfingerprint.b05  tzfingerprint.b06  tzfingerprint.b07  tzfingerprint.mdt
lilac:/odm/firmware # ls -la tzfingerprint*                                                                                                                                                                                                                                    
-rw-r--r-- 1 root root    308 2019-05-21 11:02 tzfingerprint.b00
-rw-r--r-- 1 root root   6696 2019-05-21 11:02 tzfingerprint.b01
-rw-r--r-- 1 root root 438482 2019-05-21 11:02 tzfingerprint.b02
-rw-r--r-- 1 root root  45856 2019-05-21 11:02 tzfingerprint.b03
-rw-r--r-- 1 root root    332 2019-05-21 11:02 tzfingerprint.b04
-rw-r--r-- 1 root root    136 2019-05-21 11:02 tzfingerprint.b05
-rw-r--r-- 1 root root    136 2019-05-21 11:02 tzfingerprint.b06
-rw-r--r-- 1 root root   6504 2019-05-21 11:02 tzfingerprint.b07
-rw-r--r-- 1 root root   7004 2019-05-21 11:02 tzfingerprint.mdt

I will remount it with: mount -o rw,remount /odm and will push the fingerprint files from SW_binaries_for_Xperia_Android_9.0.6.3_r1_v1_yoshino_beta.img that is an image I've jsut downloaded dating back to 19.10.2018

stefanhh0 commented 4 years ago

Uploaded the old tzfingerprint files and rebooted:

lilac:/odm/firmware # ls -la tzfingerprint.*                                                                                                                                                                                                                                 
-rw-r--r-- 1 root root    308 2018-09-11 13:12 tzfingerprint.b00
-rw-r--r-- 1 root root   6696 2018-09-11 13:12 tzfingerprint.b01
-rw-r--r-- 1 root root 438482 2018-09-11 13:12 tzfingerprint.b02
-rw-r--r-- 1 root root  45856 2018-09-11 13:12 tzfingerprint.b03
-rw-r--r-- 1 root root    332 2018-09-11 13:12 tzfingerprint.b04
-rw-r--r-- 1 root root    136 2018-09-11 13:12 tzfingerprint.b05
-rw-r--r-- 1 root root    136 2018-09-11 13:12 tzfingerprint.b06
-rw-r--r-- 1 root root   6504 2018-09-11 13:12 tzfingerprint.b07
-rw-r--r-- 1 root root   7004 2018-09-11 13:12 tzfingerprint.mdt

Btw. could you or someone else check if there is a newer firmware available for that hardware maybe it is something that has been fixed already upstream...

Another thing, do you know how long to wait to resume the device from the different sleep states, i.e. how long I have to wait. Or do you know the place where to look at?

stefanhh0 commented 4 years ago

With the old firmware it is no longer crashing on the cherry-picked commit. On the latest commit with old firmware it crashes when resuming. logcat.log

Checking now if https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/22fead3 is a good commit with the old firmware which can be used as a basis for the bisect search.

stefanhh0 commented 4 years ago

With https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/22fead3 I fail to trigger another verify_state code than -11. The commit that introduced the change was on 2018-10-03, since the firmware I have used was from 2018-10-19 it makes sense to use the firmware from 2018-08-09 instead (the firmware that was shipped previous when the commit has been done initially) and retry to trigger the else branch with a different failure state with that FW.

With https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/22fead3 I was able to trigger the resume bug. I will recheck with the cherry-picked version. For now it seems to be that the resume bug has been introduced between: base-comit: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/2e9cc28cc0175aa2a4abd35bff404655a2d595b0 + cherry-pick: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/4c94430 + cherry-pick: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/0417464 -> base-comit: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/22fead3 But that needs to be verified still.

logcat.log

stefanhh0 commented 4 years ago

Commit: https://github.com/sonyxperiadev/vendor-sony-oss-fingerprint/commit/22fead3 tzfingerprint fw: 2018-08-09

Couldn't trigger any other verify_state other than -11 and 0. logcat.log

Without being able to provoke any other verify_state than -11 or 0 it is not possible to search for a commit that changed the behavior regarding verify_state. So maybe it is a more or less rare race condition, I am doing something wrong or another change made it disappear (as e.g. a kernel change or timing behavior change). Unless you have a good idea I stop investigating how to trigger the else branch with a non recoverable error.

The resume bug was also triggerable with that firmware but only once. I will continue investigating that one.

MarijnS95 commented 4 years ago

That's a lot of solid info, and I'm especially glad to know there's a difference between the various odm libraries. Though that means someone (likely me) needs to go in and find the changes. The first step is going to be using Tama datastructures and code on Yoshino, as it's fairly likely Sony unified their HAL just like they did with a firmware update on Nile - where the Egistec TZ now has the exact same interface as the one found on Ganges and Kumano.

I'll try to condense the info tomorrow and see if I can find the issue within that range. I was hoping it to be rock-solid but apparently that's not the case.

Can you confirm whether the tzfingerprint firmware is the same on 2018-08-09 and 2018-10-19? A binary diff will do, these blobs should never change (signing process etc...) unless a code change was necessary.

Anyway, it is maybe not a good idea to waste too much time on older commits, when the solution might be very easy. Can you try the following on top of master, with the latest tzapp?

Diff ```diff diff --git a/Android.mk b/Android.mk index 7d57588..c9d3407 100644 --- a/Android.mk +++ b/Android.mk @@ -18,11 +18,11 @@ LOCAL_SRC_FILES += fpc_imp_loire_tone.c HAS_FPC := true endif -ifeq ($(filter-out yoshino,$(SOMC_PLATFORM)),) -LOCAL_SRC_FILES += fpc_imp_yoshino_nile_tama.c -HAS_FPC := true -LOCAL_CFLAGS += -DUSE_FPC_YOSHINO -endif +# ifeq ($(filter-out yoshino,$(SOMC_PLATFORM)),) +# LOCAL_SRC_FILES += fpc_imp_yoshino_nile_tama.c +# HAS_FPC := true +# LOCAL_CFLAGS += -DUSE_FPC_YOSHINO +# endif ifeq ($(filter-out nile,$(SOMC_PLATFORM)),) LOCAL_SRC_FILES += fpc_imp_yoshino_nile_tama.c @@ -30,7 +30,7 @@ HAS_FPC := true LOCAL_CFLAGS += -DUSE_FPC_NILE endif -ifeq ($(filter-out tama,$(SOMC_PLATFORM)),) +ifneq ($(filter yoshino tama,$(SOMC_PLATFORM)),) LOCAL_SRC_FILES += fpc_imp_yoshino_nile_tama.c HAS_FPC := true LOCAL_CFLAGS += -DUSE_FPC_TAMA ```
stefanhh0 commented 4 years ago

That was a really good idea to diff the firmware binaries, since they are always the same :-) The difference in stability regarding crashing the phone was then only by chance. Not surprising that with the cherry picked commit combination + the firmware from the older odm.img the phone eventually rebooted as well due to a crash or hang hang condition.

Here are the comparisons:

2018-08-09: 8.1.6.3_r1_v15_yoshino
2018-10-19: 9.0.6.3_r1_v1_yoshino
2019-05-24: 9.0_2.3.2_v9_yoshino
2019-11-29: 10.0.7.1_r1_v2b_yoshino

stefan@mars:~/android/firmware/tzfingerprint/2018-08-09$ set -x;for f in `ls -1`; do cmp $f ../2018-10-19/$f; done;set +x
++ ls --color=auto -1
+ for f in `ls -1`
+ cmp tzfingerprint.b00 ../2018-10-19/tzfingerprint.b00
+ for f in `ls -1`
+ cmp tzfingerprint.b01 ../2018-10-19/tzfingerprint.b01
+ for f in `ls -1`
+ cmp tzfingerprint.b02 ../2018-10-19/tzfingerprint.b02
+ for f in `ls -1`
+ cmp tzfingerprint.b03 ../2018-10-19/tzfingerprint.b03
+ for f in `ls -1`
+ cmp tzfingerprint.b04 ../2018-10-19/tzfingerprint.b04
+ for f in `ls -1`
+ cmp tzfingerprint.b05 ../2018-10-19/tzfingerprint.b05
+ for f in `ls -1`
+ cmp tzfingerprint.b06 ../2018-10-19/tzfingerprint.b06
+ for f in `ls -1`
+ cmp tzfingerprint.b07 ../2018-10-19/tzfingerprint.b07
+ for f in `ls -1`
+ cmp tzfingerprint.mdt ../2018-10-19/tzfingerprint.mdt
+ set +x
stefan@mars:~/android/firmware/tzfingerprint/2018-10-19$ set -x;for f in `ls -1`; do cmp $f ../2019-05-24/$f; done;set +x
++ ls --color=auto -1
+ for f in `ls -1`
+ cmp tzfingerprint.b00 ../2019-05-24/tzfingerprint.b00
+ for f in `ls -1`
+ cmp tzfingerprint.b01 ../2019-05-24/tzfingerprint.b01
+ for f in `ls -1`
+ cmp tzfingerprint.b02 ../2019-05-24/tzfingerprint.b02
+ for f in `ls -1`
+ cmp tzfingerprint.b03 ../2019-05-24/tzfingerprint.b03
+ for f in `ls -1`
+ cmp tzfingerprint.b04 ../2019-05-24/tzfingerprint.b04
+ for f in `ls -1`
+ cmp tzfingerprint.b05 ../2019-05-24/tzfingerprint.b05
+ for f in `ls -1`
+ cmp tzfingerprint.b06 ../2019-05-24/tzfingerprint.b06
+ for f in `ls -1`
+ cmp tzfingerprint.b07 ../2019-05-24/tzfingerprint.b07
+ for f in `ls -1`
+ cmp tzfingerprint.mdt ../2019-05-24/tzfingerprint.mdt
+ set +x
stefan@mars:~/android/firmware/tzfingerprint/2019-05-24$ set -x;for f in `ls -1`; do cmp $f ../2019-11-19/$f; done;set +x
++ ls --color=auto -1
+ for f in `ls -1`
+ cmp tzfingerprint.b00 ../2019-11-19/tzfingerprint.b00
+ for f in `ls -1`
+ cmp tzfingerprint.b01 ../2019-11-19/tzfingerprint.b01
+ for f in `ls -1`
+ cmp tzfingerprint.b02 ../2019-11-19/tzfingerprint.b02
+ for f in `ls -1`
+ cmp tzfingerprint.b03 ../2019-11-19/tzfingerprint.b03
+ for f in `ls -1`
+ cmp tzfingerprint.b04 ../2019-11-19/tzfingerprint.b04
+ for f in `ls -1`
+ cmp tzfingerprint.b05 ../2019-11-19/tzfingerprint.b05
+ for f in `ls -1`
+ cmp tzfingerprint.b06 ../2019-11-19/tzfingerprint.b06
+ for f in `ls -1`
+ cmp tzfingerprint.b07 ../2019-11-19/tzfingerprint.b07
+ for f in `ls -1`
+ cmp tzfingerprint.mdt ../2019-11-19/tzfingerprint.mdt
+ set +x

I'll give it a try with the changed make file.

stefanhh0 commented 4 years ago

After downloading also all earlier SW binaries for yoshino from https://developer.sony.com/develop/open-devices/downloads/software-binaries I found that the tzfingerprint binaries have been changed only once. Downloaded firmwares:

stefan@mars:~/android/firmware/tzfingerprint$ ls -1
2018-01-12_8.0.5.7_r1_v10
2018-01-19_N_MR1_5.7_r1_v08
2018-08-09_8.1.6.3_r1_v15
2018-10-19_9.0.6.3_r1_v1
2019-05-24_9.0_2.3.2_v9
2019-11-19_10.0.7.1_r1_v2b

Nougat (2018-01-19_N_MR1_5.7_r1_v08) and 8.0 (2018-01-12_8.0.5.7_r1_v10) are identical.

Diff was found when comparing the following:

stefan@mars:~/android/firmware/tzfingerprint/2018-01-12_8.0.5.7_r1_v10$ for f in `ls`; do cmp $f ../2018-08-09_8.1.6.3_r1_v15/$f; done
tzfingerprint.b00 ../2018-08-09_8.1.6.3_r1_v15/tzfingerprint.b00 sind verschieden: Byte 94, Zeile 1
tzfingerprint.b01 ../2018-08-09_8.1.6.3_r1_v15/tzfingerprint.b01 sind verschieden: Byte 41, Zeile 1
tzfingerprint.b02 ../2018-08-09_8.1.6.3_r1_v15/tzfingerprint.b02 sind verschieden: Byte 29, Zeile 1
tzfingerprint.b03 ../2018-08-09_8.1.6.3_r1_v15/tzfingerprint.b03 sind verschieden: Byte 382, Zeile 1
tzfingerprint.b04 ../2018-08-09_8.1.6.3_r1_v15/tzfingerprint.b04 sind verschieden: Byte 2, Zeile 1
tzfingerprint.b05 ../2018-08-09_8.1.6.3_r1_v15/tzfingerprint.b05 sind verschieden: Byte 53, Zeile 2
tzfingerprint.b06 ../2018-08-09_8.1.6.3_r1_v15/tzfingerprint.b06 sind verschieden: Byte 53, Zeile 2
tzfingerprint.b07 ../2018-08-09_8.1.6.3_r1_v15/tzfingerprint.b07 sind verschieden: Byte 9, Zeile 1
tzfingerprint.mdt ../2018-08-09_8.1.6.3_r1_v15/tzfingerprint.mdt sind verschieden: Byte 94, Zeile 1

So maybe unification of the hals could date back to the 8.1 SW binaries release?! I could give it a try with the Pre 8.1 tzfingerprint firmware.

stefanhh0 commented 4 years ago

The change of the Android.mk did not do the trick. The option to register a fingerprint is then not available and related crashes can be found in the logcat.log.

Maybe because of the different FP_TZAPP_NAME definitions? yoshino: #define FP_TZAPP_NAME "tzfingerprint" tama: #define FP_TZAPP_NAME "fpctzfingerprint"

From one of the tombstones:

12-09 06:52:18.808  8211  8211 I FPC IMP : INIT FPC TZ APP
12-09 06:52:18.808  8211  8211 D QSEE_WRAPPER: Using Target Lib : libQSEEComAPI.so
12-09 06:52:18.809  8211  8211 D QSEE_WRAPPER: Loaded QSEECom API library at 0xb808f8218b17f4e5
12-09 06:52:18.810  8211  8211 D QSEE_WRAPPER: Loaded QSEECom API library...
12-09 06:52:18.818  8211  8211 I FPC UInput: Successfully created uinput device! rc=0
12-09 06:52:18.822  8211  8211 I FPC IMP : Starting app keymaster64
12-09 06:52:18.822  8211  8211 I QSEE_WRAPPER: Starting app keymaster64
12-09 06:52:18.823  8211  8211 D QSEECOMAPI: QSEECom_get_handle sb_length = 0x400
12-09 06:52:18.824  8211  8211 D QSEECOMAPI: App is already loaded QSEE and app id = 65537
12-09 06:52:18.826  8211  8211 I QSEE_WRAPPER: TZ App loaded: keymaster64
12-09 06:52:18.826  8211  8211 I FPC IMP : Starting app fpctzfingerprint
12-09 06:52:18.829  8211  8211 D QSEE_WRAPPER: Warning: sb_size too small, increasing to avoid breakage
12-09 06:52:18.830  8211  8211 I QSEE_WRAPPER: Starting app fpctzfingerprint
12-09 06:52:18.830  8211  8211 D QSEECOMAPI: QSEECom_get_handle sb_length = 0x400
12-09 06:52:18.860  8211  8211 D QSEECOMAPI: App is not loaded in QSEE
12-09 06:52:18.860  8211  8211 E QSEECOMAPI: Error::Cannot open the file /odm/firmware//fpctzfingerprint.mdt errno = 2
12-09 06:52:18.861  8211  8211 E QSEECOMAPI: Error::Loading image failed with ret = -1
12-09 06:52:18.861  8211  8211 E QSEE_WRAPPER: Could not load app fpctzfingerprint. Error: QSEECom: Failed to register listener
12-09 06:52:18.861  8211  8211 E QSEE_WRAPPER:  (-1)
12-09 06:52:18.861  8211  8211 E FPC IMP : Could not load app : fpctzfingerprint
12-09 06:52:18.862  8211  8211 D QSEECOMAPI: QSEECom_dealloc_memory 
12-09 06:52:18.862  8211  8211 D QSEECOMAPI: QSEECom_shutdown_app, app_id = 65537
12-09 06:52:18.865  8211  8211 F FPC     : Failed to close ION device

Logfiles: dmesg.log logcat.log tombstones.zip

stefanhh0 commented 4 years ago

Using the Pre-8.1 fw is as well not working together with latest master. Logfiles: dmesg.log logcat.log tombstones.zip

stefanhh0 commented 4 years ago

Renaming the TZAPP to fpctzfingerprint introduces instability. Here a logfile: logcat.log The option to register a fingerprint was available, but the app did not react on trying to register a finger. Shortly after I've tried, the phone rebooted. On another attempt I have seen the Option to register when I tried to access it, the phone rebooted.

MarijnS95 commented 4 years ago

Nougat (2018-01-19_N_MR1_5.7_r1_v08) and 8.0 (2018-01-12_8.0.5.7_r1_v10) are identical.

Using the Pre-8.1 fw is as well not working together with latest master.

Looks like we can safely say that the current FW (identical to the end of 2018) is the one this has been developed/used with. Don't bother with Tama datastructures anymore - if the blobs are the same the details are not going to be in there.

Is your phone encrypted? That should utilize qseecom etc a good bit (keymaster etc), making an instability there unlikely.

With that in mind, I am still confident the issue is in our fingerprint HAL, especially since I can provoke a similar crash easily on other platforms. It is either a panic state or a safety feature, where the entire board/soc is simply rebooted when the trusted processor runs into an error or encounters data it doesn't trust/like.

One example of such a crash is the initial gesture implementation on Loire. I don't remember exactly if I was sending commands in the wrong order or the wrong datastructure/size, but the device would crash after a minute. Carefully cleaning up the codepath resolved the issue. For that reason I'm glad that this crash also happens in the Gesture codepath on Yoshino. That makes it less code to investigate, though I don't have any existing implementation to compare this with...

One more thing we can try is the stock HAL. This means I need to reinstantiate the original sysfs or whatever interface in the kernel, to be able to run the following tests:

stefanhh0 commented 4 years ago

Yes it is encrypted. Sounds good, if you have something to test for me, just let me know... I won't try anything else until you say so.