meta-flutter / meta-flutter

Google Flutter for Yocto
MIT License

flutter-auto long term stability #320

Open jwinarske opened 1 year ago

jwinarske commented 1 year ago
Just an update from our side:

We were able to revert to Flutter version 3.3.7 and get flutter-auto working with our application. We have been testing the stability of our application under flutter-auto over the last several weeks. Unfortunately, the application sometimes stops updating the screen: the animation on the home screen stops and user input produces no screen updates. The application is still running and still communicates with our back-end, so it seems that only the screen updating is broken. No crash reports or debug messages are printed when this happens, so we are unsure whether the issue is caused by flutter-auto or by our application. Restarting the application resolves the issue for a while, but after a few days the same thing happens again (the screen freezes and stops updating). In parallel, we also have some devices running our application with the flutter-pi embedder, and on these devices we have never seen this problem. Some of them have been running for more than 2 weeks without problems.

Are there any known stability issues with flutter-auto that could explain this behavior? We typically only see this happening after the application has been running for a few days.

Originally posted by @X-Terminator in https://github.com/meta-flutter/meta-flutter/issues/295#issuecomment-1697179264

jwinarske commented 1 year ago

@X-Terminator Thanks for the update.

In our case we run x64 containers, x64 desktop (fedora/ubuntu), and aarch64 targets (yocto dunfell/kirkstone). We have seen no sign of a problem similar to the one you report.

There are many variables in a problem like this. To narrow it down, the usual first step is to come up with a minimum viable repro: the smallest possible scenario that replicates the problem.

Note that I am prepping the OSS release for ivi-homescreen -> flutter-auto. I should have it completed by end of day Wednesday (US).

So, given both points, my recommendation is to update to the latest release when it comes out, then attempt to repro your stability problem.

Also, the upstream authority for ivi-homescreen/flutter-auto is https://github.com/toyota-connected/ivi-homescreen. If the issue is related to flutter-auto only, it's best to raise an issue there.

jwinarske commented 1 year ago

@X-Terminator Another data point: flutter-auto is ivi-homescreen with some changes specific to AGL, which won't get picked up by Toyota. Unless you need the AGL-Compositor, you would just use the ivi-homescreen recipe.

jwinarske commented 11 months ago

@X-Terminator I have a repro on a TI SK-TDA4VM (j721e) (Arago tisdk-image-base, libweston-10_10.0.2, libgcc1_11.3.0). It took about three days to hit. Which hardware/BSP, Wayland compositor, and Weston and GCC versions did you hit the issue on?

(gdb) info threads
  Id   Target Id                                          Frame 
* 1    Thread 0xffffb9a42020 (LWP 2413) "homescreen"      0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  2    Thread 0xffffb92ff0a0 (LWP 2414) "homescreen"      0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  3    Thread 0xffffb57580a0 (LWP 2415) "homescreen"      0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  4    Thread 0xffffae9ee0a0 (LWP 2416) "io.flutter.ui"   0x0000ffffb95e7dc4 in epoll_pwait () from /lib/libc.so.6
  5    Thread 0xffffae1de0a0 (LWP 2417) "homescreen"      0x0000ffffb95de1a0 in poll () from /lib/libc.so.6
  6    Thread 0xffffad9ce0a0 (LWP 2418) "io.flutter.io"   0x0000ffffb95e7dc4 in epoll_pwait () from /lib/libc.so.6
  7    Thread 0xffff9ffff0a0 (LWP 2419) "io.worker.1"     0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  8    Thread 0xffff9f7ef0a0 (LWP 2420) "io.worker.2"     0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  9    Thread 0xffffad1be0a0 (LWP 2421) "dart:io EventHa" 0x0000ffffb95e7dc4 in epoll_pwait () from /lib/libc.so.6
(gdb) bt 10
#0  0x0000ffffb957c96c in ?? () from /lib/libc.so.6
#1  0x0000ffffb957f698 in pthread_cond_wait () from /lib/libc.so.6
#2  0x0000ffffb9986c84 in wl_display_read_events () from /usr/lib/libwayland-client.so.0
#3  0x0000aaaac8aefea8 in ?? ()
#4  0x0000ffffb952b230 in ?? () from /lib/libc.so.6
#5  0x0000ffffb952b30c in __libc_start_main () from /lib/libc.so.6
#6  0x0000aaaac8af3670 in ?? ()

@mv0 Does this ring a bell?
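
For readers triaging a similar hang, the `info threads` listing above can be tallied by innermost frame to see at a glance where each thread is parked. The following is a minimal Python sketch, not part of flutter-auto or gdb; the names `THREAD_ROW`, `parse_info_threads`, and `tally_frames` are hypothetical, and the regular expression assumes the row layout shown in this particular listing.

```python
import re
from collections import Counter

# Assumed row layout (taken from the listing in this thread):
#   [*] <id> Thread 0x<addr> (LWP <n>) "<name>" 0x<pc> in <symbol> () ...
THREAD_ROW = re.compile(
    r'^\*?\s*(?P<id>\d+)\s+Thread\s+0x[0-9a-f]+\s+\(LWP\s+(?P<lwp>\d+)\)\s+'
    r'"(?P<name>[^"]*)"\s+0x[0-9a-f]+\s+in\s+(?P<symbol>\S+)\s+\(\)'
)

# The nine thread rows copied from the gdb session above.
SAMPLE = '''
* 1    Thread 0xffffb9a42020 (LWP 2413) "homescreen"      0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  2    Thread 0xffffb92ff0a0 (LWP 2414) "homescreen"      0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  3    Thread 0xffffb57580a0 (LWP 2415) "homescreen"      0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  4    Thread 0xffffae9ee0a0 (LWP 2416) "io.flutter.ui"   0x0000ffffb95e7dc4 in epoll_pwait () from /lib/libc.so.6
  5    Thread 0xffffae1de0a0 (LWP 2417) "homescreen"      0x0000ffffb95de1a0 in poll () from /lib/libc.so.6
  6    Thread 0xffffad9ce0a0 (LWP 2418) "io.flutter.io"   0x0000ffffb95e7dc4 in epoll_pwait () from /lib/libc.so.6
  7    Thread 0xffff9ffff0a0 (LWP 2419) "io.worker.1"     0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  8    Thread 0xffff9f7ef0a0 (LWP 2420) "io.worker.2"     0x0000ffffb957c96c in ?? () from /lib/libc.so.6
  9    Thread 0xffffad1be0a0 (LWP 2421) "dart:io EventHa" 0x0000ffffb95e7dc4 in epoll_pwait () from /lib/libc.so.6
'''


def parse_info_threads(text):
    """Return one dict per thread row that matches the assumed layout."""
    rows = []
    for line in text.splitlines():
        m = THREAD_ROW.match(line.strip())
        if m:
            rows.append({
                "id": int(m["id"]),
                "lwp": int(m["lwp"]),
                "name": m["name"],
                "symbol": m["symbol"],
            })
    return rows


def tally_frames(rows):
    """Count how many threads are stopped in each innermost symbol."""
    return Counter(row["symbol"] for row in rows)


threads = parse_info_threads(SAMPLE)
for symbol, count in tally_frames(threads).most_common():
    print(f"{count} thread(s) in {symbol}")
```

On the listing above this reports five threads in unresolved `??` frames, three in `epoll_pwait`, and one in `poll`. The `??` entries mean gdb could not resolve a symbol for that address, so debug symbols for libc and the homescreen binary would likely make the backtrace more informative.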

jwinarske commented 11 months ago

In my case it looks like a TI/Imagination GPU driver crash, as the GPU is reported as powered off. The screen shows a frozen image; the display output block is still being clocked.

------[ RGX Info ]------
Device Node (Info): 0000000040e4e319 (0000000037090564)
    DevmemHistoryRecordStats - None
RGX BVNC: 22.104.208.318 (rogue)
RGX Device State: Active
RGX Power State: OFF
FW info: 23.1 @  6404501 (release) build options: 0x80000810
TRP: HW support - No
WGP: HW support - No
RGX FW State: OK (HWRState 0x00000001: HWR OK;)
RGX FW Power State: RGXFWIF_POW_OFF (APM enabled: 2227406 ok, 10314 denied, 13 non-idle, 4373450 retry, 0 other, 6611196 total. Latency: 100 ms)
RGX DVFS: 0 frequency changes. Current frequency: 749.971 MHz (sampled at 219991527322094 ns). FW frequency: 100.000 MHz.
RGX FW OS 0 - State: active; Freelists: Ok; Priority: 0; Isolation group: 0; MTS off;
Number of HWR: GP(0/0+0), 2D(0/0+0), TA(3/3+0), 3D(0/0+0), CDM(0/0+0), FALSE(0,0,0,0,0)
DM 0 (GP)
DM 1 (HWRflags 0x00000000: working;)
DM 2 (HWRflags 0x00000000: working;)
  Recovery 1: PID = 2413 / homescreen, frame = 68857, HWRTData = 0xC002A280, EventStatus = 0x00004400, Guilty Lockup
              CRTimer = 0x00000003729A, OSTimer = 54608.178283060, CyclesElapsed = 48265984
              PreResetTimeInCycles = 38912, HWResetTimeInCycles = 20480, FreelistReconTimeInCycles = 5344256, TotalRecoveryTimeInCycles = 5403648
  Recovery 2: PID = 2413 / homescreen, frame = 98258, HWRTData = 0xC002A180, EventStatus = 0x00000600, Innocent Lockup
              CRTimer = 0x00000000BFBF, OSTimer = 55344.178872365, CyclesElapsed = -9140480
              PreResetTimeInCycles = 47872, HWResetTimeInCycles = 18944, FreelistReconTimeInCycles = 169472, TotalRecoveryTimeInCycles = 236288
    BIF0 - FAULT:
      * MMU status (0x0000000000001041): PC = 1, Page Size = 0 (Page Catalog).
      * Request (0x00008b0000000000): TA (PPP Context State), Writing to 0x0000000000.
    PC index (0) out of bounds (0)
  Recovery 3: PID = 2413 / homescreen, frame = 4595599, HWRTData = 0xC002E640, EventStatus = 0x00004400, Guilty Lockup
              CRTimer = 0x00000E3ACB47, OSTimer = 170198.506674416, CyclesElapsed = 47176960
              PreResetTimeInCycles = 39424, HWResetTimeInCycles = 19456, FreelistReconTimeInCycles = 425216, TotalRecoveryTimeInCycles = 484096
DM 3 (HWRflags 0x00000000: working;)
DM 4 (HWRflags 0x00000000: working;)
RGX Kernel CCB WO:0x74 RO:0x74
RGX Firmware CCB WO:0x1C RO:0x1C
RGX Kernel CCB commands executed = 28507380
RGX SLR: Forced UFO updates requested = 0
RGX Errors: WGP:0, TRP:0
Thread0: FW IRQ count = 39815913
Last sampled IRQ count in LISR = 39815913
FW System config flags = 0x00020000 (Ctx switch options: Medium CSW profile;)
FW OS config flags = 0x0000000F (Ctx switch: TDM; GEOM; 3D; CDM;)
 (!) RGX power is down. No registers dumped
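
The hardware-recovery (HWR) entries in a dump like the one above can be extracted with a short script, which helps when comparing several dumps. This is a minimal Python sketch that assumes the line layout shown here; `parse_rgx_dump`, `RECOVERY`, and `POWER` are hypothetical names, not part of the PowerVR driver or its tooling, and other driver versions may format the dump differently.

```python
import re

# Assumed layout of a recovery line, as it appears in the dump above:
#   Recovery <n>: PID = <pid> / <process>, frame = <frame>, ..., <Guilty|Innocent> Lockup
RECOVERY = re.compile(
    r'Recovery\s+(?P<n>\d+):\s+PID\s+=\s+(?P<pid>\d+)\s+/\s+(?P<proc>[^,\s]+),\s+'
    r'frame\s+=\s+(?P<frame>\d+),.*?,\s+(?P<verdict>Guilty|Innocent)\s+Lockup'
)
# Anchored at line start so "RGX FW Power State" is not matched by mistake.
POWER = re.compile(r'^RGX Power State:\s+(?P<state>\S+)', re.MULTILINE)

# Excerpt copied from the dump above.
SAMPLE = '''
RGX Device State: Active
RGX Power State: OFF
RGX FW Power State: RGXFWIF_POW_OFF (APM enabled: 2227406 ok, 10314 denied, 13 non-idle, 4373450 retry, 0 other, 6611196 total. Latency: 100 ms)
  Recovery 1: PID = 2413 / homescreen, frame = 68857, HWRTData = 0xC002A280, EventStatus = 0x00004400, Guilty Lockup
  Recovery 2: PID = 2413 / homescreen, frame = 98258, HWRTData = 0xC002A180, EventStatus = 0x00000600, Innocent Lockup
  Recovery 3: PID = 2413 / homescreen, frame = 4595599, HWRTData = 0xC002E640, EventStatus = 0x00004400, Guilty Lockup
'''


def parse_rgx_dump(text):
    """Extract the host-side power state and the list of HWR recovery records."""
    power = POWER.search(text)
    recoveries = [
        {
            "n": int(m["n"]),
            "pid": int(m["pid"]),
            "process": m["proc"],
            "frame": int(m["frame"]),
            "verdict": m["verdict"],
        }
        for m in RECOVERY.finditer(text)
    ]
    return {
        "power_state": power["state"] if power else None,
        "recoveries": recoveries,
    }


report = parse_rgx_dump(SAMPLE)
print("power state:", report["power_state"])
for rec in report["recoveries"]:
    print(f'recovery {rec["n"]}: pid {rec["pid"]} ({rec["process"]}), '
          f'frame {rec["frame"]}, {rec["verdict"]}')
```

For this dump it yields a power state of `OFF` and three recoveries attributed to PID 2413 (`homescreen`): two marked Guilty (frames 68857 and 4595599) and one marked Innocent (frame 98258).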

jwinarske commented 11 months ago

I raised an issue with TI here: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1280666/sk-tda4vm-rogue-powervr-display-driver-stability