victronenergy / venus

Victron Energy Unix/Linux OS
https://github.com/victronenergy/venus/wiki

Frequent gui restarts due to sigsegv #928

Open philipa opened 2 years ago

philipa commented 2 years ago

Cerbo GX, running Venus 2.84.

The GUI restarts a few times a day. The following is logged:

Apr  4 18:44:25 einstein user.info kernel: [720547.446354] potentially unexpected fatal signal 11.
Apr  4 18:44:25 einstein user.warn kernel: [720547.451347] CPU: 1 PID: 15248 Comm: gui Tainted: G           O      5.10.42-venus-6 #1
Apr  4 18:44:25 einstein user.warn kernel: [720547.459440] Hardware name: Allwinner sun7i (A20) Family
Apr  4 18:44:25 einstein user.warn kernel: [720547.464816] PC is at 0x4bc557e4
Apr  4 18:44:25 einstein user.warn kernel: [720547.468050] LR is at 0x4bc56764
Apr  4 18:44:25 einstein user.warn kernel: [720547.471278] pc : [<4bc557e4>]    lr : [<4bc56764>]    psr: 60070010
Apr  4 18:44:25 einstein user.warn kernel: [720547.477694] sp : be8bfe70  ip : b6f02228  fp : 01549838
Apr  4 18:44:25 einstein user.warn kernel: [720547.483040] r10: 00000001  r9 : 00000000  r8 : 00000000
Apr  4 18:44:25 einstein user.warn kernel: [720547.488353] r7 : be8bff38  r6 : 02bb5c30  r5 : 00000000  r4 : 00000003
Apr  4 18:44:25 einstein user.warn kernel: [720547.494995] r3 : be8bff37  r2 : be8bff38  r1 : 00000001  r0 : 02bb8330
Apr  4 18:44:25 einstein user.warn kernel: [720547.501620] Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
Apr  4 18:44:25 einstein user.warn kernel: [720547.508965] Control: 10c5387d  Table: 559d406a  DAC: 00000055
Apr  4 18:44:25 einstein user.warn kernel: [720547.514834] CPU: 1 PID: 15248 Comm: gui Tainted: G           O      5.10.42-venus-6 #1
Apr  4 18:44:25 einstein user.warn kernel: [720547.522831] Hardware name: Allwinner sun7i (A20) Family
Apr  4 18:44:25 einstein user.warn kernel: [720547.528168] [<c010d200>] (unwind_backtrace) from [<c010a3b0>] (show_stack+0x10/0x14)
Apr  4 18:44:25 einstein user.warn kernel: [720547.536009] [<c010a3b0>] (show_stack) from [<c06565dc>] (dump_stack+0x98/0xac)
Apr  4 18:44:25 einstein user.warn kernel: [720547.543324] [<c06565dc>] (dump_stack) from [<c0134118>] (get_signal+0x790/0x794)
Apr  4 18:44:25 einstein user.warn kernel: [720547.550810] [<c0134118>] (get_signal) from [<c0109a4c>] (do_work_pending+0x110/0x55c)
Apr  4 18:44:25 einstein user.warn kernel: [720547.558727] [<c0109a4c>] (do_work_pending) from [<c01000cc>] (slow_work_pending+0xc/0x20)
Apr  4 18:44:25 einstein user.warn kernel: [720547.566982] Exception stack(0xc0c79fb0 to 0xc0c79ff8)
Apr  4 18:44:25 einstein user.warn kernel: [720547.572118] 9fa0:                                     02bb8330 00000001 be8bff38 be8bff37
Apr  4 18:44:25 einstein user.warn kernel: [720547.580375] 9fc0: 00000003 00000000 02bb5c30 be8bff38 00000000 00000000 00000001 01549838
Apr  4 18:44:25 einstein user.warn kernel: [720547.588631] 9fe0: b6f02228 be8bfe70 4bc56764 4bc557e4 60070010 00000000
root@einstein:/var/log# grep "fatal signal" messages
Apr  4 10:44:36 einstein user.info kernel: [691758.583965] potentially unexpected fatal signal 11.
Apr  4 13:02:59 einstein user.info kernel: [700061.619810] potentially unexpected fatal signal 11.
Apr  4 16:37:11 einstein user.info kernel: [712914.047744] potentially unexpected fatal signal 11.
Apr  4 18:44:25 einstein user.info kernel: [720547.446354] potentially unexpected fatal signal 11.
Apr  5 00:13:24 einstein user.info kernel: [740287.043607] potentially unexpected fatal signal 11.
Apr  5 07:57:33 einstein user.info kernel: [768135.234798] potentially unexpected fatal signal 11.

Looks similar to #143.

It happens when the system is quiet (no heavy loads etc.) and no one is touching the screen.

I don't recall this happening before 2.84.
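
For anyone triaging a similar log, here is a minimal sketch (not part of the original report) that pulls the kernel uptime stamps out of the `grep` output above and prints the gap between consecutive crashes. The function names are made up for illustration, and it assumes all lines come from a single boot, since the bracketed uptime counter resets on reboot.

```python
import re

# Lines copied verbatim from the grep output in this comment.
LOG = """\
Apr  4 10:44:36 einstein user.info kernel: [691758.583965] potentially unexpected fatal signal 11.
Apr  4 13:02:59 einstein user.info kernel: [700061.619810] potentially unexpected fatal signal 11.
Apr  4 16:37:11 einstein user.info kernel: [712914.047744] potentially unexpected fatal signal 11.
Apr  4 18:44:25 einstein user.info kernel: [720547.446354] potentially unexpected fatal signal 11.
Apr  5 00:13:24 einstein user.info kernel: [740287.043607] potentially unexpected fatal signal 11.
Apr  5 07:57:33 einstein user.info kernel: [768135.234798] potentially unexpected fatal signal 11.
"""

# Bracketed value is seconds since boot, as printed by the kernel.
UPTIME_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\] potentially unexpected fatal signal \d+")


def crash_uptimes(text):
    """Return the uptime (seconds since boot) of every fatal-signal line."""
    return [float(m.group(1)) for m in UPTIME_RE.finditer(text)]


def intervals_hours(uptimes):
    """Return the gaps between consecutive crashes, in hours."""
    return [(later - earlier) / 3600.0 for earlier, later in zip(uptimes, uptimes[1:])]


uptimes = crash_uptimes(LOG)
gaps = intervals_hours(uptimes)
for gap in gaps:
    print(f"{gap:.2f} h since previous crash")
```

For the six lines above, the gaps range from roughly two to eight hours, which fits "a few times a day".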

jhofstee commented 2 years ago

Can you mention which devices are attached, or enable VRM, Remote Support and internet access and tell us which site this is about? Without more details or a reproducer, it is not possible to fix this.

philipa commented 2 years ago

Thanks for picking up on this. The site ID is 113424.

I've enabled remote access. Do I need to set up an account for you?

jhofstee commented 2 years ago

Hi, this is a modified gui; you need to ask about that here: https://community.victronenergy.com/spaces/31/index.html

philipa commented 2 years ago

Sorry, I should have mentioned that. I have previously reproduced the problem without the gui modifications.

I'll uninstall the modifications and report back when the problem happens again.

jhofstee commented 2 years ago

If you know how to reproduce that with a standard install, please let us know and we will look into that.

philipa commented 2 years ago

The problem has repeated a few times a day without the gui modifications.

I've left the modifications uninstalled and the site remains available for remote support over VRM.

philipa commented 2 years ago

@jhofstee I can't re-open this ticket, should I open a new one?

philipa commented 2 years ago

Opened #955.

philipa commented 2 years ago

@mpvader @jhofstee, I'd be grateful if someone could grab whatever diagnostics are useful so I can disable remote support. Also, the system is on a travelling yacht, so connectivity can be patchy.

mpvader commented 2 years ago

Hi @philipa, yes, we have the diagnostics; thanks! You're welcome to disable Remote Support if you want to.

It's not an easy thing to solve, by the way: it is hard to reproduce and seems to sit rather deep down in a third-party library we use.
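
As an aside for readers wondering how a bare `PC is at 0x...` value gets attributed to a library: one common approach is to look the address up in the process's `/proc/<pid>/maps` listing. The sketch below is only an illustration of that lookup. The sample mappings and paths are entirely hypothetical (no maps output was posted in this thread) and say nothing about which library actually crashed here; `locate` is a made-up helper name.

```python
# HYPOTHETICAL sample in /proc/<pid>/maps format; addresses and paths are invented.
SAMPLE_MAPS = """\
00010000-00200000 r-xp 00000000 b3:02 100 /hypothetical/bin/gui
4bc00000-4bd00000 r-xp 00000000 b3:02 200 /hypothetical/lib/libexample.so
be8a0000-be8c1000 rw-p 00000000 00:00 0 [stack]
"""


def locate(maps_text, address):
    """Return (pathname, file_offset) for the mapping containing address, or None."""
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= address < end:
            path = fields[5].strip() if len(fields) == 6 else "[anonymous]"
            file_offset = address - start + int(fields[2], 16)
            return path, file_offset
    return None


# PC value taken from the kernel dump in the first comment.
hit = locate(SAMPLE_MAPS, 0x4BC557E4)
if hit is not None:
    print(f"{hit[0]} + {hit[1]:#x}")
```

The resulting offset could then be fed to a symbolizer together with a matching debug build, assuming the maps listing was captured from the same process that crashed.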

philipa commented 2 years ago

Thanks @mpvader.

I'll reconfigure the system, but let me know if you need access again. Happy to help if I can.

mpvader commented 2 years ago

Hi @philipa: is this still happening on your system? We haven't, at least as far as I know, explicitly fixed it. But then again, we also never explicitly broke anything: the current theory is that this is a bug that was already sitting deep down in the Qt library and is now being triggered by some unrelated change we made somewhere.

philipa commented 2 years ago

Yes, still occurring regularly with v2.89.

It seems strange that no one else is reporting it.

mpvader commented 2 years ago

It probably goes mostly unnoticed, though I did get other reports as well. And thanks for reminding me that it's on versions prior to v2.90 as well! So at least it's not a blocker for releasing v2.90.

philipa commented 1 year ago

Any progress on this? It's still happening with 2.92. I'm reminded every evening when I see the display restart in the corner of the cabin.

I'd be happy to deploy debug builds to track this down, or help how

peterscarsten commented 1 year ago

Hello,

I've detected the same issue on my Cerbo GX. I've updated to the latest version, 2.94, but the issue is still there.

Do you have any updates on this?

Here are some entries from the last four days:

messages.5:Jun  9 04:56:26 einstein user.info kernel: [1555307.044409] potentially unexpected fatal signal 11.
messages.5:Jun  9 14:34:40 einstein user.info kernel: [1590000.341490] potentially unexpected fatal signal 11.
messages.5:Jun  9 16:20:52 einstein user.info kernel: [1596372.692552] potentially unexpected fatal signal 11.
messages.4:Jun  9 18:31:23 einstein user.info kernel: [1604203.111487] potentially unexpected fatal signal 11.
messages.4:Jun  9 21:13:55 einstein user.info kernel: [1613955.359497] potentially unexpected fatal signal 11.
messages.4:Jun  9 23:44:30 einstein user.info kernel: [1622989.914503] potentially unexpected fatal signal 11.
messages.3:Jun 10 02:59:29 einstein user.info kernel: [1634688.625935] potentially unexpected fatal signal 11.
messages.3:Jun 10 05:40:53 einstein user.info kernel: [1644372.465185] potentially unexpected fatal signal 11.
messages.3:Jun 10 08:23:06 einstein user.info kernel: [1654104.946154] potentially unexpected fatal signal 11.
messages.3:Jun 10 10:38:38 einstein user.info kernel: [1662237.154165] potentially unexpected fatal signal 11.
messages.2:Jun 10 14:55:31 einstein user.info kernel: [1677649.698713] potentially unexpected fatal signal 11.
messages.2:Jun 10 17:14:50 einstein user.info kernel: [1686009.050445] potentially unexpected fatal signal 11.
messages.2:Jun 10 20:13:38 einstein user.info kernel: [1696736.288597] potentially unexpected fatal signal 11.
messages.2:Jun 10 22:31:28 einstein user.info kernel: [1705006.080807] potentially unexpected fatal signal 11.
messages.2:Jun 11 01:02:12 einstein user.info kernel: [1714050.748313] potentially unexpected fatal signal 11.
messages.1:Jun 11 04:41:40 einstein user.info kernel: [1727217.987286] potentially unexpected fatal signal 11.
messages.1:Jun 11 07:34:08 einstein user.info kernel: [1737565.532092] potentially unexpected fatal signal 11.
messages.1:Jun 11 09:37:26 einstein user.info kernel: [1744961.160222] potentially unexpected fatal signal 11.
messages.0:Jun 11 12:25:37 einstein user.info kernel: [1755054.223897] potentially unexpected fatal signal 11.
messages.0:Jun 11 15:53:56 einstein user.info kernel: [1767553.286626] potentially unexpected fatal signal 11.
messages.0:Jun 11 18:54:23 einstein user.info kernel: [1778380.780260] potentially unexpected fatal signal 11.
messages.0:Jun 11 22:14:12 einstein user.info kernel: [1790369.113086] potentially unexpected fatal signal 11.
messages.0:Jun 12 01:29:51 einstein user.info kernel: [1802107.840382] potentially unexpected fatal signal 11.
messages.0:Jun 12 04:49:54 einstein user.info kernel: [1814110.486705] potentially unexpected fatal signal 11.
messages.0:Jun 12 07:35:27 einstein user.info kernel: [1824043.470034] potentially unexpected fatal signal 11.
messages:Jun 12 10:20:54 einstein user.info kernel: [1833970.232015] potentially unexpected fatal signal 11.
messages:Jun 12 12:57:41 einstein user.info kernel: [1843377.525065] potentially unexpected fatal signal 11.
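
To summarise a listing like the one above, a rough sketch such as the following can count crashes per day across the rotated log files. It assumes the `file:line` prefix that `grep` adds when searching several files; the helper name is illustrative, and only a few of the lines above are reproduced as sample input.

```python
import re
from collections import Counter

# A few lines copied verbatim from the listing above; the rest are omitted for brevity.
GREP_OUTPUT = """\
messages.5:Jun  9 04:56:26 einstein user.info kernel: [1555307.044409] potentially unexpected fatal signal 11.
messages.4:Jun  9 18:31:23 einstein user.info kernel: [1604203.111487] potentially unexpected fatal signal 11.
messages.3:Jun 10 02:59:29 einstein user.info kernel: [1634688.625935] potentially unexpected fatal signal 11.
messages:Jun 12 10:20:54 einstein user.info kernel: [1833970.232015] potentially unexpected fatal signal 11.
"""

# Optional "filename:" prefix, then the syslog month and day.
LINE_RE = re.compile(
    r"^(?:[^:\s]+:)?([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s.*fatal signal 11"
)


def crashes_per_day(text):
    """Return a dict mapping 'Mon D' to the number of SIGSEGV lines on that day."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[f"{match.group(1)} {int(match.group(2))}"] += 1
    return dict(counts)


per_day = crashes_per_day(GREP_OUTPUT)
for day, count in per_day.items():
    print(f"{day}: {count}")
```

Run against the full listing rather than this excerpt, it would give one count per calendar day, which makes it easier to compare crash rates between firmware versions.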

peterscarsten commented 1 year ago

Perhaps it's related to this: https://community.victronenergy.com/questions/131941/touch-gx-and-i-assume-cerbo-rebooting.html

I'm also using Ruuvi tags with an additional Bluetooth dongle.