xbianonpi / xbian

XBMC on Raspberry Pi, Bleeding Edge
https://xbian.org
GNU General Public License v3.0

Kernel Panic #706

Closed: The-Exnor closed this issue 9 years ago

The-Exnor commented 9 years ago

As requested by f1vefour, here is a picture of the kernel panic that keeps recurring on 2 units (http://forum.xbian.org/thread-2783-page-3.html) and (http://forum.xbian.org/thread-2827.html).

Picture (http://s10.postimg.org/y6r0r3mnd/Kernel.jpg)

Do you have any idea, @mk01?

Thanks.

The-Exnor commented 9 years ago

@f1vefour @CurlyMoo

So my test unit, which has Kodi stopped, has so far not crashed or panicked (3.18.8+, with all the updates as of yesterday). I notice some "lag" in the SSH console that was not there before this kernel, but so far it has passed the stress tests I ran (reaching 48h without a panic).

The other unit (where I can still watch a movie) freezes/crashes Kodi when left alone, so I typically have to kill/restart Kodi twice a day on that unit.

Can this be an issue with Kodi itself or any of the services supporting it?

CurlyMoo commented 9 years ago

Does this test unit run as a download box with lots of IO?

The-Exnor commented 9 years ago

@CurlyMoo No... it's idling with top running... but I did test moving big files over WLAN and it did not crash.

CurlyMoo commented 9 years ago

Can you do an IO / CPU stress test somehow for an extended period of time?

The-Exnor commented 9 years ago

Yes... how do you propose I do that? Besides Kodi, what else can push the CPU to the max?

CurlyMoo commented 9 years ago

I would suggest googling for a nice script...

Smultie commented 9 years ago

http://stackoverflow.com/questions/2925606/how-to-create-a-cpu-spike-with-a-bash-command

For example.

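So this should work (it starts four dd processes in the background, waits for you to press Enter, then kills them):

fulload() { dd if=/dev/zero of=/dev/null | dd if=/dev/zero of=/dev/null | dd if=/dev/zero of=/dev/null | dd if=/dev/zero of=/dev/null & }; fulload; read; killall dd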
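A variant of the one-liner above that stops on its own, so it can run unattended for an extended period. This is a minimal sketch, assuming the `nproc` and `timeout` utilities from GNU coreutils are present; the function name `cpu_hog` is hypothetical. Like the original, it loads the CPU and memory bus, not the disk.

```shell
#!/bin/sh
# Hypothetical helper: keep every core busy for a fixed time, then stop.
# Assumes GNU coreutils (nproc, timeout) are installed.

# cpu_hog SECONDS [WORKERS]
#   Starts WORKERS copies of "dd if=/dev/zero of=/dev/null" (default: one
#   per core). Each one is terminated by timeout after SECONDS seconds.
cpu_hog() {
    secs=$1
    workers=${2:-$(nproc)}
    i=0
    while [ "$i" -lt "$workers" ]; do
        # timeout only stops the dd it started, unlike "killall dd"
        timeout "$secs" dd if=/dev/zero of=/dev/null 2>/dev/null &
        i=$((i + 1))
    done
    # Block until every background worker has been stopped
    wait
}

# Example: load all cores for 48 hours.
# cpu_hog $((48 * 3600))
```

Using `timeout` instead of `read; killall dd` means no terminal has to stay open and no unrelated dd process gets killed.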

The-Exnor commented 9 years ago

@CurlyMoo I just started Kodi on that unit and it crashed in less than 5 minutes... it started sluggish, then input became even slower, and finally it froze...

@Smultie Thanks mate, going to try that one... I want to see if this is related to Kodi itself.

CurlyMoo commented 9 years ago

The script from @smultie just tests memory IO, not disk IO.

Smultie commented 9 years ago

It stresses all cores, which was the goal, right?

CurlyMoo commented 9 years ago

The goal is disk io :)
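Since the goal is disk IO, the load has to hit the SD card or USB drive rather than /dev/null. This is a minimal sketch, assuming GNU dd (for `conv=fdatasync`) and enough free space in the target directory; the function name `disk_hog` and its arguments are hypothetical. The read-back pass may be served partly from the page cache, so the flushed writes are the more reliable part of the load.

```shell
#!/bin/sh
# Hypothetical helper: write a file, flush it to the device, read it back,
# and repeat, so the storage itself is exercised.
# Assumes GNU dd (conv=fdatasync) and free space in DIR.

# disk_hog DIR SIZE_MB ROUNDS
disk_hog() {
    dir=$1
    size_mb=$2
    rounds=$3
    f="$dir/disk_hog.$$"
    r=0
    while [ "$r" -lt "$rounds" ]; do
        # conv=fdatasync makes dd sync the data to the device before exiting
        if ! dd if=/dev/zero of="$f" bs=1M count="$size_mb" conv=fdatasync 2>/dev/null; then
            rm -f "$f"
            return 1
        fi
        # Read the file back (this may partly come from the page cache)
        if ! dd if="$f" of=/dev/null bs=1M 2>/dev/null; then
            rm -f "$f"
            return 1
        fi
        r=$((r + 1))
    done
    rm -f "$f"
}

# Example: 20 rounds of 512 MB in the home directory.
# disk_hog "$HOME" 512 20
```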

Smultie commented 9 years ago

Quoting you: "Can you do an IO / CPU stress test somehow for an extended period of time?"

It does 50% of that.

Why don't you just post a script, so we can be sure we do the right thing?

CurlyMoo commented 9 years ago

Because I would have to search for it as well.

Smultie commented 9 years ago

Fair enough ;)

The-Exnor commented 9 years ago

Found a nice Linux program for that. It's called "stress".

Here is the output of a 60-second run (oh, and no crashes since I stopped Kodi):

stress -v -t 60 -c 1
stress: info: [3357] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
stress: dbug: [3357] using backoff sleep of 3000us
stress: dbug: [3357] setting timeout to 60s
stress: dbug: [3357] --> hogcpu worker 1 [3358] forked
stress: dbug: [3357] <-- worker 3358 signalled normally
stress: info: [3357] successful run completed in 60s

Here is a 180-second test with IO and CPU:

stress: info: [3375] dispatching hogs: 1 cpu, 4 io, 0 vm, 0 hdd
stress: dbug: [3375] using backoff sleep of 15000us
stress: dbug: [3375] setting timeout to 180s
stress: dbug: [3375] --> hogcpu worker 1 [3376] forked
stress: dbug: [3375] --> hogio worker 4 [3377] forked
stress: dbug: [3375] using backoff sleep of 9000us
stress: dbug: [3375] setting timeout to 180s
stress: dbug: [3375] --> hogio worker 3 [3378] forked
stress: dbug: [3375] using backoff sleep of 6000us
stress: dbug: [3375] setting timeout to 180s
stress: dbug: [3375] --> hogio worker 2 [3379] forked
stress: dbug: [3375] using backoff sleep of 3000us
stress: dbug: [3375] setting timeout to 180s
stress: dbug: [3375] --> hogio worker 1 [3380] forked
stress: dbug: [3375] <-- worker 3376 signalled normally
stress: dbug: [3375] <-- worker 3378 signalled normally
stress: dbug: [3375] <-- worker 3380 signalled normally
stress: dbug: [3375] <-- worker 3377 signalled normally
stress: dbug: [3375] <-- worker 3379 signalled normally
stress: info: [3375] successful run completed in 180s

Smultie commented 9 years ago

Some results from my side:

xbian@xbian ~ $ stress --cpu 4 --io 4 --vm 4 --hdd 4 --timeout 1m
stress: info: [7602] dispatching hogs: 4 cpu, 4 io, 4 vm, 4 hdd
stress: FAIL: 7602 <-- worker 7617 got signal 9
stress: WARN: 7602 now reaping child worker processes
stress: FAIL: 7602 <-- worker 7605 got signal 9
stress: WARN: 7602 now reaping child worker processes
stress: FAIL: 7602 failed run completed in 7s
xbian@xbian ~ $ stress --cpu 4 --timeout 1m
stress: info: [7627] dispatching hogs: 4 cpu, 0 io, 0 vm, 0 hdd
stress: info: [7627] successful run completed in 60s
xbian@xbian ~ $ stress --io 4 --timeout 1m
stress: info: [7659] dispatching hogs: 0 cpu, 4 io, 0 vm, 0 hdd
stress: info: [7659] successful run completed in 60s
xbian@xbian ~ $ stress --vm 4 --timeout 1m
stress: info: [7876] dispatching hogs: 0 cpu, 0 io, 4 vm, 0 hdd
stress: FAIL: 7876 <-- worker 7879 got signal 9
stress: WARN: 7876 now reaping child worker processes
stress: FAIL: 7876 <-- worker 7880 got signal 9
stress: WARN: 7876 now reaping child worker processes
stress: FAIL: 7876 failed run completed in 2s
xbian@xbian ~ $ stress --hdd 4 --timeout 1m
stress: info: [7884] dispatching hogs: 0 cpu, 0 io, 0 vm, 4 hdd
stress: info: [7884] successful run completed in 60s
xbian@xbian ~ $ stress --vm 4 --timeout 1m
stress: info: [7920] dispatching hogs: 0 cpu, 0 io, 4 vm, 0 hdd
stress: FAIL: 7920 <-- worker 7924 got signal 9
stress: WARN: 7920 now reaping child worker processes
stress: FAIL: 7920 failed run completed in 18s
xbian@xbian ~ $ stress --vm 3 --timeout 1m
stress: info: [7932] dispatching hogs: 0 cpu, 0 io, 3 vm, 0 hdd
stress: FAIL: 7932 <-- worker 7935 got signal 9
stress: WARN: 7932 now reaping child worker processes
stress: FAIL: 7932 failed run completed in 3s
xbian@xbian ~ $ stress --vm 2 --timeout 1m
stress: info: [7938] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd
stress: info: [7938] successful run completed in 60s

Note that --vm greater than 2 fails: those runs end with workers getting signal 9 (SIGKILL).
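Signal 9 (SIGKILL) is also what the kernel's OOM killer sends, and stress allocates 256 MB per --vm worker by default, so these failures may be plain memory exhaustion rather than a crash; `dmesg` should show OOM-killer messages if so. A minimal sketch for sizing the workers to the memory actually available, assuming the kernel reports `MemAvailable` in /proc/meminfo (Linux 3.14 and later) and that 90% of it is a safe budget. The function name `vm_mb_per_worker` and the script name `vm_stress.sh` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pick a --vm-bytes value so N stress vm workers fit
# in RAM. Assumes /proc/meminfo has a MemAvailable line (Linux 3.14+)
# and that 90% of it is a safe budget.

# vm_mb_per_worker WORKERS [MEMINFO_FILE]  -> prints MiB per worker
vm_mb_per_worker() {
    workers=$1
    meminfo=${2:-/proc/meminfo}
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' "$meminfo")
    [ -n "$avail_kb" ] || return 1      # field missing: older kernel
    echo $((avail_kb * 9 / 10 / 1024 / workers))
}

# Only start stress when explicitly asked:  sh vm_stress.sh run 4
if [ "${1:-}" = run ]; then
    n=${2:-2}
    mb=$(vm_mb_per_worker "$n") || exit 1
    stress --vm "$n" --vm-bytes "${mb}M" --timeout 60s
fi
```

If `--vm 4` still gets signal 9 with workers sized this way, memory exhaustion becomes a less likely explanation.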

The-Exnor commented 9 years ago

Well, the system only crashes when, and every time, I use Kodi... any ideas, @CurlyMoo @f1vefour?

CurlyMoo commented 9 years ago

Can you try installing previous versions of Kodi? Preferably 13.2?

The-Exnor commented 9 years ago

Ok... you mean XBMC ;). I don't think I need to go as far back as that... before the update that started all this mess, everything was fine. But I will try to install 13.2.

The-Exnor commented 9 years ago

@CurlyMoo So I burned an old image I had as a backup (XBMC 13) and everything works fine. It does not crash/freeze or panic (the kernel is 3.14, I think), but it's not updatable and most of the add-ons I need are not compatible anymore :/

Now, on the same RPi I tested again with the other SD card, and the current version still freezes Kodi after some time. BUT without Kodi running it now stays stable.

Since I've run all the stress tests I know of, could some of the services that Kodi uses be causing this? (Note that for this test I used no overclock, so ARM @ 700 MHz etc.)

CurlyMoo commented 9 years ago

And if you downgrade Kodi as i asked before?

The-Exnor commented 9 years ago

@CurlyMoo Well, I didn't downgrade from the current image... How do I do that?

CurlyMoo commented 9 years ago

Searching is your friend: "Apt downgrade".

The-Exnor commented 9 years ago

Done that... I need to know the name of the package to install.

CurlyMoo commented 9 years ago

dpkg --get-selections | grep xbian

bairdy commented 9 years ago

@The-Exnor

Try this: apt-get install xbian-package-xbmc=14.1-1423177674

I've had the same issues with the most recent version of Kodi over the last week or so, as well as CEC not responding properly (taking up to 10 minutes after switching on the TV, or switching to the Pi's source, before it works). It also crashes with both HD and SD content, played locally or over SMB, causing a kernel panic if playback has been paused by switching the TV to another source and then back to Kodi, or even if Kodi has been left paused and idle for half an hour or so. This was fine until Kodi updated; the Pi had been running for over 4 months with no issues before that.

I have now switched back to the older version of Kodi above and everything seems to be working fine: no crashes or issues in the last 24 hours. I'm not 100% sure yet and will report back in a day or two, but I think it could be an issue with Kodi 14.2 doing something strange. I have a clone of this setup on another Pi that runs a download server with loads of I/O and quite a bit of CPU usage unpacking RAR files, with Kodi disabled at startup, and it ran solidly with no crashes. Then I swapped the Pis round, started Kodi and disabled the download services on the spare, and it crashed, and did the same again after a few hours.
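The downgrade discussed above (find the package name with dpkg, then install an explicit version) can be scripted with one extra step: putting the package on hold so a later `apt-get upgrade` does not pull the newer build back in. This is a minimal sketch; the package name and version are the ones quoted in this thread, whether the repository still offers that build is an assumption, and the helper names `run` and `downgrade_and_hold` are hypothetical. It has to run as root; setting `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Hypothetical helpers: downgrade one package to an exact version and hold it.
# Needs root. Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# downgrade_and_hold PKG VERSION
downgrade_and_hold() {
    pkg=$1
    ver=$2
    run apt-cache policy "$pkg"                  # installed and candidate versions
    run apt-get install "$pkg=$ver" || return 1  # apt accepts PKG=VERSION
    run apt-mark hold "$pkg"                     # keep upgrades from replacing it
}

# Only act when explicitly asked:  sh downgrade.sh run
if [ "${1:-}" = run ]; then
    downgrade_and_hold xbian-package-xbmc 14.1-1423177674
fi
```

`apt-mark unhold xbian-package-xbmc` releases the package again once a fixed build is available.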

The-Exnor commented 9 years ago

@CurlyMoo going to try on one of the units

@bairdy Thanks mate, going to try this on the other unit. And yes, I notice that the problems appear to be related to Kodi. With Kodi turned off, the system can handle stress tests and even moving really big files over LAN without issues (even on the 3.18.8+ kernel).

My question to @CurlyMoo and @f1vefour is whether one of the services/programs included in the image might be causing the instability.

CurlyMoo commented 9 years ago

It still looks to me like IO is a likely cause. Maybe a new version of Kodi generates more IO than the others?

mk01 commented 9 years ago

@f1vefour

Tim, can you just try this config? https://github.com/xbianonpi/xbian-package-kernel/raw/master/extra-files/rpi-3.18.y/.config.try

mk01 commented 9 years ago

@bairdy

The "idle" -> "resume" crashing won't be related; this looks like a regression (or a new bug leading to the same problem) in 14.x. After @Smultie asked, I tested on imx6 and see the same against a vanilla NFS server (Debian Linux, NFSv3). In the 16 months I've been using imx6, this wasn't happening before.

(Actually, with my setup/devices I don't remember seeing it even on the RPi.) The only way to prove it is outside the kernel is an RPi1 image from Oct/Nov last year with the kernel to be tested. That way we exclude the other significant factors, such as the firmware, which was certainly about 80% rewritten with the introduction of the RPi2.

The Raspberry Pi has (and always had) HDMI, CEC and all of this code in the firmware (as blobs).

rolftimmerman commented 9 years ago

A short update from my side. A week ago I installed the 'stable' version of XBian via the download tool. After that I updated everything except the kernel (which is still 3.17.7-ck2+). I haven't had any complete freezes, except that Kodi stops responding to CEC after a while (and sometimes freezes), but a restart of Kodi fixes that. In the meantime I was able to download and copy stuff around.

Is there anything I can test from my side with this configuration?

The-Exnor commented 9 years ago

@CurlyMoo @f1vefour @mk01 @bairdy

Well, reverting to an older Kodi is working better so far... but it still gets sluggish sometimes (though not as badly as 14.2).

@CurlyMoo, you say it must be related to I/O, but how can it be if I've tested all the stress scenarios I can think of and was unable to reproduce either a crash or a panic? Furthermore, before the update that started all of this, neither of my units ever had issues of this type, and I do push them hard. And which IO are you referring to: CPU to RAM, SoC to SD card, the USB controller, or data handling inside the CPU/GPU/logic of the SoC? If your theory is correct, then I have 2 faulty RPi units from different batches... and it's quite a coincidence that they both got the same problem at the same time with the same software update.

My theory is that some unintended behaviour in part of the updated software is creating this scenario. I don't think it is Kodi alone, because I still got random panics running BTRFS scrubs (though not every time I ran one) with Kodi not running. But since the day this started, other software has been updated, both in the Raspbian repos and in XBian, and now panics are extremely rare and so far only happen with Kodi running.

This is all very frustrating... Old images with XBMC (13.x) run without these issues, but if I stick with them I can't update the software...

I appreciate all the efforts you guys make. Sorry for the rant.

CurlyMoo commented 9 years ago

Because, of all the users reporting issues, (almost) all of them used it as a download box as well. I, for example, never had issues, but I just use XBian for Kodi and have all files on an NFS share. So hardly any disk IO, just some network IO when I watch a movie.

The-Exnor commented 9 years ago

@CurlyMoo

Ok... I also use XBian only for Kodi (I don't use it as a download computer because I want it to be just an HTPC). I do use it for very high bitrate AVC files (.mkv containers with ~15 Mbit/s average AVC video and AAC, DTS or Dolby audio streams), and the units never (even after the update) crashed while playing a file over NFS shares or from a USB drive. The crashes/freezes and panics all happened with NO network or USB data transfer going on.

I also transferred a 4 GiB file twice from my NAS to both units to test that part (as I mentioned in a previous post) and got no crash or panic...

When I'm not watching a movie/series, the units stay on doing nothing more than running Kodi at idle.

I wish I could debug this better...

The-Exnor commented 9 years ago

@bairdy "try this apt-get install xbian-package-xbmc=14.1-1423177674"

Mate, thanks for this. I've been using it for 2 days now and no more kernel panics :)
Kodi still crashes and burns sometimes, but it restarts automatically (I assume that's a setting of the service?). Apart from that, the OS is now stable as far as I can tell.

Note that this unit is running with LZO compression and the 3.18.8+ kernel.

@CurlyMoo @mk01 @f1vefour

On my other unit I updated last night (around 22:00 UK time) and now it's stuck at boot loading the X libraries... Any ideas on this?

f1vefour commented 9 years ago

@mk01 I am in the middle of moving and starting a new job, someone else will have to take this on for a while as I have no time. Sorry :(

The-Exnor commented 9 years ago

@f1vefour are you leaving the project?

Smultie commented 9 years ago

@The-Exnor "...take this on for a while" ....

He'll be back ;)

The-Exnor commented 9 years ago

@bairdy @CurlyMoo @mk01 @f1vefour

The unit running xbian-package-xbmc=14.1-1423177674 has now been running for 3 days with ZERO Linux crashes/panics (Kodi's other issues still persist, but the OS-level problem appears to be gone) (kernel 3.18.8+, LZO on the 2nd partition).

As of this morning I've also reverted to this Kodi version on my bedroom unit... let's see if it holds.

f1vefour commented 9 years ago

I'm not leaving, just have to take a bit of time off until things settle.

The-Exnor commented 9 years ago

@f1vefour ok mate :)

So far no more Kernel Panics.

Smultie commented 9 years ago

TBH, I'm running the latest packages and haven't seen a kernel panic for about a week, I think.

The-Exnor commented 9 years ago

@CurlyMoo @f1vefour @Smultie @bairdy

Reporting ZERO panics or crashes from the OS so far. Still using Kodi 14.1. All other parts are updated.

The-Exnor commented 9 years ago

@CurlyMoo @f1vefour and everyone else.

Reporting that there are still no more problems on the OS side... I think you can close this thread. Thanks for all the help, guys.

rolftimmerman commented 9 years ago

Again I've updated everything; however, the only thing that freezes is Kodi, while my RPi keeps running. I also have a dmesg log, which you can find here: http://pastebin.com/ntCweJQ6

Maybe it's good to know that I'm only using the RPi to watch movies which are stored on my USB HDD.

The-Exnor commented 9 years ago

Kodi 14.1 does not freeze on me anymore but its performance is erratic compared to 13 (xbmc).

Fabio72 commented 9 years ago

For me, the Kodi freezes seem related to the screensaver. Since I disabled the screensaver, no more Kodi problems. I had this issue http://forum.xbian.org/thread-2912.html but now (without the screensaver) Kodi has been up for a couple of days.

f1vefour commented 9 years ago

What resolved your issue @The-Exnor?

The-Exnor commented 9 years ago

@f1vefour I don't have any conclusive idea... 8 days ago (or was it more?) I reinstalled both units using LZO compression for the OS partition, disabled some services I don't use (LIRC, Avahi), reverted Kodi to the version suggested above (14.1.x), fully updated everything to this day, and no more panics...

I speculate that the problem was never in the 3.18.8+ kernel but in some process running in the background (it was not Kodi because, as I stated, the panics occurred even with Kodi not running). Since the original problem a lot of software has been updated on a regular basis, and I can only think that one of the programs/services that got updated was in fact the culprit. Still, Kodi is randomly sluggish compared to XBMC... and as of today I'm running 14.2 with the same issues.

@Fabio72 Yep, removing the screensaver was my first action back when I got the first few freezes... Kodi does not freeze/crash anymore, but its performance is way below what XBMC (13) had. I get random slowdowns and sluggish input, and I can't pinpoint the source of the problem... What is true is that it only happens when Kodi is running (the system gets slow even over SSH with Kodi on).

Fabio72 commented 9 years ago

@The-Exnor Yes, Kodi also has some performance issues for me: slowdowns or short freezes when navigating menus or browsing NFS. Playback is still fine. The only things I could see in dmesg are:

hrtimer: interrupt took 54000 ns

but that happens once a week, not more. And:

Apr 4 09:47:16 xbian kernel: [262645.634151] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=21002 jiffies, g=6630648, c=6630647, q=1052)
Apr 4 09:47:16 xbian kernel: [262645.634187] INFO: Stall ended before state dump start

but that happened only once.
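To scan live `dmesg` output or a saved kernel log for the kinds of messages reported in this thread, a simple filter is enough. This is a minimal sketch; the function name `kernel_warnings` and its pattern list (RCU stalls, hrtimer warnings, OOM kills, oopses, panics) are assumptions to extend as needed, and the log path in the example assumes rsyslog writes /var/log/kern.log.

```shell
#!/bin/sh
# Hypothetical helper: print only the kernel messages of interest.
# Reads standard input, so it works on dmesg output or a syslog file.
kernel_warnings() {
    grep -E 'rcu_preempt detected stalls|hrtimer: interrupt took|Out of memory|Killed process|Oops|Kernel panic'
}

# Examples:
#   dmesg | kernel_warnings
#   kernel_warnings < /var/log/kern.log
```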