xbianonpi / xbian

XBMC on Raspberry Pi, Bleeding Edge
https://xbian.org
GNU General Public License v3.0

System Freeze/Kernel Panic after changing HDMI sources #698

Open bellini666 opened 9 years ago

bellini666 commented 9 years ago

I think I've been having this issue for about 2 weeks.

It works fine until it freezes everything (especially after being idle, about 1 min after I switch from another HDMI source to xbian). It freezes in a way that I can't log in using ssh, so I don't have logs to post here.

I thought my SD card had some problems, so I migrated to an NFS installation. Still the same problem. I also just changed my SD card to a SanDisk Pro class 10, and nothing has changed.

I also tried using 3.17.7-ck2+-1421550584 as suggested, and another workaround suggested in #691, but no luck either.

I really don't know what is causing this... I never had any issue like this before (I've been using xbian for almost 2 years now).


Other than this description, there is already some information I put in #691. I'm creating this one since @f1vefour thinks this issue is directly related to HDMI.

There we discovered that this issue is actually a kernel panic, as you can see in this image and this other one (posted by @s47).

mk01 commented 9 years ago

@hackedbellini

:) we know that if YOU say it, it is just as you say. don't worry. just quickly tell me: have you migrated to RPI2 too, or are you still on the RPI?

I'm asking because the simple change from a single-CPU setup to SMP often exposes BUGs with NO CODE change. this is because of the kernel's internal locking design and parallel code execution: the SAME calls (used in drivers) inside the kernel are pre-processed into different functions/macros when compiled for single-CPU vs SMP. AND locks/code paths that were sufficient before (when no parallelism inside the kernel was possible on a single CPU) are mostly not enough for parallel execution.

long story short -> 2 years with NO problems is little assurance when changing environments (1 -> 4 cores).

if you still run RPI1, the above is not relevant and the steps to test would be:

1) downgrade the debs for kernel and xbmc (and maybe xbian-package-firmware too) to a configuration you know was working OK. here I'm thinking about the recent kernel changes we did (3.12->3.15->3.16->3.17), then XBMC vs KODI, and finally firmware, which for sure took MANY changes because of RPI2 support.

2) you can outsmart your frozen RPI with a little trick. start XBMC, open ssh, type "xlll". that will hook xbmc/kodi.log and print EACH line immediately onto your ssh console. LEAVE that open and leave the TV. once you return and face the freeze again, 99% probably we will catch the very last log lines from xbmc. you can do the same for the kernel/system log by opening a second ssh session and running "tailf /var/log/kern.log".

those actions are listed out of order; I would start with 2) and then, depending on the output, reconsider 1).
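The two test steps above can be sketched as a pair of shell helpers. This is a minimal sketch, not an XBian tool: `show_versions` and `follow_remote_log` are hypothetical names, the host address is a placeholder, and the log path assumes rsyslog is actually writing /var/log/kern.log. The point of step 2 in this form is to run the follower from another machine, so the copy saved there survives even if the Pi locks up completely.

```shell
#!/bin/sh
# Sketch only: helpers for the two test steps above.
# Package names come from this thread; the host is a hypothetical placeholder.

# Step 1: show installed and available versions of each package, so a
# known-good one can be installed with: sudo apt-get install PKG=VERSION
show_versions() {
    for pkg in "$@"; do
        apt-cache policy "$pkg"
    done
}

# Step 2: run this on ANOTHER machine, not on the Pi. It follows a log on the
# Pi over ssh and appends every line to a local file, so the last lines
# before a freeze are kept even when the ssh session dies.
follow_remote_log() {
    host=$1      # hypothetical, e.g. xbian@192.168.1.50
    logfile=$2   # e.g. /var/log/kern.log
    outfile=$3   # local copy on this machine
    ssh "$host" "tail -F $logfile" | tee -a "$outfile"
}

# Usage (hypothetical host):
#   show_versions xbian-package-kernel xbian-package-xbmc xbian-package-firmware
#   follow_remote_log xbian@192.168.1.50 /var/log/kern.log pi-kern.log
```

After downgrading, `sudo apt-mark hold PKG` would keep a later update from silently undoing the known-good version while testing.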

mk01 commented 9 years ago

@hackedbellini

also, if we are SURE it happens ONLY when XBMC is running, try to figure out IF it matters whether XBMC is at the screensaver when you return to the XBMC source. you will probably have to set the screensaver timeout to a VERY long time, or even turn OFF CEC for that test, as XBian will force XBMC to the screensaver and turn rendering off when the TV goes off.

but let's start with catching the log output; hopefully we will see enough there.

f1vefour commented 9 years ago

I tried to get a tail but the user told me there was nothing printed when the kernel panicked. I didn't think to turn CEC off.

mk01 commented 9 years ago

oh yes, xbian-update normally turns rsyslogd off. so after boot, run

service rsyslogd start

then /var/log gets filled with logs again. it is also useful to put "loglevel=7" into cmdline.txt; that turns on the MOST verbose kernel logging (the default is level 3 or 4). or (if I'm right) the same can be done by running

dmesg -n debug

from the command line (it just sets the same kernel log-level option at runtime)
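A minimal sketch of applying the `loglevel=7` suggestion to a one-line cmdline.txt, assuming GNU sed (for `-i`) and that the file is the single line the Pi boots from. `set_loglevel` is a hypothetical helper: it replaces an existing `loglevel=N` token (the cmdline.txt posted later in this thread has `loglevel=0`) or appends one.

```shell
#!/bin/sh
# Sketch: set (or add) "loglevel=N" in a one-line cmdline.txt.
# Assumes GNU sed. Edit a copy first: a broken cmdline.txt can stop the Pi booting.
set_loglevel() {
    file=$1
    level=$2
    if grep -qE '(^| )loglevel=[0-9]+( |$)' "$file"; then
        # Replace the existing value in place, keeping the surrounding spaces.
        sed -i -E "s/(^| )loglevel=[0-9]+( |\$)/\1loglevel=$level\2/" "$file"
    else
        # Append the option to the end of the first (only) line.
        sed -i -E "1s/\$/ loglevel=$level/" "$file"
    fi
}

# Usage:
#   sudo cp /boot/cmdline.txt /boot/cmdline.txt.bak
#   set_loglevel /boot/cmdline.txt 7
# Runtime alternative from the comment above (needs root, lost on reboot):
#   sudo dmesg -n debug
```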

bellini666 commented 9 years ago

Hey @mk01.

Sorry for not replying earlier, I've been very busy these days. Let me answer your questions:

if you still run RPI1, above is not relevant and the steps to test would be:

I'm still running the RPI1, the same RPI I've had for the last 2 years =P.

1) go for downgrade of debs for kernel and xbmc (and maybe xbian-package-firmware too) to a configuration you know was working OK.

I already tried downgrading the kernel and it had the same problem. I didn't try xbmc and firmware though. When I have some time, I'll surely try to downgrade the firmware and report back to you.

you can outsmart your frozen RPI with a little trick. start XBMC. open ssh, type "xlll". that will hook xbmc/kodi.log and print EACH line immediately onto your ssh console. LEAVE that open and leave the TV. once you return and you will face the freeze again, 99% probably we will catch the very last log lines from xbmc. you can do the same for kernel/system log, by opening second ssh session and running "tailf /var/log/kern.log".

Actually, that was the trick I used to get the log here, though only for the kodi log. My /var/log/kern.log was/is empty, so I ran watch -n 0.1 "dmesg | tail" instead.

The kodi log paste is not valid anymore, but it didn't have any useful information (I can do it again if you need it).

Maybe watch wasn't fast enough to catch the last message in dmesg. Do you know why /var/log/kern.log is empty for me?
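Polling with `watch -n 0.1` can miss the last message if the system dies between refreshes. A sketch that streams kernel messages as they arrive instead, assuming util-linux `dmesg` supports `--follow` (older releases do not) and otherwise falling back to reading /proc/kmsg as root. As with `xlll`, it only helps when run inside an ssh session, so the output lands on the other machine. `follow_kernel_log` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: stream kernel messages instead of polling "dmesg | tail".
# Assumption: "dmesg --follow" exists only in newer util-linux releases.
# /proc/kmsg needs root and is a consuming read: if a syslog daemon is
# running, messages are split between it and this reader.
follow_kernel_log() {
    if dmesg --help 2>&1 | grep -q -- '--follow'; then
        dmesg --follow
    else
        cat /proc/kmsg
    fi
}

# Usage, inside an ssh session to the Pi:
#   sudo sh -c "$(declare -f follow_kernel_log 2>/dev/null); follow_kernel_log"
# or simply paste the function into a root shell and call follow_kernel_log.
```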

service rsyslogd start

rsyslogd: unrecognized service
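`unrecognized service` suggests the init script has a different name. On Debian-based systems the rsyslog package normally installs its service as `rsyslog` rather than `rsyslogd`. The sketch below, assuming either a SysV script under /etc/init.d or an upstart job under /etc/init, just checks which name exists; `find_syslog_service` and the `SYSV_DIR`/`UPSTART_DIR` overrides are hypothetical, added only so it can be tried on a scratch directory.

```shell
#!/bin/sh
# Sketch: find the real name of the syslog service before calling "service".
# SYSV_DIR / UPSTART_DIR are hypothetical overrides for testing.
find_syslog_service() {
    sysv_dir=${SYSV_DIR:-/etc/init.d}
    upstart_dir=${UPSTART_DIR:-/etc/init}
    for name in rsyslog rsyslogd; do
        if [ -x "$sysv_dir/$name" ] || [ -f "$upstart_dir/$name.conf" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1   # neither name found
}

# Usage:
#   svc=$(find_syslog_service) && sudo service "$svc" start
```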

also, if we are SURE it happens ONLY when XBMC is running, try to figure out IF there is any relevance to XBMC being at screensaver when returning to XBMC source, or not. probably you will have to set screensaver timeout for VERY long time or even turn OFF CEC for that test as XBian will force XBMC to screensaver and rendering off when TV going off.

Actually, I found something interesting.

These days, when changing sources, instead of going crazy using kodi right away, I was just pressing a button to make sure kodi left the screensaver and waiting for ~30 sec. After that, I could use kodi without a freeze in most situations.

Since leaving the screensaver probably triggers some IO, and using kodi will too (e.g. it will load thumbnails for each show/episode I select during that time, etc.), I was wondering if @Diak might be on to something here.

I also found this post where someone is having the same symptoms on OpenElec.

However, I tried stressing it with the stress command to produce a lot of IO, and could not make it freeze. I tried running it when the screensaver was active, when it was not, and when it was just being deactivated.

bellini666 commented 9 years ago

Based on my comment here, I'm wondering if my issue is really something else.

CurlyMoo commented 9 years ago

Please try compress=lzo for btrfs (in /boot/cmdline.txt) and reboot.

bellini666 commented 9 years ago

@CurlyMoo mine already has that. This is my cmdline.txt:

telnet zswap.enabled=1 zswap.compressor=lz4 sdhci-bcm2708.sync_after_dma=0 dwc_otg.lpm_enable=0 console=tty1 root=/dev/mmcblk0p2 rootflags=subvol=root/@,thread_pool=2,autodefrag,compress=lzo,commit=120 rootfstype=btrfs rootwait smsc95xx.turbo_mode=N logo.nologo quiet noswap loglevel=0 mod_scsi.scan=sync partswap startevent=mountall splash nohdparm --startup-event mountall
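The cmdline above carries two separate compression settings: `compress=lzo` inside `rootflags=` is a btrfs mount option, while `zswap.compressor=lz4` only configures zswap. A small sketch that extracts each one, assuming the file is a single space-separated line; `cmdline_opt` and `btrfs_compress` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: read options back out of a one-line cmdline.txt.

# Print the value of the first top-level "key=value" token ($1 = file, $2 = key).
cmdline_opt() {
    awk -v key="$2" '{
        for (i = 1; i <= NF; i++)
            if (index($i, key "=") == 1) { print substr($i, length(key) + 2); exit }
    }' "$1"
}

# rootflags= is a comma-separated list of btrfs mount options; print compress=.
btrfs_compress() {
    cmdline_opt "$1" rootflags | tr ',' '\n' | awk -F= '$1 == "compress" { print $2 }'
}

# Usage:
#   btrfs_compress /boot/cmdline.txt                  # btrfs setting
#   cmdline_opt /boot/cmdline.txt zswap.compressor    # zswap setting
```

To see what the mounted root is actually using, the options listed in /proc/mounts (for example via `grep btrfs /proc/mounts`) should show the active `compress=` value.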

bellini666 commented 9 years ago

But the fact that I can reproduce the issue with btrfs scrub probably means, IMHO, that it is related to IO. Not necessarily to the SD card itself, since I could still reproduce the problem after I moved my system to NFS.

CurlyMoo commented 9 years ago

Hmm, compress is lz4 by default. I could finish a scrub with lzo but not with lz4... IO is also my guess, but what specifically?

f1vefour commented 9 years ago

When did you change it to lzo? I mean how long have you been running lzo instead of lz4?

bellini666 commented 9 years ago

@f1vefour as far as I remember, I never actually changed it. In all the time I've been using xbian, the first time I changed /boot/cmdline.txt was to move my installation to nfs.

Maybe it was the default when I installed it?

CurlyMoo commented 9 years ago

lz4 has been the default compression for ages.

bellini666 commented 9 years ago

@CurlyMoo my installation is from the middle of 2013 if I'm not mistaken.

I could be confused though; I've had xbian running for so long that maybe I changed cmdline.txt one day to test something and don't remember.

Do you want me to try changing lzo to lz4 to see if it changes anything?

Btw, in another attempt to find a way to crash my pi, I found this script somewhere to stress test it:

#!/bin/bash
# Simple stress test for the system. If it survives this, it's probably stable.
# Free software, GPL2+

echo "Testing overclock stability..."

# Max out the CPU in the background (one core). Heats it up and loads the power supply.
nice yes >/dev/null &
yes_pid=$!

# Read the entire SD card 10x. Tests RAM and I/O.
for i in $(seq 1 10); do echo "reading: $i"; sudo dd if=/dev/mmcblk0 of=/dev/null bs=4M; done

# Write a 512 MB test file, 10x.
for i in $(seq 1 10); do echo "writing: $i"; dd if=/dev/zero of=deleteme.dat bs=1M count=512; sync; done

# Clean up: stop only the background load started above.
kill "$yes_pid"
rm -f deleteme.dat

# Print summary. Anything nasty will appear in dmesg.
echo -n "CPU freq: "; cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
echo -n "CPU temp: "; cat /sys/class/thermal/thermal_zone0/temp
dmesg | tail

echo "Not crashed yet, probably stable."

It just completed for me without crashing.

f1vefour commented 9 years ago

@hackedbellini Please try this updated kernel for RPi 1 and let me know how it goes.

https://db.tt/sNM5ty8M

Install by doing:

sudo cp /boot/kernel.img /boot/kernel.good
sudo dpkg -i xbian-package-kernel_3.19.3+_armhf.deb
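A sketch of the backup step above with a matching rollback, assuming the standard /boot/kernel.img layout used in those commands; `backup_kernel`, `restore_kernel` and the `BOOT_DIR` override are hypothetical. Restoring kernel.img alone may leave the newer modules in /lib/modules, so reinstalling the previous kernel .deb is likely the cleaner rollback.

```shell
#!/bin/sh
# Sketch: back up kernel.img before installing a test kernel, and restore it.
# BOOT_DIR is a hypothetical override, used here only for testing.
BOOT_DIR=${BOOT_DIR:-/boot}

backup_kernel() {
    # Copy, then verify the copy is byte-identical before trusting it.
    cp "$BOOT_DIR/kernel.img" "$BOOT_DIR/kernel.good" &&
        cmp -s "$BOOT_DIR/kernel.img" "$BOOT_DIR/kernel.good"
}

restore_kernel() {
    # Only the kernel image is restored; modules are left as they are.
    cp "$BOOT_DIR/kernel.good" "$BOOT_DIR/kernel.img"
}

# Usage (as root):
#   backup_kernel && dpkg -i xbian-package-kernel_3.19.3+_armhf.deb
#   restore_kernel   # only if the new kernel does not boot properly
```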

bellini666 commented 9 years ago

@f1vefour I updated here and so far so good!

I'm trying to reproduce the issue as I described at the top, and it refuses to freeze!

Let's give it a couple of days to be sure, but hopefully upgrading the kernel will fix it!

Thank you!

bellini666 commented 9 years ago

@f1vefour I think you can consider "upgrading to 3.19" as a fix for this issue.

And maybe it would also fix the issue for people in #691.

f1vefour commented 9 years ago

Some are still having issues even with 3.19.3; I'm at a loss, really.

Thanks for reporting back, I will leave the issue open in case anything comes up.

wlatendresse commented 9 years ago

Can this kernel be used for RPI 2 as well?

f1vefour commented 9 years ago

No, Pi 1 only.

bellini666 commented 9 years ago

Just one strange thing that started happening after upgrading the kernel:

The freeze really stopped, but CEC is not reliable anymore. I mean, I've had issue #495 for a while, but it is really easy to work around. It just won't automatically reconnect CEC after the TV has been off for a certain amount of time, but I can do that manually.

Now, that issue is a little worse. After the TV has been off for a while, when I turn it on and switch sources to xbian, it will either:

1) Not be connected to CEC, and fail on every attempt to connect. The device appears as a valid choice in the CEC menu (named SimpLink on LG TVs), but selecting it won't do anything (unlike the original issue, where it would connect right away).

2) Not even be recognized by CEC. The device would appear inactive in the CEC menu.

So, while I was writing this, I remembered that I could run cec-client -d 31 to test it. While my system was in case 1) above, I ran it, changed the source to xbian, connected CEC (this time it worked), pressed some buttons (although cec-client recognized them, as you can see in the log, kodi did not receive them) and changed the source back to my cable. Here is the log.
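For reference, the `31` in `cec-client -d 31` is a bitmask of libcec log levels. A sketch that decodes it, assuming libcec's `cec_log_level` values ERROR=1, WARNING=2, NOTICE=4, TRAFFIC=8 and DEBUG=16, so 31 turns all of them on; `decode_cec_loglevel` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: decode the bitmask passed to "cec-client -d".
# Assumed libcec levels: ERROR=1 WARNING=2 NOTICE=4 TRAFFIC=8 DEBUG=16.
decode_cec_loglevel() {
    mask=$1
    out=""
    for pair in 1:ERROR 2:WARNING 4:NOTICE 8:TRAFFIC 16:DEBUG; do
        bit=${pair%%:*}    # numeric flag
        name=${pair#*:}    # level name
        if [ $((mask & bit)) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"        # drop the leading space
}

# Usage:
#   decode_cec_loglevel 31   # all five levels
#   decode_cec_loglevel 8    # TRAFFIC only (raw CEC frames)
```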

Btw, I just set env DEBUG='--debug' in /etc/init/xbmc.conf. Let's see if it gives me some information about the issue when it happens again. If so, I'll update the issue here.
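A sketch of making that `env DEBUG='--debug'` change repeatable, assuming (as the comment implies) that the upstart job passes `$DEBUG` on to kodi's command line. `set_xbmc_debug` is hypothetical and uses GNU sed; it replaces an existing `env DEBUG=` line or appends one, relying on upstart accepting `env` stanzas at the top level of a job file.

```shell
#!/bin/sh
# Sketch: enable kodi debug logging via an upstart "env" stanza.
# Assumption: the xbmc job reads $DEBUG; verify against your xbmc.conf.
set_xbmc_debug() {
    conf=$1
    if grep -q '^env DEBUG=' "$conf"; then
        # Overwrite whatever value DEBUG had before.
        sed -i "s/^env DEBUG=.*/env DEBUG='--debug'/" "$conf"
    else
        # No DEBUG stanza yet: append one.
        printf '%s\n' "env DEBUG='--debug'" >> "$conf"
    fi
}

# Usage (as root, keep a backup):
#   cp /etc/init/xbmc.conf /etc/init/xbmc.conf.bak
#   set_xbmc_debug /etc/init/xbmc.conf
```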

mk01 commented 9 years ago

@hackedbellini

was this test done with the same libcec? were you changing only the kernel, or the firmware too?

bellini666 commented 9 years ago

@mk01 this is with a stable+staging installation and the kernel provided by @f1vefour.

One thing I noticed that might have something to do with 1): sometimes when restarting my pi, although the notification says CEC is connected, the connection is actually delayed. It is strange because it captures what I send, but only processes it after some period of time.

For example, I press the right arrow button 3 times and the left one once. Nothing happens in kodi (and the log doesn't print anything). After ~2-5 min, all those actions happen at the same time (the log prints the CEC information as if I had just pressed them). After that, CEC behaves normally. So CEC is really connected and receiving my events from the beginning, but only starts processing them after some time.

mk01 commented 9 years ago

@hackedbellini

on march-28 I committed an xbmc/devicecec patch which solved a similar issue on IMX6. I pushed it to the "rpi" branch too, but it looks like the rpi package wasn't compiled afterwards. I will recheck the auto-building... and just trigger an ad-hoc build now. retest with the next .deb (in 4-8 hours) and let me know.

bellini666 commented 9 years ago

Hey @mk01.

So, I updated xbian-package-xbmc to 14.2-1429432779. Yes, it seems to make things better for CEC.

I could still reproduce 2) from my comment above, though. I noticed that it usually happens after leaving the TV off for a good amount of time (for example, when I go to sleep).

mk01 commented 9 years ago

@hackedbellini

and while the TV is off for a long time (while you sleep), do you usually quit XBMC, or is it idling (but running) when CEC suddenly disappears from the list of attached devices???

mk01 commented 9 years ago

@hackedbellini

when you test, remove that ugly setting from config.txt (if you still use it) and test for that issue as well.

https://github.com/xbianonpi/xbian/issues/648

bellini666 commented 9 years ago

@mk01

and as TV is off for long time (you sleep), you are used to quit XBMC, or it is idling (but running) and the CEC suddenly disappears from list of attached devices ???

Yes. Basically, when I'm done with kodi, I leave it there (on any screen), switch sources to my cable TV and turn the TV off.

Before this issue, when coming back, I had to manually ask CEC to reconnect (as described in #495).

Since this issue started, most of the time I could not do that because the TV would not show the raspberry as an option. It looked as if the raspberry was not connected to HDMI at all.

when you will be testing, remove that ugly setting from config.txt (if you still use it) and test for that issue as well.

I tried that already =P. The error seemed to happen even without it.

Btw, like I mentioned earlier, I just came back from a trip to the USA and I got a raspberry pi 2 there.

I installed the latest stable version, and none of the issues here are present there (only #495 when the TV is off for some time, but it really doesn't bother me).

So, fortunately for me, I don't have the issues anymore. Unfortunately for you, I can't test stuff on the rpi 1 anymore, as I'm now using it for something else =P.

Anyway, thanks for helping me so far! If I find any problems on the rpi2, I'll open an issue for it. But right now it is looking pretty good! :)