raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.
5.18k stars 1.68k forks source link

SDCard corruption on RPi3 #601

Closed vrabac closed 8 years ago

vrabac commented 8 years ago

I just get a new set of RPi3 (two of them) but i am having Issue with microSD and read-only file system after working on it or just do an reboot!

I installed Archlinuxarm as stated in archlinuxarm wiki, and everything was fine, then after a reboot I was waiting for SSH to connect, but there after few minutes there was nothing. I hook the RPi3 on TV and i saw some weird stuff going all time. What I could read/make picture was: blk_update_request: I/O error, dev mmcblk0, sector

An power cycle was only solution, as logging directly over console was also not possible. Then I did install some tools and again I did clean reboot, same thing! Then i used second set did same installation, and there on beginning there was no such message but then, after starting some application it stuck again. This time SSH was still working and I was able to pull more from log (dmesg) see this link: Pastelog Here are some information of software I am using:

Please note that I have an RPi2 for a long time and i did not saw this message. Also the microSD Cards are new so i hope they are not both broken.

Hardware:
Raspberry PI 3 Model B 1GB 
Offizielles Netzteil (2.5A, 5.1V) für Raspberry Pi 3 Model B 
Verbatim 44043 16GB Tablet microSDHC Card 16GB Class10 
Offizielles Gehäuse Raspberry PI 3 weiß / rot

$ uname -a
Linux vdrrpi3 4.4.10-1-ARCH #1 SMP Fri May 13 18:48:24 MDT 2016 armv7l GNU/Linux
$
$ pacman -Qi raspberrypi-firmware
Name            : raspberrypi-firmware
Version         : 20160513-1
..
$
$ /opt/vc/bin/vcgencmd version
May 13 2016 15:46:00
Copyright (c) 2012 Broadcom
version 280c320a06a8ee014c9ea8df8c1350daa2c50d25 (clean) (release)
$

$ dmesg | grep mmc-bcm
[    4.069914] mmc-bcm2835 3f300000.mmc: mmc_debug:0 mmc_debug2:0
[    4.071426] mmc-bcm2835 3f300000.mmc: DMA channel allocated
$
$ mount | grep mmcblk0
/dev/mmcblk0p2 on / type ext4 (rw,relatime,data=ordered)
/dev/mmcblk0p1 on /boot type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro)
$

$ grep . /sys/class/mmc_host/mmc0/mmc0:*/* 2>/dev/null
/sys/class/mmc_host/mmc0/mmc0:0007/cid:275048534431364730013dc549010139
/sys/class/mmc_host/mmc0/mmc0:0007/csd:400e00325b59000073a77f800a4000eb
/sys/class/mmc_host/mmc0/mmc0:0007/date:01/2016
/sys/class/mmc_host/mmc0/mmc0:0007/erase_size:512
/sys/class/mmc_host/mmc0/mmc0:0007/fwrev:0x0
/sys/class/mmc_host/mmc0/mmc0:0007/hwrev:0x3
/sys/class/mmc_host/mmc0/mmc0:0007/manfid:0x000027
/sys/class/mmc_host/mmc0/mmc0:0007/name:SD16G
/sys/class/mmc_host/mmc0/mmc0:0007/oemid:0x5048
/sys/class/mmc_host/mmc0/mmc0:0007/preferred_erase_size:4194304
/sys/class/mmc_host/mmc0/mmc0:0007/scr:0235800201000000
/sys/class/mmc_host/mmc0/mmc0:0007/serial:0x013dc549
/sys/class/mmc_host/mmc0/mmc0:0007/type:SD
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:DRIVER=mmcblk
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:MMC_TYPE=SD
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:MMC_NAME=SD16G
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:MODALIAS=mmc:block
$
$ cat /boot/config.txt
# See /boot/overlays/README for all available options
# custom settings
gpu_mem=256

$ cat /proc/cmdline
8250.nr_uarts=0 dma.dmachans=0x7f35 bcm2708_fb.fbwidth=1824 bcm2708_fb.fbheight=984 bcm2709.boardrev=0xa02082 bcm2709.serial=0x985907a6 smsc95xx.macaddr=XX:XX:XX:XX:XX:XX bcm2708_fb.fbswap=1 bcm2709.uart_clock=48000000 vc_mem.mem_base=0x3dc00000 vc_mem.mem_size=0x3f000000  root=/dev/mmcblk0p2 rw rootwait console=ttyS0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=ttyS0,115200 elevator=noop
$

I saw there is an olther Issue reported but with RPi2 Issue #397

If you need any more information to debug this Issue i would be glad to test it. Any Suggestion is also welcome. A workaround to power cycle RPi3 is not an solution and will for sure corrupt system!

vrabac commented 8 years ago

This thing is relay annoying. Here is an crash with device disconnect till nothing else works.. Pastelog

May 16 16:08:28 vdrrpi3 kernel: busconnected_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: devspeed_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Host mode
May 16 16:08:28 vdrrpi3 kernel: enumspeed_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: WARN::dwc_otg_handle_mode_mismatch_intr:68: Mode Mismatch Interrupt: currently in Host mode
May 16 16:08:28 vdrrpi3 kernel: ggpio_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: gnptxfsiz_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: gotgctl_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: gpvndctl_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: grxfsiz_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: gsnpsid_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: guid_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: gusbcfg_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: hnpcapable_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: hprt0_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: hptxfsiz_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: hsic_connect_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: inv_sel_hsic_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: mode_show(ad942810) -> platform_dev ad942800, otg_dev ad3fd780
May 16 16:08:28 vdrrpi3 kernel: HZ 100, MSEC_PER_JIFFIE 10, loops_per_jiffy 192000
May 16 16:08:28 vdrrpi3 kernel: Core Global Registers
pelwell commented 8 years ago

Could I ask you to try a Raspbian release first to rule out an Arch-specific problem? Download this old Jessie-Lite image (http://downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2016-03-18/2016-03-18-raspbian-jessie-lite.zip), flash it to a card and boot your Pi, use raspi-config to expand the filing system then run sudo apt-get update and sudo apt-get upgrade. This will give the network and SD card a workout (which is why I didn't just ask you to install the latest Raspbian). After the upgrade completes reboot and see if it starts up OK.

pelwell commented 8 years ago

On your Arch image you could also try putting dtoverlay=mmc in your config.txt; this will have the side effect of disabling your WiFi, but it will rule out problems with the other SD card driver.

vrabac commented 8 years ago

So I did try a Raspbian like you wrote, and indeed there were no SDCard corruption while updating from 2016-03-18 to today. I restart this RPi3 for more then 10 times without corruption.

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 4.1.19-v7+ #858 SMP Tue Mar 15 15:56:00 GMT 2016 armv7l GNU/Linux
pi@raspberrypi:~ $ sudo shutdown -r now
pi@raspberrypi:~ $ uname -a
Linux raspberrypi 4.1.19-v7+ #858 SMP Tue Mar 15 15:56:00 GMT 2016 armv7l GNU/Linux
pi@raspberrypi:~ $
pi@raspberrypi:~ $ /opt/vc/bin/vcgencmd version
Mar 15 2016 14:47:28
Copyright (c) 2012 Broadcom
version 1bf9a9a77026af9128a339c82d72e331d3532ee4 (clean) (release)
pi@raspberrypi:~ $
pi@raspberrypi:~ $ /opt/vc/bin/vcgencmd get_config int
arm_freq=1200
audio_pwm_mode=1
audio_sdm_mod_order=2
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x36ee80
disable_commandline_tags=2
disable_l2cache=1
emmc_pll_core=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
hdmi_force_cec_address=65535
init_uart_clock=0x2dc6c00
lcd_framerate=60
over_voltage_avs=0x27ac4
overscan_bottom=48
overscan_left=48
overscan_right=48
overscan_top=48
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
temp_limit=85
pi@raspberrypi:~ $ /opt/vc/bin/vcgencmd get_config str
device_tree=-
pi@raspberrypi:~ $
pi@raspberrypi:~ $ /opt/vc/bin/vcgencmd measure_clock arm
frequency(45)=600064000
pi@raspberrypi:~ $ /opt/vc/bin/vcgencmd measure_clock core
frequency(1)=250000000
pi@raspberrypi:~ $ /opt/vc/bin/vcgencmd get_config int | grep -e core_freq -e enable_uart -e force_turbo
core_freq=400
pi@raspberrypi:~ $
pi@raspberrypi:~ $ cat /etc/fstab
proc            /proc           proc    defaults          0       0
/dev/mmcblk0p1  /boot           vfat    defaults          0       2
/dev/mmcblk0p2  /               ext4    defaults,noatime  0       1
# a swapfile is not a swap partition, no line here
#   use  dphys-swapfile swap[on|off]  for that
pi@raspberrypi:~ $
pi@raspberrypi:~ $ grep -v '^$\|^\s*\#' /boot/cmdline.txt
dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait
pi@raspberrypi:~ $ grep -v '^$\|^\s*\#' /boot/config.txt
dtparam=audio=on
pi@raspberrypi:~ $

Then I did start the md5sum command together with find, and after few minutes there was an weird message in journal/dmesg which is posted below. I am not sure if these two things have something together.

May 18 19:12:09 raspberrypi sudo[1163]: pi : TTY=pts/1 ; PWD=/home/pi ; USER=root ; COMMAND=/usr/bin/find / -name *.* -exec /usr/bin/md5sum {} ;
May 18 19:19:39 raspberrypi dhcpcd[635]: eth0: fe80::c43d:c7ff:fe86:ed9a router available
May 18 19:23:18 raspberrypi kernel: INFO: task kworker/u8:1:935 blocked for more than 120 seconds.
May 18 19:23:18 raspberrypi kernel:       Not tainted 4.1.19-v7+ #858
May 18 19:23:18 raspberrypi kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 18 19:23:18 raspberrypi kernel: kworker/u8:1    D 80573e10     0   935      2 0x00000000
May 18 19:23:18 raspberrypi kernel: Workqueue: kmmcd mmc_rescan
May 18 19:23:18 raspberrypi kernel: [<80573e10>] (__schedule) from [<80574324>] (schedule+0x4c/0xa4)
May 18 19:23:18 raspberrypi kernel: [<80574324>] (schedule) from [<80439bfc>] (__mmc_claim_host+0xb8/0x194)
May 18 19:23:18 raspberrypi kernel: [<80439bfc>] (__mmc_claim_host) from [<80439cf8>] (mmc_get_card+0x20/0x24)
May 18 19:23:18 raspberrypi kernel: [<80439cf8>] (mmc_get_card) from [<8044120c>] (mmc_sd_detect+0x2c/0x80)
May 18 19:23:18 raspberrypi kernel: [<8044120c>] (mmc_sd_detect) from [<8043c0b0>] (mmc_rescan+0xc8/0x324)
May 18 19:23:18 raspberrypi kernel: [<8043c0b0>] (mmc_rescan) from [<8003e0b8>] (process_one_work+0x154/0x458)
May 18 19:23:18 raspberrypi kernel: [<8003e0b8>] (process_one_work) from [<8003e410>] (worker_thread+0x54/0x500)
May 18 19:23:18 raspberrypi kernel: [<8003e410>] (worker_thread) from [<80043bec>] (kthread+0xec/0x104)
May 18 19:23:18 raspberrypi kernel: [<80043bec>] (kthread) from [<8000faf8>] (ret_from_fork+0x14/0x3c)
May 18 19:24:58 raspberrypi sudo[1163]: pam_unix(sudo:session): session closed for user root
May 18 19:26:19 raspberrypi dhcpcd[635]: eth0: fe80::c43d:c7ff:fe86:ed9a router available
Ruffio commented 8 years ago

@vrabac could you write your exact commands? Then I will try to reproduce on my RPI3

vrabac commented 8 years ago

@Ruffio Sorry that i did not wrote it as seperate command.It is written in last log

sudo find / -name "*.so" -exec /usr/bin/md5sum {} \;
sudo find / -name "*.*" -exec /usr/bin/md5sum {} \;

With the dtoverlay=mmc the situation is better but now i have somehow an hang with differen information. See picture below.

https://abload.de/img/sdcardcrashunsdo.jpg

Ruffio commented 8 years ago

@vrabac hmmm, executing the two cammands didn't reveal any errors on my RPI3. I have only set gpu=128, using logo.nologo and noatime, besides of that I havn't made any changes, I think.

I'm no expert, but have you tried with a completely new Raspbian image and just updated that with sudo apt-get update and sudo apt-get upgrade ? Also the firmware?

Hmmm, are you sure that you ahve upgraded to latest? Here is my uname -a: pi@raspberrypi:~ $ uname -a Linux raspberrypi 4.4.11-v7+ #887 SMP Thu May 19 16:24:03 BST 2016 armv7l GNU/Linux

You are using an older version compared to mine: 4.1.19-v7+ #858

When using a new RPI, I use this to be up to date: sudo apt-get install rpi-update -y #Get latest firmware
sudo rpi-update #Update firmware sudo apt-get update -y #Updates package references sudo apt-get upgrade -y #Install package updates sudo apt-get autoremove -y #Auto remove unused packages

Latest version are located here: https://www.raspberrypi.org/downloads/raspbian/

vrabac commented 8 years ago

@Ruffio thank you for your test. Well I dont use debian/buntu based distros, I did just that above that @pelwell wrote me to try. I can try tonight with Raspbian again. But i would realy like to have this fixed somehow on Archlinux (arm)!

vrabac commented 8 years ago

So today I have had time and test again raspbian. I did use version @pelwell suggest, and did upgrade to see if there would be corruption. There was nothing. Then i used your @Ruffio command to update firmware and kernel. And also no corruption. But doing md5sum on *.* produced some weird stuff which could have something to do with corruption?

I hope this can help somehow to track the issue, I need to use archlinuxarm and not raspbian, so if I can help somehow more to provide more log/debug information please let me know.

May 21 15:07:28 raspberrypi sudo[1156]: pi : TTY=pts/1 ; PWD=/home/pi ; USER=root ; COMMAND=/usr/bin/find / -name *.* -exec /usr/bin/md5sum {} ;
May 21 15:07:28 raspberrypi sudo[1156]: pam_unix(sudo:session): session opened for user root by pi(uid=0)
May 21 15:10:48 raspberrypi dhcpcd[659]: eth0: fe80::c43d:c7ff:fe86:ed9a router available
May 21 15:14:47 raspberrypi dhcpcd[659]: eth0: fe80::c43d:c7ff:fe86:ed9a router available
May 21 15:17:02 raspberrypi CRON[27600]: pam_unix(cron:session): session opened for user root by (uid=0)
May 21 15:17:02 raspberrypi CRON[27604]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 21 15:17:02 raspberrypi CRON[27600]: pam_unix(cron:session): session closed for user root
May 21 15:20:04 raspberrypi dhcpcd[659]: eth0: fe80::c43d:c7ff:fe86:ed9a router available
sMay 21 15:26:29 raspberrypi kernel: INFO: task kworker/u8:2:82 blocked for more than 120 seconds.
May 21 15:26:29 raspberrypi kernel:       Not tainted 4.4.11-v7+ #887
May 21 15:26:29 raspberrypi kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 15:26:29 raspberrypi kernel: kworker/u8:2    D 805b3b24     0    82      2 0x00000000
May 21 15:26:29 raspberrypi kernel: Workqueue: kmmcd mmc_rescan
May 21 15:26:29 raspberrypi kernel: [<805b3b24>] (__schedule) from [<805b409c>] (schedule+0x50/0xa8)
May 21 15:26:29 raspberrypi kernel: [<805b409c>] (schedule) from [<8046ab38>] (__mmc_claim_host+0xb8/0x1cc)
May 21 15:26:29 raspberrypi kernel: [<8046ab38>] (__mmc_claim_host) from [<8046ac7c>] (mmc_get_card+0x30/0x34)
May 21 15:26:29 raspberrypi kernel: [<8046ac7c>] (mmc_get_card) from [<80472d3c>] (mmc_sd_detect+0x2c/0x80)
May 21 15:26:29 raspberrypi kernel: [<80472d3c>] (mmc_sd_detect) from [<8046d28c>] (mmc_rescan+0xc8/0x324)
May 21 15:26:29 raspberrypi kernel: [<8046d28c>] (mmc_rescan) from [<8003c634>] (process_one_work+0x154/0x458)
May 21 15:26:29 raspberrypi kernel: [<8003c634>] (process_one_work) from [<8003c98c>] (worker_thread+0x54/0x500)
May 21 15:26:29 raspberrypi kernel: [<8003c98c>] (worker_thread) from [<80042668>] (kthread+0xec/0x104)
May 21 15:26:29 raspberrypi kernel: [<80042668>] (kthread) from [<8000fbc8>] (ret_from_fork+0x14/0x2c)
May 21 15:26:45 raspberrypi dhcpcd[659]: eth0: fe80::c43d:c7ff:fe86:ed9a router available
May 21 15:30:29 raspberrypi kernel: INFO: task kworker/u8:2:82 blocked for more than 120 seconds.
May 21 15:30:29 raspberrypi kernel:       Not tainted 4.4.11-v7+ #887
May 21 15:30:29 raspberrypi kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 21 15:30:29 raspberrypi kernel: kworker/u8:2    D 805b3b24     0    82      2 0x00000000
May 21 15:30:29 raspberrypi kernel: Workqueue: kmmcd mmc_rescan
May 21 15:30:29 raspberrypi kernel: [<805b3b24>] (__schedule) from [<805b409c>] (schedule+0x50/0xa8)
May 21 15:30:29 raspberrypi kernel: [<805b409c>] (schedule) from [<8046ab38>] (__mmc_claim_host+0xb8/0x1cc)
May 21 15:30:29 raspberrypi kernel: [<8046ab38>] (__mmc_claim_host) from [<8046ac7c>] (mmc_get_card+0x30/0x34)
May 21 15:30:29 raspberrypi kernel: [<8046ac7c>] (mmc_get_card) from [<80472d3c>] (mmc_sd_detect+0x2c/0x80)
May 21 15:30:29 raspberrypi kernel: [<80472d3c>] (mmc_sd_detect) from [<8046d28c>] (mmc_rescan+0xc8/0x324)
May 21 15:30:29 raspberrypi kernel: [<8046d28c>] (mmc_rescan) from [<8003c634>] (process_one_work+0x154/0x458)
May 21 15:30:29 raspberrypi kernel: [<8003c634>] (process_one_work) from [<8003c98c>] (worker_thread+0x54/0x500)
May 21 15:30:29 raspberrypi kernel: [<8003c98c>] (worker_thread) from [<80042668>] (kthread+0xec/0x104)
May 21 15:30:29 raspberrypi kernel: [<80042668>] (kthread) from [<8000fbc8>] (ret_from_fork+0x14/0x2c)
May 21 15:31:48 raspberrypi dhcpcd[659]: eth0: fe80::c43d:c7ff:fe86:ed9a router available
May 21 15:33:58 raspberrypi sudo[1156]: pam_unix(sudo:session): session closed for user root

pi@raspberrypi:~ $ /opt/vc/bin/vcgencmd version
May 20 2016 19:01:31
Copyright (c) 2012 Broadcom
version faf071dd4885c5ac1a89483d35a5326e7f69495f (clean) (release)
pi@raspberrypi:~ $ uname -a
Linux raspberrypi 4.4.11-v7+ #887 SMP Thu May 19 16:24:03 BST 2016 armv7l GNU/Linux
pi@raspberrypi:~ $
pi@raspberrypi:~ $ grep -v '^$\|^\s*\#' /boot/cmdline.txt
dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait
pi@raspberrypi:~ $ grep -v '^$\|^\s*\#' /boot/config.txt
dtparam=audio=on
pi@raspberrypi:~ $ cat /etc/fstab
proc            /proc           proc    defaults          0       0
/dev/mmcblk0p1  /boot           vfat    defaults          0       2
/dev/mmcblk0p2  /               ext4    defaults,noatime  0       1
# a swapfile is not a swap partition, no line here
#   use  dphys-swapfile swap[on|off]  for that
pi@raspberrypi:~ $
pi@raspberrypi:~ $ /opt/vc/bin/vcgencmd get_config int
arm_freq=1200
audio_pwm_mode=1
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x36ee80
disable_commandline_tags=2
disable_l2cache=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
hdmi_force_cec_address=65535
init_uart_clock=0x2dc6c00
lcd_framerate=60
over_voltage_avs=0x27ac4
overscan_bottom=48
overscan_left=48
overscan_right=48
overscan_top=48
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
temp_limit=85
pi@raspberrypi:~ $
vrabac commented 8 years ago

I took from my RPi2 the mSD card and installed Archlinuxarm like always, and with this card on same RPi3 i dont have any problems so far. So there must be something incompatible between Verbatim and RPi3. This mSD card works just fine when the OS is not installed on it.

Transcend 32GB Class 10 microSD HC I card details which works fine:

$ uname -a
Linux vdr02 4.4.11-2-ARCH #1 SMP Mon May 23 18:41:09 MDT 2016 armv7l GNU/Linux
$

$ /opt/vc/bin/vcgencmd version
May 20 2016 19:01:31
Copyright (c) 2012 Broadcom
version faf071dd4885c5ac1a89483d35a5326e7f69495f (clean) (release)
$

$ grep . /sys/class/mmc_host/mmc0/mmc0:*/* 2>/dev/null
/sys/class/mmc_host/mmc0/mmc0:e624/cid:035344534c33324780e012b979002615
/sys/class/mmc_host/mmc0/mmc0:e624/csd:400e00325b590000edc87f800a4040c3
/sys/class/mmc_host/mmc0/mmc0:e624/date:06/2002
/sys/class/mmc_host/mmc0/mmc0:e624/erase_size:512
/sys/class/mmc_host/mmc0/mmc0:e624/fwrev:0x0
/sys/class/mmc_host/mmc0/mmc0:e624/hwrev:0x8
/sys/class/mmc_host/mmc0/mmc0:e624/manfid:0x000003
/sys/class/mmc_host/mmc0/mmc0:e624/name:SL32G
/sys/class/mmc_host/mmc0/mmc0:e624/oemid:0x5344
/sys/class/mmc_host/mmc0/mmc0:e624/preferred_erase_size:4194304
/sys/class/mmc_host/mmc0/mmc0:e624/scr:0235800100000000
/sys/class/mmc_host/mmc0/mmc0:e624/serial:0xe012b979
/sys/class/mmc_host/mmc0/mmc0:e624/type:SD
/sys/class/mmc_host/mmc0/mmc0:e624/uevent:DRIVER=mmcblk
/sys/class/mmc_host/mmc0/mmc0:e624/uevent:MMC_TYPE=SD
/sys/class/mmc_host/mmc0/mmc0:e624/uevent:MMC_NAME=SL32G
/sys/class/mmc_host/mmc0/mmc0:e624/uevent:MODALIAS=mmc:block
$
pelwell commented 8 years ago

The most recent firmware release (23rd May) adds an mmc_block.card_quirks parameter. If you can manage to rpi-update, add this to your cmdline,txt:

mmc_block.card_quirks=0x80000000

This will disable the use of ERASE on that card, even though the card is not listed in the table of offenders. The patch also displays the quirks if any are set, so you can use it to confirm whether your card is in the table. If you find that setting the quirk solves your problem then let me know the card's name, manfid and oemid as found above and I will add it to the list.

If the rpi-update isn't successful, there should be a minor Raspbian update in the next few weeks that will include the change.

vrabac commented 8 years ago

I am using Archlinuxarm and there is already firmware release from 20160523 (commit=3b98f7433649e13cf08f54f509d11491c99c4c0b).

It works maybe better but still as soon i start an application (VDR) or just do reboot there is an crash. See this pastee link.

mSD card did pass the test:

Verbatim:
Test finished without errors.
You can now delete the test files *.h2w or verify them again.
Writing speed: 8.20 MByte/s
Reading speed: 73.7 MByte/s
H2testw v1.4

Here are list of current application. Maybe weird to find no mmcblk0p2 in /etc/fstab .

$ uname -a
Linux vdr03 4.4.11-3-ARCH #1 SMP Wed May 25 18:50:04 MDT 2016 armv7l GNU/Linux
$

$ /opt/vc/bin/vcgencmd version
May 20 2016 19:01:31
Copyright (c) 2012 Broadcom
version faf071dd4885c5ac1a89483d35a5326e7f69495f (clean) (release)
$

$ grep -v '^$\|^\s*\#' /boot/cmdline.txt
root=/dev/mmcblk0p2 rw rootwait ipv6.disable=1 console=ttyAMA0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=ttyAMA0,115200 elevator=noop mmc_block.card_quirks=0x80000000
$

$ grep -v '^$\|^\s*\#' /boot/config.txt
gpu_mem=256
$

$ cat /etc/fstab
#
# /etc/fstab: static file system information
#
# <file system> <dir>   <type>  <options>       <dump>  <pass>
/dev/mmcblk0p1  /boot   vfat    defaults        0       0
$

$ /opt/vc/bin/vcgencmd get_config int
arm_freq=1200
audio_pwm_mode=1
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x36ee80
disable_commandline_tags=2
disable_l2cache=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
hdmi_force_cec_address=65535
init_uart_clock=0x2dc6c00
lcd_framerate=60
over_voltage_avs=0x2625a
overscan_bottom=48
overscan_left=48
overscan_right=48
overscan_top=48
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
temp_limit=85
$

$ grep . /sys/class/mmc_host/mmc0/mmc0:*/* 2>/dev/null
/sys/class/mmc_host/mmc0/mmc0:0007/cid:275048534431364730013dc549010139
/sys/class/mmc_host/mmc0/mmc0:0007/csd:400e00325b59000073a77f800a4000eb
/sys/class/mmc_host/mmc0/mmc0:0007/date:01/2016
/sys/class/mmc_host/mmc0/mmc0:0007/erase_size:512
/sys/class/mmc_host/mmc0/mmc0:0007/fwrev:0x0
/sys/class/mmc_host/mmc0/mmc0:0007/hwrev:0x3
/sys/class/mmc_host/mmc0/mmc0:0007/manfid:0x000027
/sys/class/mmc_host/mmc0/mmc0:0007/name:SD16G
/sys/class/mmc_host/mmc0/mmc0:0007/oemid:0x5048
/sys/class/mmc_host/mmc0/mmc0:0007/preferred_erase_size:4194304
/sys/class/mmc_host/mmc0/mmc0:0007/scr:0235800201000000
/sys/class/mmc_host/mmc0/mmc0:0007/serial:0x013dc549
/sys/class/mmc_host/mmc0/mmc0:0007/type:SD
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:DRIVER=mmcblk
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:MMC_TYPE=SD
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:MMC_NAME=SD16G
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:MODALIAS=mmc:block
$
vrabac commented 8 years ago

This time crashed also after starting VDR application. Here is the pastelog

vrabac commented 8 years ago

I am experiencing almost every day an corruption/crash like in laste pastelog link, where i need to unplug the PSU to boot in OS again. Is anything else needed from my side? Can I test something else?

pelwell commented 8 years ago

That's a very interesting log, unlike any other I've seen before. It's going to take a while to analyse it, but the main thing I can see is that the interface was idle for at least 27 second, then when the OS asked to read 32 sectors it received no data back (although the card did continue to respond to command traffic). I have seen similar behaviour when trying to access sectors at the very end of some cards before, but these sectors were at about 6.5GB in so it shouldn't be the same problem.

I think this bit of the log is very suspicious:

[   94.090296] cfg80211: Verifying active interfaces after reg change
[16645.673439] usb 1-1.5: usbfs: process 379 (mediasrv) did not claim interface 2 before use
[16654.003413] mmcblk0: card_busy_detect: error sending status cmd, status 0x4080900

The system is happy for 4h35m, then you get a USB warning message 8 seconds before the first SD card error message. I also think it's interesting that the SD error is detected by the upper levels of the SD/MMC stack, not the driver - the card is responding but not in the way it ought to.

0x4080900 equates to a WP_VIOLATION error ("Set when the host attempts to write to a protected block or to the temporary or permanent write protected card."). That sounds like the card has gone into write-protect mode, perhaps in response to adverse conditions. The proximity to the USB message makes me wonder if something on the USB bus has malfunctioned or is drawing too much power, and that has somehow affected the SD card as well.

What else do you have attached to your Pi (a powered hub, for example), and what are you using it for?

vrabac commented 8 years ago

Thank you for your replay. I have an DVB-S2 Card with its own power supply connected to the USB port of RPi3 and nothing else. I am using this RPi3 for LiveTV (VDR) and for video playback (Kodi - ok this is still not setup because of card corruption)

The RPi3 power supply is the original one (Offizielles Netzteil (2.5A, 5.1V) für Raspberry Pi 3 Model B )

When using an different microSD card the Transcend 32GB (SL32G) no such issue at all. Same Hardware connected to RPi3.

But I have second RPi3 without any extra Hardware connected, with same problem, this is installed on an TV, doing movie loop playback.

Here is output from lsbusb

# lsusb
Bus 001 Device 004: ID 2659:1502 Sundtek
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp. SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp. SMC9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
#

Here is output from mediaclient (application from Sundtek)

# /opt/bin/mediaclient -e
**** List of Media Hardware Devices ****
device 0: [Sundtek SkyTV Ultimate V (2015)]  DVB-S/S2, ANALOG-TV, REMOTE-CONTROL, OSS-AUDIO
  [INFO]:
     STATUS: ACTIVE
  [BUS]:
     ID: 1-1.5
  [SERIAL]:
     ID: UXXXXXXXXX
  [DVB-S/S2]:
     FESTATUS: ACTIVE
     LNBVOLTAGE: ENABLED
     LNBSUPPLY: ACADAPTER
     LNBSTATUS: OK
     FRONTEND: /dev/dvb/adapter0/frontend0
     DVR: /dev/dvb/adapter0/dvr0
     DMX: /dev/dvb/adapter0/demux0
  [ANALOG-TV]:
     VIDEO0: /dev/video0
     VBI0: /dev/vbi0
  [REMOTECONTROL]:
     INPUT0: /dev/mediainput0
  [OSS]:
     OSS0: /dev/dsp0

#
vrabac commented 8 years ago

This time it did happend without any USB device connected. Here is dmesg and here is journalctl log from what i did (just start and stop VDR application).

$ uname -a
Linux vdr03 4.4.11-4-ARCH #1 SMP Tue May 31 19:51:14 MDT 2016 armv7l GNU/Linux
$

$ /opt/vc/bin/vcgencmd version
May 31 2016 11:45:05
Copyright (c) 2012 Broadcom
version 6faf136a2445687e1bd90c8c763a8796ace73be2 (clean) (release)
$

$ grep -v '^$\|^\s*\#' /boot/cmdline.txt
root=/dev/mmcblk0p2 rw rootwait ipv6.disable=1 console=ttyAMA0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=ttyAMA0,115200 elevator=noop mmc_block.card_quirks=0x80000000
$

$ grep -v '^$\|^\s*\#' /boot/config.txt
gpu_mem=256
$

$ cat /etc/fstab
#
# /etc/fstab: static file system information
#
# <file system> <dir>   <type>  <options>       <dump>  <pass>
/dev/mmcblk0p1  /boot   vfat    defaults        0       0
$

$ /opt/vc/bin/vcgencmd get_config int
arm_freq=1200
audio_pwm_mode=1
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x36ee80
disable_commandline_tags=2
disable_l2cache=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
hdmi_force_cec_address=65535
init_uart_clock=0x2dc6c00
lcd_framerate=60
over_voltage_avs=0x2625a
overscan_bottom=48
overscan_left=48
overscan_right=48
overscan_top=48
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
temp_limit=85
$

$ grep . /sys/class/mmc_host/mmc0/mmc0:*/* 2>/dev/null
/sys/class/mmc_host/mmc0/mmc0:0007/cid:275048534431364730013dc549010139
/sys/class/mmc_host/mmc0/mmc0:0007/csd:400e00325b59000073a77f800a4000eb
/sys/class/mmc_host/mmc0/mmc0:0007/date:01/2016
/sys/class/mmc_host/mmc0/mmc0:0007/erase_size:512
/sys/class/mmc_host/mmc0/mmc0:0007/fwrev:0x0
/sys/class/mmc_host/mmc0/mmc0:0007/hwrev:0x3
/sys/class/mmc_host/mmc0/mmc0:0007/manfid:0x000027
/sys/class/mmc_host/mmc0/mmc0:0007/name:SD16G
/sys/class/mmc_host/mmc0/mmc0:0007/oemid:0x5048
/sys/class/mmc_host/mmc0/mmc0:0007/preferred_erase_size:4194304
/sys/class/mmc_host/mmc0/mmc0:0007/scr:0235800201000000
/sys/class/mmc_host/mmc0/mmc0:0007/serial:0x013dc549
/sys/class/mmc_host/mmc0/mmc0:0007/type:SD
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:DRIVER=mmcblk
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:MMC_TYPE=SD
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:MMC_NAME=SD16G
/sys/class/mmc_host/mmc0/mmc0:0007/uevent:MODALIAS=mmc:block
$
pelwell commented 8 years ago

Although the failure mode looks the same this time - WP_VIOLATION - the position is completely different.

When it fails, can you run vcgencmd get_throttled? If that returns a non-zero value then you could be suffering from under-voltage (post the value - it could also be a thermal throttling). Otherwise I think we're going to have to put this down to a bad card, at least until somebody else reports the same symptoms. About the only other thing we could do is detect the initial point of failure in the driver and dump out the low-level activity log, but I don't think it would tell us anything useful.

vrabac commented 8 years ago

I just did two time stop/start from VDR services and corruption was there:

$ vcgencmd get_throttled
throttled=0x0
$ vcgencmd get_throttled
throttled=0x0
$

The Verbatim cards are new (I have two of them), i hope they are not bad cards. I'll try to contact shop where I did order them!

pelwell commented 8 years ago

It doesn't look like a power supply issue. Do both of your Verbatim cards fail the same way in the Pi3? What about in the Pi2?

vrabac commented 8 years ago

They fail with same error, but its always random, sometimes it happens when i do "systemctl reboot" , it do an reboot, but booting from mSD does then fails in the middle of booting, and only power cycle is possible. I'll try it with RPi2 and will write back. Thank you for your time.

vrabac commented 8 years ago

Test fails on RPi2, just basic stuff I did after installing archlinuxarm, did an reboot, and boot was not successful. There was corruption and needed like many times before to power cycle just in this time RPi2. I'll try to get logs, as when it fails I can just do some photos with camera but not much to see.

pelwell commented 8 years ago

You can get more output on screen by adding framebuffer_height=2160 to config.txt.

It is sounding like the cards have some fault or incompatibility with the Pi. Are you still setting the card quirks to 0x80000000?

When it does boot, what does dmesg | grep quirks show?

vrabac commented 8 years ago

I did not have had it active on RPi2, as i did "clean install" just to be on safe side. But after few reboot and stoping and starting services it failed again, and I was not able to do anything, and on TV i have had these two picture going on in loop.

fail log

Register dump:

https://abload.de/img/img_20160606_2322430tbjf.jpg
https://abload.de/img/img_20160606_232315lhxhn.jpg

After first power cycle: https://abload.de/img/img_20160606_233925a0av8.jpg

So no it even does not work with RPi2.

$ uname -a
Linux vdr02 4.4.12-1-ARCH #1 SMP Thu Jun 2 19:11:39 MDT 2016 armv7l GNU/Linux
$ dmesg | grep quirks
[    0.000000] Kernel command line: dma.dmachans=0x7f35 bcm2708_fb.fbwidth=1824 bcm2708_fb.fbheight=1200 bcm2709.boardrev=0xa01041 bcm2709.serial=0xb9b8f9ab smsc95xx.macaddr=B8:27:EB:B8:F9:AB bcm2708_fb.fbswap=1 bcm2709.uart_clock=3000000 bcm2709.disk_led_gpio=47 bcm2709.disk_led_active_low=0 vc_mem.mem_base=0x3dc00000 vc_mem.mem_size=0x3f000000  root=/dev/mmcblk0p2 rw rootwait ipv6.disable=1 console=ttyAMA0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=ttyAMA0,115200 elevator=noop mmc_block.card_quirks=0x80000000
[    5.486332] mmcblk0: mmc0:0007 SD16G 14.5 GiB (quirks 0x80000000)
$ /opt/vc/bin/vcgencmd version
Jun  2 2016 16:01:58
Copyright (c) 2012 Broadcom
version d3cabccc990645d482e723b2c65e3b4b1481afac (clean) (release)
$
pelwell commented 8 years ago

@vrabac I really don't know when I'll get to look at this - so far you are the only person with this issue, and the last bad card took days to understand - but if you email me (phil at raspberrypi dot org) we can arrange to send you out a known-good replacement card in exchange for you sending us one of yours to analyse.

vrabac commented 8 years ago

So today i did archlinuxarm update (pacman -Suy) and these are packages which were updated:

Packages (7) 
linux-raspberrypi-4.4.13-1
pciutils-3.5.1-1
raspberrypi-firmware-20160608-1
raspberrypi-firmware-bootloader-20160608-1
raspberrypi-firmware-bootloader-x-20160608-1
raspberrypi-firmware-examples-20160608-1
raspberrypi-firmware-tools-20160608-1

So by looking the firmware commits, there was nothing new just bump to new kernel 4.4.13. But on Linux commits side there are a lot of changes, and few of them were ext4 related. After updating my RPi2 (currently for testing active). I did like 20x reboots, and 10x starting/stoping this VDR services, there were no corruptions.

@pelwell do you see anything else related what could indicate this is fixed? At least for now (2 hours of testing) everything seems fine.

$ uname -a
Linux vdr02 4.4.13-1-ARCH #1 SMP Wed Jun 8 19:31:47 MDT 2016 armv7l GNU/Linux
$

$ /opt/vc/bin/vcgencmd version
Jun  8 2016 16:08:27
Copyright (c) 2012 Broadcom
version 1c8638223821fe0ed16e20bd8e948b6cc037fa4a (clean) (release)
$

$ dmesg | grep quirks
[    0.000000] Kernel command line: dma.dmachans=0x7f35 bcm2708_fb.fbwidth=1824 bcm2708_fb.fbheight=984 bcm2709.boardrev=0xa01041 bcm2709.serial=0xb9b8f9ab smsc95xx.macaddr=B8:27:EB:B8:F9:AB bcm2708_fb.fbswap=1 bcm2709.uart_clock=3000000 bcm2709.disk_led_gpio=47 bcm2709.disk_led_active_low=0 vc_mem.mem_base=0x3dc00000 vc_mem.mem_size=0x3f000000  root=/dev/mmcblk0p2 rw rootwait ipv6.disable=1 console=ttyAMA0,115200 console=tty1 selinux=0 plymouth.enable=0 smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 kgdboc=ttyAMA0,115200 elevator=noop mmc_block.card_quirks=0x80000000
[    5.495218] mmcblk0: mmc0:0007 SD16G 14.5 GiB (quirks 0x80000000)
$

I will leave this setup so for 24 hours and then switch to RPi3.

pelwell commented 8 years ago

From what I can tell the ext4 changes are either to cope with corruption better or to placate the UBSAN code-checking tool. I haven't seen a change that would obviously prevent corruption from occurring.

Once you are happy that the system is running well you could try removing the quirks setting to see if that makes a difference.

vrabac commented 8 years ago

Seems it just worked for 10 hours on RPi2 and then again card_busy_detect So yea seems not related!

Ruffio commented 8 years ago

@vrabac what is the status on this issue?

Ruffio commented 8 years ago

@vrabac please close issue with if it seams to be resolved or provide further information. Thanks

Ruffio commented 8 years ago

@pelwell consider closing this issue