grisp / grisp

🐟 GRiSP Erlang Runtime Library
https://www.grisp.org
Apache License 2.0
238 stars 38 forks source link

Boot failing when running tutorial code from the Wiki #108

Open GwendalLaurent opened 2 years ago

GwendalLaurent commented 2 years ago

Versions

Description

Hello, I'm trying to run the code from the wiki tutorial on my GRISP2 board, but the Erlang VM is crashing during the boot. The code is the same as in the tutorial, and I followed the previous steps without many issues.

The board output is a bit weird, as you can see below, it is filled with random letter and numbers.

en1.1: <Freescale EHCI root HUB> at usbus1
media listener: event = MOUNT, state = SUCCESS, src = /dev/mmcsd-1-0u[ERL] SD card mounted
h, dest = /media/mmcsd-1-0ub0: <Freescale EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0

u[ERL] GRiSP mountpoint from the FDT: /media/mmcsd-1-0/
hub1: <Freescale EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
[ERL] Reading /media/mmcsd-1-0/grisp.ini[ERL] Booting with arg: erl.rtems -C multi_time_warp -- -mode embedded -home . -pa . -root robot -bindir robot/erts-11.2.2.10/bin -boot robot/releases/0.1.0/start -config robot/releases/0.1.0/sys.config
[ERL] hostname: grisp-001023
[ERL] Starting DHCP
[ERL] mkdir /tmp
info: ffec0: link state changed to DOWN
[ERL] mkdir /tmp/log
[ERL] mkdir /home
[ERL] Setting environment
[ERL] chdir(/media/mmcsd-1-0/)
[ERL] getcwd: /media/mmcsd-1-0
[ERL] Starting BEAM
{"init terminating in do_boot",{badarg,[{erlang,binary_to_term,[<<131,104,3,100,0(,n6o, 1l1o5g,g9e9r, 1p1r4e,s1e0n5t,)1 1u2n,e1x1p6e,c1t0e4d, 2l,o1g0g7e,r0 ,m5e,s1s1a4g,e1:1 1{,l9o8g,,1e1r1r,o1r1,6",E1r0r7o,r0 ,i5n, 4p8r,o4c6e,s4s9 ,~4p6 ,w4i8t,h1 0e8x,i0t, 0v,a0l,u3e4:,~1n0~4p,~2n,"1,0[0<,00.,99.,01>1,2{,b1a1d4a,r1g0,1[,{7e6r,l1a1n1g,,9b7i,n1a0r0y,_1t0o1_,t1e0r0m,,1[0<8<,103,10,,100,42,13,,110000,,00,,76,,9171,51,1969,,111114,,110095,,110152,,9191,61,1150,41,020,,100,78,,09,95,,111114,,111171,,19180,,111116,,111061,,110174,,01,155,,4180,04,60,,489,,14061,,4181,41,0180,80,,905,,01,0354,,111004,,120,51,0101,60,,190,01,102,,1151,41,0110,11,1746,,110181,,9957,,111020,,111041,,110050,,110098,,905,,01,008,,2111,11,0907,,01,070,,9170,11,1161,41,1110,01,009,,1100,51,0919,,111145,,110080,,905,,81,1969,,111141,,9171,79,91,1100,11,1161,41,0110,01,104,,61,1150,11,0101,40,,180,81,0917,,111140,,110083,,9150,01,005,,1161,01,0110,51,1141,61,1160,01,105,,

However, a crash report is still written and readable:

=erl_crash_dump:0.5
Fri Jan  1 00:00:07 1988
Slogan: init terminating in do_boot ({badarg,[{erlang,binary_to_term,[],[]},{init,get_boot,1,[]},{init,get_boot,2,[]},{init,do_boot,3,[]}]})
System version: Erlang/OTP 23 [erts-11.2.2.10] [source] [smp:1:1] [ds:1:1:10] [async-threads:1]
Compiled: Mon Jul 18 11:13:02 2022
Taints: 
Atoms: 2587
Calling Thread: scheduler:1
=scheduler:1
Scheduler Sleep Info Flags: 
Scheduler Sleep Info Aux Work: DELAYED_AW_WAKEUP | THR_PRGR_LATER_OP | MISC_THR_PRGR

I've already tried 2 SD cards, a Philips and a SanDisk, and they were both FAT32 formatted. The board is working correctly without a SD card, I can execute commands using the Erlang shell.

Does someone know what is the issue here and how to solve it ?

Thank you

peerst commented 2 years ago

Can you tell how exactly you formatted the SD cards, like which comandline, please? Might help reproduce the problem, maybe its some subtle filesystem incompatibility where our FAT implementation doesn't support the exact formatting?

Another thing to try: Use a failing SD card, mount it on your computer again, read the respective boot file and try to decode it exactly the same way its done when the boot file gets decoded and see if it works there. I'll follow up with more details.

Third thing: Read the whole raw SD card out with dd and send it to us and we try to reproduce

eproxus commented 2 years ago

@GwendalLaurent A first experiment would be to try to read the file robot/releases/0.1.0/start.boot from the SD card using Erlang:

1> {ok, Boot} = file:read_file("/path/to/SD/robot/releases/0.1.0/start.boot").
{ok,<<131,104,...>>}
2> binary_to_term(Boot).
{script,{"robot","0.1.0"},
        [{preLoaded,[atomics,counters,erl_init,erl_prim_loader,
                     erl_tracer,erlang,erts_code_purger,
                     erts_dirty_process_signal_handler,erts_internal,
                     erts_literal_area_collector,init,persistent_term,
                     prim_buffer,prim_eval,prim_file,prim_inet,prim_net,
                     prim_socket,prim_zip,socket_registry,zlib]},
         {...}|...]}

If that fails, it means the boot file is corrupted for some reason.

GwendalLaurent commented 2 years ago

I tried different techniques to format the SD card:

The Erlang commands given by @eproxus worked, and the binary_to_term function gave me the same output.

In some cases, when I perform the same operations, I have another error:

[ERL] Mounting SD card asynchronously
[ERL] Lowering self priority
[ERL] Initializing libbsd
nexus0: <RTEMS Nexus device>
ofwbus0: <Open Firmware Device Tree> on nexus0
simplebus0: <Flattened device tree simple bus> on ofwbus0
simplebus1: <Flattened device tree simple bus> mem 0x2000000-0x20fffff on simplebus0
simplebus2: <Flattened device tree simple bus> mem 0x2000000-0x203ffff on simplebus1
simplebus3: <Flattened device tree simple bus> mem 0x2100000-0x21fffff on simplebus0
regfix0: <Fixed Regulator> on ofwbus0
regfix1: <Fixed Regulator> on ofwbus0
ccm0: <Freescale i.MX6 Clock Control Module> mem 0x20c4000-0x20c7fff irq 119,120 on simplebus1
gpio0: <Freescale i.MX GPIO Controller (RTEMS)> mem 0x209c000-0x209ffff irq 98,99 on simplebus1
gpiobus0: <GPIO bus> on gpio0
gpio1: <Freescale i.MX GPIO Controller (RTEMS)> mem 0x20a0000-0x20a3fff irq 100,101 on simplebus1
gpiobus1: <GPIO bus> on gpio1
gpio2: <Freescale i.MX GPIO Controller (RTEMS)> mem 0x20a4000-0x20a7fff irq 102,103 on simplebus1
gpiobus2: <GPIO bus> on gpio2
gpio3: <Freescale i.MX GPIO Controller (RTEMS)> mem 0x20a8000-0x20abfff irq 104,105 on simplebus1
gpiobus3: <GPIO bus> on gpio3
gpio4: <Freescale i.MX GPIO Controller (RTEMS)> mem 0x20ac000-0x20affff irq 106,107 on simplebus1
gpiobus4: <GPIO bus> on gpio4
usbphy0: <Freescale i.MX6 USB PHY> mem 0x20c9000-0x20c9fff irq 76 on simplebus1
usbphy1: <Freescale i.MX6 USB PHY> mem 0x20ca000-0x20cafff irq 77 on simplebus1
imx_usbmisc0: <i.MX USB Misc Control> mem 0x2184800-0x21849ff on simplebus3
ehci0: <Freescale i.MX integrated USB controller> mem 0x2184000-0x21841ff irq 75 on simplebus3
usbus0: EHCI version 1.0
usbus0: stop timeout
usbus0 on ehci0
ehci1: <Freescale i.MX integrated USB controller> mem 0x2184200-0x21843ff irq 74 on simplebus3
usbus1: EHCI version 1.0
usbus1 on ehci1
ffec0: <Freescale Gigabit Ethernet Controller> mem 0x2188000-0x218bfff irq 150,151 on simplebus3
miibus0: <MII bus> on ffec0
ksz8091rnb0: <Microchip KSZ8091RNB with 50MHz clock> PHY 1 on miibus0
ksz8091rnb0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
info: ffec0: Ethernet address: 50:2d:f4:23:a1:99
sdhci_fsl0: <Freescale uSDHC controller> mem 0x2194000-0x2197fff irq 55 on simplebus3
mmc0: <MMC/SD bus> on sdhci_fsl0
sdhci_fsl1: <Freescale uSDHC controller> mem 0x2190000-0x2193fff irq 54 on simplebus3
mmc1: <MMC/SD bus> on sdhci_fsl1
media listener: event = DISK ATTACH, state = INQUIRY, src = mmcsd
media listener: event = DISK ATTACH, state = SUCCESS, src = mmcsd, dest = /dev/mmcsd-0
mmedia listener: event = MOUNT, state = INQUIRY, src = /dev/mmcsd-0
mcsd0: 8GB <MMCHC Q2J55L 1.0 SN 10DDE71A MFG 08/2020 by 19 0x004e> at mmc0 45.0MHz/8bit/65535-block
media listener: event = MOUNT, state = FAILED, src = /dev/mmcsd-0
mmedia listener: event = PARTITION INQUIRY, state = INQUIRY, src = /dev/mmcsd-0
mmedia listener: event = PARTITION INQUIRY, state = SUCCESS, src = /dev/mmcsd-0
cmedia listener: event = PARTITION ATTACH, state = INQUIRY, src = /dev/mmcsd-0
smedia listener: event = PARTITION ATTACH, state = SUCCESS, src = /dev/mmcsd-0, dest = /dev/mmcsd-0-0
dmedia listener: event = MOUNT, state = INQUIRY, src = /dev/mmcsd-0-0
0boot: Additional partition. This is currently not supported in RTEMS.mmcsd0boot0: 17MB partition 1 at mmcsd0
media listener: event = MOUNT, state = SUCCESS, src = /dev/mmcsd-0-0, dest = /media/mmcsd-0-0
mmedia listener: event = PARTITION ATTACH, state = INQUIRY, src = /dev/mmcsd-0
mmedia listener: event = PARTITION ATTACH, state = SUCCESS, src = /dev/mmcsd-0, dest = /dev/mmcsd-0-1
cmedia listener: event = MOUNT, state = INQUIRY, src = /dev/mmcsd-0-1
sd0boot: Additional partition. This is currently not supported in RTEMS.mmcsd0boot1: 17MB partition 2 at mmcsd0
media listener: event = MOUNT, state = FAILED, src = /dev/mmcsd-0-1
mmcsd0rpmb: 4MB partition 3 at mmcsd0
media listener: event = DISK ATTACH, state = INQUIRY, src = mmcsd
media listener: event = DISK ATTACH, state = SUCCESS, src = mmcsd, dest = /dev/mmcsd-1
mmedia listener: event = MOUNT, state = INQUIRY, src = /dev/mmcsd-1
mcsd1: 32GB <SDHC SD32G 8.5 SN FD45C362 MFG 05/2022 by 3 SD> at mmc1 45.0MHz/4bit/65535-block
media listener: event = MOUNT, state = FAILED, src = /dev/mmcsd-1
usbus0: usbus1: 480Mbps High Speed USB v2.0
media listener: event = PARTITION INQUIRY, state = INQUIRY, src = /dev/mmcsd-1480Mbps High Speed USB v2.0

media listener: event = PARTITION INQUIRY, state = SUCCESS, src = /dev/mmcsd-1
[media listener: event = PARTITION ATTACH, state = INQUIRY, src = /dev/mmcsd-1
zone: unpcb] kern.ipc.maxsockets limit reached
media listener: event = PARTITION ATTACH, state = SUCCESS, src = /dev/mmcsd-1ugen0.1: <Freescale EHCI root HUB> at usbus0
, dest = /dev/mmcsd-1-0i[ERL] Running ifconfig on lo0
n
fo: lo0: link state changed to UP
media listener: event = MOUNT, state = INQUIRY, src = /dev/mmcsd-1-0uwaiting for SD...

g
en1.1: <Freescale EHCI root HUB> at usbus1
media listener: event = MOUNT, state = SUCCESS, src = /dev/mmcsd-1-0u[ERL] SD card mounted
h, dest = /media/mmcsd-1-0ub0: <Freescale EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0

u[ERL] GRiSP mountpoint from the FDT: /media/mmcsd-1-0/
hub1: <Freescale EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
[ERL] Reading /media/mmcsd-1-0/grisp.ini[ERL] Booting with arg: erl.rtems -C multi_time_warp -- -mode embedded -home . -pa . -root robot -bindir robot/erts-11.2.2.10/bin -boot robot/releases/0.1.0/start -config robot/releases/0.1.0/sys.config
[ERL] hostname: gri

fatal extension: source=9, is_internal=0, error=-2137442768

R0   = 0x6c72655b R8  = 0x00000014
R1   = 0x80c2dbe0 R9  = 0x80982168
R2   = 0x1eace420 R10 = 0x00000001
R3   = 0x5d676e61 R11 = 0x1eadd420
R4   = 0x9f6fc000 R12 = 0x00000000
R5   = 0x80c1dbd8 SP  = 0x80a13a28
R6   = 0x00000008 LR  = 0x80594493
R7   = 0x9f6fc000 PC  = 0x805944ec
CPSR = 0x80000173 VEC = 0x00000004
FPEXC = 0x40000000
FPSCR = 0x00000000
D00 = 0xcfcfcfcfcfcfcfcf
D01 = 0xcfcfcfcfcfcfcfcf
D02 = 0x0000000000000000
D03 = 0x0000000000000000
D04 = 0x0000000000000000
D05 = 0x0000000000000000
D06 = 0x8040201008040201
D07 = 0x8040201008040201
D08 = 0x0000000000000000
D09 = 0x0000000000000000
D10 = 0x0000000000000000
D11 = 0x0000000000000000
D12 = 0x0000000000000000
D13 = 0x0000000000000000
D14 = 0x0000000000000000
D15 = 0x0000000000000000
D16 = 0x0000000000000000
D17 = 0x0000000000000000
D18 = 0x88d007949d163a0c
D19 = 0x4bde41d6304a008a
D20 = 0x688084c52a1514f6
D21 = 0x9a6108251c0ecac8
D22 = 0x0c57ca40e4ae468c
D23 = 0x6587380206a4b123
D24 = 0x8214882a88d52001
D25 = 0xa92948ef40bc7801
D26 = 0x44472183edcb08c0
D27 = 0x22a02630b7128098
D28 = 0x4255830b02403041
D29 = 0x221136481a68cd80
D30 = 0x2a2003690a25cc23
D31 = 0x3e1676d92e208320

I think it could be related to the error I already described before because sometimes the board prints both of them (i.e. First the Erlang VM crashes, and then I have an output similar to this one but with the same error number)

eproxus commented 2 years ago

So next step would be to make an image of the SD card and send it to us. That way we can verify if the image is corrupt somehow, or if you have a potential hardware problem with the board itself.

To make an image of the SD card you can use the following command (presuming macOS and that you have pv installed and want to see progress):

$  sudo cat /dev/path/to/device | pv -s 16g | xz >sdcard.image.xz
 430MiB 0:01:05 [6.96MiB/s] [>                                 ]  2% ETA 0:40:10

The size of the SD card probably has to be manually set with the -s flag if it can't be detected automatically. Depending on the size of the SD card it can take quite a while. If you don't want to use pv just using cat is enough:

$ cat /dev/path/to/device | xz >sdcard.image.xz
[no output until it's done!]

If you don't have or can't use xz you can use gzip instead, replace the last pipe segment with gzip >scared.image.gz.

If the file is too large to be uploaded anywhere, please email grisp AT grisp DOT org and we'll send you an link to an upload service.

juhlig commented 1 year ago

Hey :) Is there any solution to this yet? Because I have the exact same issue :sweat_smile:

juhlig commented 1 year ago

Been fooling around with this all day, to no avail. Tried with a SanDisk SDHC as well as with a old noname one I had lying around, must have formatted them over and over like a hundred times, tried different OTP versions, different versions of rebar3... But it is always the same, just like what @GwendalLaurent already described. So I'm creating an image of the card now and finding a place to upload the thing.

peerst commented 1 year ago

Hmm, seems we have a regression here. Β I’ll look at it on Monday when I’m back in the office.Β Am 03.12.2022 um 00:05 schrieb Jan Uhlig @.***>:ο»Ώ Been fooling around with this all day, to no avail. Tried with a SanDisk SDHC as well as with a old noname one I had lying around, must have formatted them over and over like a hundred times, tried different OTP versions, different versions of rebar3... But it is always the same, just like what @GwendalLaurent already described. So I'm creating an image of the card now and finding a place to upload the thing.

β€”Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>

juhlig commented 1 year ago

I posted a link to the image to @eproxus in Slack.

juhlig commented 1 year ago

As per @eproxus' request FYI: the image I gave to him was created on linux mint 20.3 (una), which is based on ubuntu 20.04 (focal), via sudo cat /dev/mmcblk0 | pv -s 32g | xz >sdcard.image.xz.

juhlig commented 1 year ago

Trying further, I found that the board behaves differently when I insert one or the other card into an already running board.

With the new 32GB SanDisk, I get a something like a mount message (truncated, though), then after a few seconds the board crashes and needs to be reset.

Doing the same with the old 512MB noname one, the board does not crash. Anyway, one of two things happens:

eproxus commented 1 year ago

@juhlig Small caveat: I'm not sure how well-supported re-mounting is in RTEMS (if at all)

However, if some files are readable but not others (given that they all are readable on a computer), it looks more and more like a FAT driver issue...

eproxus commented 1 year ago

@ziopio Can you take a look?

juhlig commented 1 year ago

given that they all are readable on a computer

They are all readable on my computer, no problems at all.

ziopio commented 1 year ago

Testing on Majaro 22, erlang 24.3.4.6 otp 3.18.0

I can reproduce the second crash from @GwendalLaurent

eproxus commented 1 year ago

@ziopio Can you test a broken SD card image with the RTEMS shell directly and check if you can read the broken files or not?

ziopio commented 1 year ago

The problem consists in how the fat32 formatting is executed and how rtems is sensible to fat32 formatting.

Usually we all develop on Macs. So i removed the SD card that was crashing and i put it in my MAC. I erased, as i always do, with the (MS-DOS) Fat32 option from 'Disk Utility' tool. Then inserted the SD in my Majaro laptop and deployed some code on it. The SD boots on GRISP just fine, but if i use Gparted again and deploy again the same crash occurs.

eproxus commented 1 year ago

So for an upstream bug report it would be nice to be able to shrink to a minimal test case using an empty FAT partition with the minimal structure of directories and files that exposes the issue. Not sure if it is worth it or if we can just supply the original image directly...

ziopio commented 1 year ago

Update: Last week I tested on a Sandisk Ultra 64 GB SDXC C10 U1 A1 This morning i tried an old Kingston 8GB SDHC C4

In Linux (Majaro) i used Gparted, on Mac i used Disk Utility.

      Linux macOS 
     β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
SDHC β”‚ OK  β”‚ OK  β”‚
     β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€
SDXC β”‚  X  β”‚ OK  β”‚
     β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜

On the SDHC i cannot reproduce the problem.

ziopio commented 1 year ago

@juhlig @GwendalLaurent TL;DR

Try this workaround:
    use Gparted,
    remove all partitions,
    format a new 8GB partition (precisely 8192 MiB),
    deploy files there.
Optional:
    If it works, try extending the partition to the whole available space on your SD card.

What i did: I spent some time testing with the 64GB SDXC i have here using Gparted. Although I am still far from discovering the cause of the problem... I would ask you to try 2 things that worked for me.

This SD card was always crashing until i tried to format just the first 8GB of it. I got 0 crashes on 10 attempt, Then i also tried to extend the 8GB partition to the whole 59.48GB i had available. All works on multiple attempts. But as soon as you format a partition greater the 8GB Gparted does something which RTEMS does not like. I'll keep testing.

GwendalLaurent commented 1 year ago

The workaround worked for me. Using my SanDisk 32GB SDHC, I was able to run the basic robot application on the GRiSP both with a 8GB partition and then by appending that partition to the max size (29.73GiB in my case).

peerst commented 1 year ago

The workaround worked for me. Using my SanDisk 32GB SDHC, I was able to run the basic robot application on the GRiSP both with a 8GB partition and then by appending that partition to the max size (29.73GiB in my case).

And this card didn't work when partitioned/formatted differently?

Trying to get data points on if it's SDHC vs. SDXC or just partition size

juhlig commented 1 year ago

The workaround worked for me, too.

@peerst looks like it is not related to SDHC vs SDXC. @ziopio used a SDXC, @GwendalLaurent and me used a SDHC. The fact that @ziopio's SDHC worked seems to be circumstantial, since his card just happened to be a 8GB one.

ziopio commented 1 year ago

Problem has to do with FAT32 data-structures alignments. Gparted uses this to format the partition: mkfs.fat -F32 -v -I /dev/your-sd-partition I tried to manually execute this with different flags and seams that disabling alignment (-a) makes the problem disappear. So... @juhlig @GwendalLaurent You can still use Gparted to create partitions but to format big partitions( > 8GB) you should do:

With this I can reliably boot. The resulting partition has 32 reserved sectors as i had on the Mac. This alignement might be an edge case that triggers a bug in the RTEMS FAT/SD driver. We'll keep investigating the issue.

peerst commented 1 year ago

This alignement might be an edge case that triggers a bug in the RTEMS FAT/SD driver.

This is weird. Most of the time mis-alignment makes things not work anymore but, well.

GwendalLaurent commented 1 year ago

And this card didn't work when partitioned/formatted differently?

Trying to get data points on if it's SDHC vs. SDXC or just partition size Yes, it was the same card I used to document the crash for this issue.

@ziopio Thank you very much for the workaround.

Should I keep the issue open since the bug in itself isn't fixed ?

peerst commented 1 year ago

Should I keep the issue open since the bug in itself isn't fixed ?

πŸ‘

ziopio commented 1 year ago

I updated the FAQ in the Wiky with the workaround and a link to this issue. The answer is also linked in the tutorial page, so we hopefully save new users from tumbling into this.

d-led commented 10 months ago

not sure if it's off-topic but if someone is searching for it:

short feedback on formatting on Macs: tried normal partition first, which did not load successfully, however, choosing MBR worked:

format_for_grisp