urbit / vere

An implementation of the Urbit runtime
MIT License

`chop` fails on a `linux-aarch64` machine (RPi4) #253

Open matthew-levan opened 1 year ago

matthew-levan commented 1 year ago

@litmus-ritten reported that chop fails on both his planet and a freshly booted fake ship (both with the same error):

urbit chop fakezod --loom 32     
loom: mapped 4096MB
boot: protected loom
live: loaded: MB/275.496.960
boot: installed 652 jets
loom: image backup complete
external/lmdb/mdb.c:5668: Assertion 'root > 1' failed in mdb_page_search()
[1]    4437 abort (core dumped)  urbit chop fakezod --loom 32
matthew-levan commented 1 year ago

@mopfel-winrux reported the same error when running urbit chop fakezod --loom 32 on his fresh fakezod on his RPi4B with 8GB of RAM.

mopfel-winrux commented 1 year ago

If I run it without --loom 32 I get the following:

$ ./urbit chop zod/
loom: mapped 2048MB
boot: protected loom
live: loaded: MB/277.282.816
boot: installed 652 jets
loom: image backup complete
lmdb: failed to open event log: Out of memory
chop: failed to initialize new database

In both cases the fakezod can be booted again
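
For reference, the two mapped sizes above line up with how the usage text later in this thread describes --loom ("Set loom to binary exponent (31 == 2GB)"): the default run maps 2048MB and --loom 32 maps 4096MB. A minimal Python sketch of that arithmetic follows; the helper name is made up for illustration and is not part of vere.

```python
def loom_mapped_mb(exponent: int) -> int:
    """Size in MB of a loom of 2**exponent bytes (hypothetical helper)."""
    return (1 << exponent) // (1024 * 1024)


if __name__ == "__main__":
    # 31 is the value the usage text equates with 2GB; 32 is the value
    # passed via --loom in the failing runs above.
    for exponent in (31, 32):
        print(f"--loom {exponent}: mapped {loom_mapped_mb(exponent)}MB")
```

So raising the exponent by one doubles the mapping, which matches the "loom: mapped" lines in both transcripts.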

litmus-ritten commented 1 year ago

Indeed, that's why I ran it with --loom 32, but possibly a sign of something screwy if it's happening on a fakezod or other trivial pier.

Not sure if I got exactly this same error.
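
To put a number on "trivial pier", here is a rough sketch that reads the size out of a "live: loaded:" line and compares it with the loom. It assumes the digits after the unit prefix are a byte count with dots as thousands separators; that is how the values read, but it is not stated anywhere in this thread. The function name is hypothetical.

```python
import re

# Matches e.g. "MB/277.282.816", "KB/16.384" or "GB/1.378.074.624".
_SIZE = re.compile(r"(?:[KMG]?B)/(\d{1,3}(?:\.\d{3})*)")


def parse_size_bytes(line: str) -> int:
    """Extract the size from a log line, ASSUMING it is a byte count
    written with dots as thousands separators."""
    match = _SIZE.search(line)
    if match is None:
        raise ValueError(f"no size found in: {line!r}")
    return int(match.group(1).replace(".", ""))


if __name__ == "__main__":
    snapshot = parse_size_bytes("live: loaded: MB/277.282.816")
    loom = 1 << 31  # default loom, "mapped 2048MB"
    print(f"snapshot is {snapshot / loom:.1%} of the default loom")
```

Under that assumption the fakezod snapshot is roughly 13% of the default loom, so the "Out of memory" from lmdb is unlikely to mean the loom itself is full.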

matthew-levan commented 1 year ago

My first step here will be to ensure we're configuring/compiling LMDB correctly for the linux-aarch64 platform (after reading LMDB's docs on the same) and report back. Stay tuned.

matthew-levan commented 1 year ago

I don't have a linux-aarch64 at my disposal, so would either @litmus-ritten or @mopfel-winrux mind running make test from the libraries/liblmdb directory of the LMDB repository after cloning it and checking out the version we use in our build (git checkout LMDB_0.9.29) on your RPi?
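
For anyone who wants to script that request, a minimal sketch of the three steps (clone, check out LMDB_0.9.29, run make test in libraries/liblmdb) might look like the following. The repository URL is left as a parameter because the thread doesn't give one, and the function names and the "lmdb" working directory are hypothetical.

```python
import subprocess

LMDB_TAG = "LMDB_0.9.29"  # the version named in the comment above


def lmdb_test_commands(repo_url: str, workdir: str = "lmdb") -> list:
    """Build the commands for the steps described above.

    repo_url is supplied by the caller; workdir is an arbitrary
    directory name chosen for this sketch.
    """
    return [
        ["git", "clone", repo_url, workdir],
        ["git", "-C", workdir, "checkout", LMDB_TAG],
        ["make", "-C", f"{workdir}/libraries/liblmdb", "test"],
    ]


def run_lmdb_tests(repo_url: str, workdir: str = "lmdb") -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in lmdb_test_commands(repo_url, workdir):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_lmdb_tests with the LMDB repository URL on the RPi would print each command before running it, which makes the resulting log easy to attach here.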

mopfel-winrux commented 1 year ago

I ran the make test and below are my results liblmdb_test.txt

matthew-levan commented 1 year ago

> I ran the make test and below are my results liblmdb_test.txt

Looks normal/healthy to me. Will keep digging.

mopfel-winrux commented 1 year ago

I tried this on our ARM build server and was able to chop properly.

I also got the following stacktrace from the coredump on my raspberry pi 4B+ 8GB

$ gdb urbit core 
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
   <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from urbit...done.
[New LWP 10341]
Core was generated by `./urbit chop zod/ --loom 32'.
Program terminated with signal SIGABRT, Aborted.
#0  0x0000000000708f98 in __restore_sigs ()
(gdb) bt
#0  0x0000000000708f98 in __restore_sigs ()
#1  0x0000000000708ff0 in raise ()
#2  0x0000000000700358 in abort ()
#3  0x000000000071ed58 in mdb_assert_fail (env=0x7f91dfc570, 
    expr_txt=expr_txt@entry=0x84ec18 "root > 1", 
    func=func@entry=0x84f470 <__func__.7379> "mdb_page_search", line=line@entry=5668, 
    file=0x84eae0 "external/lmdb/mdb.c") at external/lmdb/mdb.c:1545
#4  0x00000000004e2cd8 in mdb_page_search (mc=mc@entry=0x7ffd75e380, key=key@entry=0x7ffd75e330, 
    flags=flags@entry=0) at external/lmdb/mdb.c:5658
#5  0x00000000004e3354 in mdb_cursor_set (mc=mc@entry=0x7ffd75e380, key=key@entry=0x7ffd75e330, 
    data=data@entry=0x7ffd75e340, op=op@entry=MDB_SET, exactp=exactp@entry=0x7ffd75e32c)
    at external/lmdb/mdb.c:6153
#6  0x00000000004e9888 in mdb_dbi_open (txn=0x7f91dfca20, name=name@entry=0x722930 "EVENTS", 
    flags=262152, dbi=dbi@entry=0x7ffd75e6cc) at external/lmdb/mdb.c:9815
#7  0x0000000000405334 in u3_lmdb_gulf (env_u=<optimized out>, low_d=low_d@entry=0x7ffd75e768, 
    hig_d=hig_d@entry=0x7ffd75e770) at pkg/vere/db/lmdb.c:157
#8  0x00000000004040e8 in _cw_chop (argc=argc@entry=5, argv=argv@entry=0x7ffd768b18)
    at pkg/vere/main.c:1914
#9  0x00000000004016a4 in _cw_utils (argv=0x7ffd768b18, argc=5) at pkg/vere/main.c:2266
#10 main (argc=5, argv=0x7ffd768b18) at pkg/vere/main.c:2291
(gdb) 
matthew-levan commented 1 year ago

Update: chop fails on only some RPi4s. Two users have hit the failure, and two others have been able to chop successfully on their RPi4s. On @mopfel-winrux's linux-aarch64 server (not an RPi4), chop works as well.

enriqueHAS commented 1 year ago

this worked on a fake zod

urbit chop zod --loom 32
loom: mapped 4096MB
boot: protected loom
live: loaded: MB/276.856.832
boot: installed 652 jets
loom: image backup complete
chop: event log truncation complete   
uname -m
aarch64
ubuntu@urbit:/media/urbit/piers$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:    20.04
Codename:   focal
ubuntu@urbit:/media/urbit/piers$
matthew-levan commented 1 year ago

Thanks @enriqueHAS, which hardware is that?

enriqueHAS commented 1 year ago

> Thanks @enriqueHAS, which hardware is that?

rpi4 with 8G, revision 4 i believe.

MrNeutron commented 1 year ago

Hi... running a moon on a RPi3b. Chop first failed with:

urbit@piv:~ $ ./ramnel-solref-rigwyd-dorlys/.run chop
loom: mapped 2048MB
boot: protected loom
live: loaded: MB/461.733.888
boot: installed 652 jets
loom: image backup complete
lmdb: failed to open event log: Out of memory
chop: failed to initialize new database
urbit@piv:~ $

So I tried increasing the loom and got:

urbit@piv:~ $ ./ramnel-solref-rigwyd-dorlys/.run chop --loom 32
loom: mapped 4096MB
boot: protected loom
live: loaded: MB/461.733.888
boot: installed 652 jets
loom: image backup complete
external/lmdb/mdb.c:5668: Assertion 'root > 1' failed in mdb_page_search()
Aborted (core dumped)
urbit@piv:~ $
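
Since the same two failure signatures keep coming up, a small triage helper may make reports easier to compare. This is only a sketch: it matches the exact messages quoted in this thread, and the function name and result labels are invented for illustration.

```python
def classify_chop_output(output: str) -> str:
    """Label chop output using the messages seen in this thread.

    The labels are hypothetical; only the matched strings come from
    the transcripts posted here.
    """
    if "chop: event log truncation complete" in output:
        return "ok"
    if "Assertion 'root > 1' failed in mdb_page_search()" in output:
        return "lmdb-assert"
    if "lmdb: failed to open event log: Out of memory" in output:
        return "lmdb-open-oom"
    return "unknown"
```

For example, passing the text of a saved chop log to classify_chop_output returns "lmdb-open-oom" for the default-loom failure and "lmdb-assert" for the --loom 32 failure.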

MrNeutron commented 1 year ago

Incidentally, I just ran chop on another aarch64-based moon running on Oracle Cloud free tier with 6 GB RAM and it worked fine.

321bobbyv commented 1 year ago

I'm not having luck running "./urbit chop zod/" on my zod development ship on a mac x86_64 machine. It defaults to v1.13 and does nothing. When I start the ship with zod/.run it picks up the current 1.21 version of course. Trying to chop the ship with "zod/.run chop" boots a comet. LOL.

belisarius222 commented 1 year ago

@nismut-tamwep Can you post scrollback, please? Vere v1.21 knows about the 'chop' subcommand, so I'm surprised to hear about this, and my guess is it's not related to this Raspberry Pi issue.

321bobbyv commented 1 year ago

> @nismut-tamwep Can you post scrollback, please? Vere v1.21 knows about the 'chop' subcommand, so I'm surprised to hear about this, and my guess is it's not related to this Raspberry Pi issue.

Scrollback below from my macbook (not running raspberry pi):

bobvaradin@Bobs-MBP urbit % ./urbit chop zod
Urbit: a personal server operating function
https://urbit.org
Version 1.13

Usage: ./urbit [options...] ship_name
where ship_name is a @p phonetic representation of an urbit address
without the leading '~', and options is some subset of the following:

-A, --arvo DIR                Use dir for initial clay sync
-a, --abort                   Abort aggressively
-B, --bootstrap PILL          Bootstrap from this pill
-b, --http-ip IP              Bind HTTP server to this IP address
-C, --memo-cache-limit LIMIT  Set memo cache max size; 0 means uncapped
-c, --pier PIER               Create a new urbit in pier/
-D, --replay                  Recompute from events
-d, --daemon                  Daemon mode; implies -t
-e, --ethereum URL            Ethereum gateway
-F, --fake SHIP               Fake keys; also disables networking
-G, --key-string STRING       Private key string (@uw, see also -k)
-g, --gc                      Set GC flag
-I, --inject FILE             Inject event from jamfile
-i, --import FILE             Import pier state from jamfile
-J, --ivory-pill PILL         Use custom ivory pill
-j, --json-trace              Create json trace file in .urb/put/trace
-K, --kernel-stage STAGE      Start at Hoon kernel version stage
-k, --key-file KEYS           Private key file (see also -G)
-L, --local                   Local networking only
    --loom                    Set loom to binary exponent (31 == 2GB)
-l, --lite-boot               Most-minimal startup
-n, --replay-to NUMBER        Replay up to event
-P, --profile                 Profiling
-p, --ames-port PORT          Set the ames port to bind to
    --http-port PORT          Set the http port to bind to
    --https-port PORT         Set the https port to bind to
-q, --quiet                   Quiet
-R, --versions                Report urbit build info
-r, --replay-from NUMBER      Load snapshot from event
-S, --skip-battery-hashes     Disable battery hashing
-t, --no-tty                  Disable terminal/tty assumptions
-u, --bootstrap-url URL       URL from which to download pill
-v, --verbose                 Verbose
-w, --name NAME               Boot as ~name
-X, --scry PATH               Scry, write to file, then exit
-x, --exit                    Exit immediately
-Y, --scry-into FILE          Optional name of file (for -X)
-Z, --scry-format FORMAT      Optional file format ('jam', or aura, for -X)
    --no-conn                 Do not run control plane
    --no-dock                 Skip binary "docking" on boot

Development Usage:
To create a development ship, use a fakezod:
./urbit -F zod -A /path/to/arvo/folder -B /path/to/pill -c zod

For more information about developing on urbit, see:
https://github.com/urbit/urbit/blob/master/CONTRIBUTING.md

Simple Usage:
./urbit -c to create a comet (anonymous urbit)
./urbit -w -k if you own a planet
./urbit to restart an existing urbit

utilities:
./urbit cram       jam state:
./urbit dock       copy binary:
./urbit grab       measure memory usage:
./urbit info       print pier info:
./urbit meld       deduplicate snapshot:
./urbit pack       defragment snapshot:
./urbit prep       prepare for upgrade:
./urbit next       request upgrade:
./urbit queu       cue state:
./urbit vere ARGS  download binary:

run as a 'serf':
./urbit serf

litmus-ritten commented 1 year ago

@nismut-tamwep You likely have an old version of the Urbit binary in your working directory. Try with: https://github.com/urbit/vere/releases/tag/vere-v1.21
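
A quick way to confirm which binary you are actually invoking is to read the version out of the banner it prints. The sketch below parses the "Version 1.13" line from the scrollback above and compares it with v1.21, the only version this thread confirms as knowing about chop. Earlier versions may or may not have it, so a False result only means "not confirmed here". The names are hypothetical.

```python
import re

# The only version this thread confirms as having the chop subcommand.
KNOWN_CHOP_VERSION = (1, 21)


def parse_vere_version(banner: str):
    """Return (major, minor) from a 'Version X.Y' banner line, or None."""
    match = re.search(r"Version (\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def known_to_have_chop(banner: str) -> bool:
    """True only when the banner version is at least the confirmed one."""
    version = parse_vere_version(banner)
    return version is not None and version >= KNOWN_CHOP_VERSION
```

With the banner from the MacBook scrollback, parse_vere_version gives (1, 13), which is below the confirmed version and consistent with the binary printing its usage text instead of chopping.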

kennyrowe commented 1 year ago

hey guys getting the same error on my rpi4

mcevoypeter commented 1 year ago

@matthew-levan any update on this?

matthew-levan commented 1 year ago

No updates; this one is difficult to reproduce. Some RPi4s work, others don't. Not sure of the root cause.

bazfum commented 7 months ago

I can add this happens to me, and now is happening when attempting to boot with vere 3.0. I assume the epoch system is basically the same as a chop? I'm going to move my pier to a Mac and see if I can move it forward there.

Edit: I was able to get past this on the Mac, and was up and running on 3.0. However after moving back to the Pi, it now gives:

urbit 3.0
boot: home is /home/urbit/
disk: loaded epoch 0i795477796
loom: mapped 8192MB
boot: protected loom
live: mapped: GB/1.378.074.624
live: loaded: KB/16.384
boot: installed 967 jets
loom: external fault: 0x10 (0x200000000 : 0x400000000)

Assertion '0' failed in pkg/noun/manage.c:1791

bail: oops
home: bailing out
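
One thing that can be checked directly from that log is the arithmetic on the fault line. The sketch below assumes the two values in parentheses are the lower and upper bounds of the loom and that the first value is the faulting address. That reading is a guess from the message, not something confirmed in this thread, and the function names are hypothetical.

```python
import re

_FAULT = re.compile(
    r"external fault: (0x[0-9a-fA-F]+) "
    r"\((0x[0-9a-fA-F]+) : (0x[0-9a-fA-F]+)\)"
)


def parse_fault(line: str):
    """Return (address, low, high) as integers from the fault line."""
    match = _FAULT.search(line)
    if match is None:
        raise ValueError(f"not a fault line: {line!r}")
    address, low, high = (int(group, 16) for group in match.groups())
    return address, low, high


def describe_fault(line: str) -> dict:
    """Span of the (assumed) loom range in MB, and whether the
    faulting address lies inside it."""
    address, low, high = parse_fault(line)
    return {
        "span_mb": (high - low) // (1024 * 1024),
        "inside": low <= address < high,
    }
```

For the line above, the range spans 8192 MB, which matches "loom: mapped 8192MB", and the address 0x10 falls outside it. That would be consistent with the "external fault" wording, but it doesn't by itself say what dereferenced a near-null pointer.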