urbit / vere

An implementation of the Urbit runtime
MIT License

Replay gets stuck on some ships #717

Open dosullivan opened 2 months ago

dosullivan commented 2 months ago

I've seen a handful of ships that get stuck on replay when starting up, and then never catch up. Their log output looks like this:

> urbit sampel-palnet
~
urbit 3.1
boot: home is /tmp/sampel-palnet
disk: loaded epoch 0i3624838
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/669.286.400
live: loaded: KB/16.384
boot: installed 972 jets
---------------- playback starting ----------------
play: events 4295664-4295665

It will stay there for hours and never finish.

pkova commented 2 months ago

Could you test replaying these ships with vere 3.0, just so we can see whether that makes a difference? Replay was changed to happen in a subprocess in 3.1.

dosullivan commented 2 months ago

It's the same on vere 3.0. The CPU is stuck at 100% when this happens, and replay remains stuck on that particular event.

dosullivan commented 2 months ago

Here's the info output:

loom: mapped 2048MB
boot: protected loom
live: mapped: MB/669.286.400
live: loaded: KB/16.384
boot: installed 972 jets
disk: loaded epoch 0i3624838

urbit: sigsed-pasfus at event 4295663
  disk: live=&, event=4295665

epocs:
  0i3161006
  0i3624838

lmdb info:
  map size: 1099511627776
  page size: 4096
  max pages: 268435456
  number of pages used: 39090
  last transaction ID: 669867
  max readers: 126
  number of readers used: 0
  file size (page): 160112640
  file size (stat): 160112640
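
As a quick sanity check on the lmdb numbers above, here is a small sketch of the arithmetic. The values are copied from the output; the variable names are only illustrative. It suggests the event log is small and nowhere near its map limit, so the hang is unlikely to be a matter of lmdb running out of space.

```python
# Values copied from the `lmdb info` output above.
map_size = 1099511627776
page_size = 4096
max_pages = 268435456
pages_used = 39090
file_size = 160112640

# The map size is the page size multiplied by the page limit (1 TiB).
assert page_size * max_pages == map_size

# The file size equals the number of pages in use times the page size.
assert pages_used * page_size == file_size

# Fraction of the map that is actually in use.
usage = pages_used / max_pages
print(f"{file_size / 2**20:.1f} MiB used, {usage:.4%} of the map")
```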

It looks like there's one problematic event. If I replay up to the event before it, it's fine, but if I try to play to that event itself, it hangs:

urbit play -n 4295663 sampel-palnet
disk: loaded epoch 0i3624838
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/669.286.400
live: loaded: KB/16.384
boot: installed 972 jets
mars: already computed 4295663
      state=4295663, log=4295665
disk: snapshot (event 4295663) is out of date
      (latest event is 4295665
start/shutdown your pier gracefully first

urbit play -n 4295664 sampel-palnet
disk: loaded epoch 0i3624838
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/669.286.400
live: loaded: KB/16.384
boot: installed 972 jets
---------------- playback starting ----------------
play: event 4295665
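
For anyone who wants to confirm the hang without watching a terminal, here is a minimal sketch that runs a command under a time limit and reports whether it finished. The helper name `run_with_timeout` and the time limit are assumptions for illustration, not part of vere.

```python
import subprocess

def run_with_timeout(cmd, seconds):
    """Run cmd; return its exit code, or None if it exceeds the time limit."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising the timeout.
        return None
```

With the command shown above, `run_with_timeout(["urbit", "play", "-n", "4295664", "sampel-palnet"], 600)` would return `None` if replay is still running after ten minutes. Only the direct child is killed on timeout, so if replay runs in a subprocess (as in 3.1), it may be worth checking for a leftover worker process afterwards.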