urbit / urbit

An operating function
https://urbit.org
MIT License

pier: serf unexpectedly shut down #5822

Closed Quodss closed 1 year ago

Quodss commented 2 years ago

Description: After the last OTA, the pier crashes with "pier: serf unexpectedly shut down". When I restart the pier after the crash, it crashes again after a few seconds.

In those few seconds I had time to print +vats. Here are the logs:

...
---------------- playback complete ----------------
vere: checking version compatibility
ames: live on 52142
conn: listening on \\.\pipe\urbit-conn-dozreg-toplud
http: web interface live on https://localhost:443
http: web interface live on http://localhost:80
http: loopback live on http://localhost:12321
pier (5773968): live
> +vats
%base
  /sys/kelvin:      [%zuse 418]
  base hash:        0v1v.p3a22.iv754.lt3ot.hpbee.0elcg.orl5e.fakce.r1pb5.sg0kt.46vqk
  %cz hash:         0v10.qk6vo.bpj1l.ckup2.mq161.g50a2.echpp.dh0q9.bmuvb.gvipe.c51sm
  app status:       running
  force on:         ~
  force off:        ~
  publishing ship:  ~
  updates:          tracking
  source ship:      ~zod
  source desk:      %kids
  source aeon:      120
  pending updates:  ~
::
%inet2022
  /sys/kelvin:      [%zuse 418]
  base hash:        0v1h.1odif.kbkos.7cef9.crs82.qdmhf.adl96.26c3n.gs27j.ta6t1.slsdk
  %cz hash:         0v1h.1odif.kbkos.7cef9.crs82.qdmhf.adl96.26c3n.gs27j.ta6t1.slsdk
  app status:       running
  force on:         ~
  force off:        ~
  publishing ship:  ~
  updates:          tracking
  source ship:      ~tocrex-holpen
  source desk:      %inet2022
  source aeon:      13
  pending updates:  ~
::
%studio
  /sys/kelvin:      [%zuse 418]
  base hash:        0vq.c6i3u.2lo4u.ucc6d.vhaia.gu9ic.1htet.gssjp.sdnaa.03e4m.ct81g
  %cz hash:         0vq.c6i3u.2lo4u.ucc6d.vhaia.gu9ic.1htet.gssjp.sdnaa.03e4m.ct81g
  app status:       running
  force on:         ~
  force off:        ~
  publishing ship:  [~ ~tirrel]
  updates:          tracking
  source ship:      ~tirrel
  source desk:      %studio
  source aeon:      19
  pending updates:  ~
::
%landscape
  /sys/kelvin:      [%zuse 418]
  base hash:        0vt.1ivo6.0pj6r.qeavd.e30q1.crqu7.5qc5l.gvs0j.aulvu.joe2j.pa66j
  %cz hash:         0vt.1ivo6.0pj6r.qeavd.e30q1.crqu7.5qc5l.gvs0j.aulvu.joe2j.pa66j
  app status:       running
  force on:         ~
  force off:        ~
  publishing ship:  [~ ~lander-dister-dozzod-dozzod]
  updates:          tracking
  source ship:      ~lander-dister-dozzod-dozzod
  source desk:      %landscape
  source aeon:      26
  pending updates:  ~
::
%webterm
  /sys/kelvin:      [%zuse 418]
  base hash:        0vb.81mgj.s695n.fuiqg.ddsd4.1oec7.kai2j.aqh8c.2h3u6.ndam6.mndil
  %cz hash:         0vb.81mgj.s695n.fuiqg.ddsd4.1oec7.kai2j.aqh8c.2h3u6.ndam6.mndil
  app status:       running
  force on:         ~
  force off:        ~
  publishing ship:  [~ ~mister-dister-dozzod-dozzod]
  updates:          tracking
  source ship:      ~mister-dister-dozzod-dozzod
  source desk:      %webterm
  source aeon:      6
  pending updates:  ~
::
%garden
  /sys/kelvin:      [%zuse 418]
  base hash:        0vk.ltcfc.a5mn5.6r5st.n5373.veo4f.knnsj.rjaeu.bamr6.pbuhq.qfvch
  %cz hash:         0vk.ltcfc.a5mn5.6r5st.n5373.veo4f.knnsj.rjaeu.bamr6.pbuhq.qfvch
  app status:       running
  force on:         ~
  force off:        ~
  publishing ship:  [~ ~mister-dister-dozzod-dozzod]
  updates:          tracking
  source ship:      ~mister-dister-dozzod-dozzod
  source desk:      %garden
  source aeon:      20
  pending updates:  ~
::
%docs
  /sys/kelvin:      [%zuse 418]
  base hash:        0v1.avb2u.b8pui.8g9a5.bcicv.nv8hp.s82m0.tlt39.sluhs.q74vm.shim0
  %cz hash:         0v1.avb2u.b8pui.8g9a5.bcicv.nv8hp.s82m0.tlt39.sluhs.q74vm.shim0
  app status:       running
  force on:         ~
  force off:        ~
  publishing ship:  ~
  updates:          tracking
  source ship:      ~pocwet
  source desk:      %docs
  source aeon:      28
  pending updates:  ~
::
%pals
  /sys/kelvin:      [%zuse 418]
  base hash:        0v1i.ctage.29fp1.uksc4.sfd6k.mchgh.bq2qe.e8tk8.hd84f.cs4mi.nukp0
  %cz hash:         0v1i.ctage.29fp1.uksc4.sfd6k.mchgh.bq2qe.e8tk8.hd84f.cs4mi.nukp0
  app status:       running
  force on:         ~
  force off:        ~
  publishing ship:  ~
  updates:          tracking
  source ship:      ~paldev
  source desk:      %pals
  source aeon:      17
  pending updates:  ~
::
%escape
  /sys/kelvin:      [%zuse 418]
  base hash:        0v11.dffq9.shga9.rcdt8.kp7he.mhv6f.4gnlq.ugic3.fnjfa.1ps1a.fld50
  %cz hash:         0v11.dffq9.shga9.rcdt8.kp7he.mhv6f.4gnlq.ugic3.fnjfa.1ps1a.fld50
  app status:       running
  force on:         ~
  force off:        ~
  publishing ship:  [~ ~dister-fabnev-hinmur]
  updates:          tracking
  source ship:      ~dister-fabnev-hinmur
  source desk:      %escape
  source aeon:      68
  pending updates:  ~[[%zuse 418]]
::
%kids %cz hash:     0vf.1ngkn.b1pi8.o2n82.k3das.t1fph.3hv0q.jrhb7.t371o.7gk3k.k7ej3
ames: czar del.urbit.org: ip .142.93.228.23
ames: czar ten.urbit.org: ip .104.196.239.18
ames: czar wet.urbit.org: ip .34.121.77.1
ames: czar bus.urbit.org: ip .35.247.126.229
ames: czar feb.urbit.org: ip .34.82.25.47
ames: czar dev.urbit.org: ip .35.227.173.38
ames: czar def.urbit.org: ip .35.230.109.40
ames: czar pub.urbit.org: ip .35.230.48.78
ames: czar lur.urbit.org: ip .35.233.250.88
ames: czar zod.urbit.org: ip .35.247.119.159
ames: czar nus.urbit.org: ip .34.83.26.147
ames: czar tug.urbit.org: ip .64.225.41.162
ames: czar rel.urbit.org: ip .34.83.230.207
ames: czar rys.urbit.org: ip .23.239.12.212
ames: czar deg.urbit.org: ip .13.59.219.247
>
pier: serf unexpectedly shut down

To Reproduce: Launch urbit from the command line.

System (please supply the following information, if relevant):

Additional context: I changed the OTA source from my sponsor to ~zod prior to the 1 June OTA.

I tried launching a comet and I see that the %base hashes are different. Will I have to breach my planet to get the proper OTA? The comet's %base hash is 0v2.r1lbp.i9jr2.hosbi.rvg16.pqe7u.i3hnp.j7k27.9jsgv.8k7rp.oi98q
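For anyone comparing hashes between ships, here is a minimal sketch that pulls the `base hash` of each desk out of pasted `+vats` output. It is not part of any Urbit tooling: the function name `base_hashes` is hypothetical, and it assumes the output has the same layout as the log above.

```python
import re

def base_hashes(vats_output: str) -> dict:
    """Map each desk name (e.g. '%base') to its 'base hash' value.

    Assumes the +vats layout shown in this thread: a bare desk line
    such as "%base", followed by indented "key: value" lines.
    """
    hashes = {}
    desk = None
    for line in vats_output.splitlines():
        stripped = line.strip()
        # A desk header is a bare line such as "%base" or "%landscape".
        if re.fullmatch(r"%[a-z0-9-]+", stripped):
            desk = stripped
        elif desk and stripped.startswith("base hash:"):
            hashes[desk] = stripped.split(":", 1)[1].strip()
    return hashes
```

Running it on the output from two ships and comparing the `%base` entries shows whether both are on the same revision.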

Quodss commented 2 years ago

I breached the planet and booted it again with a new key; the base hash now ends in oi98q. It ran for a few minutes, then the same error reappeared.

Quodss commented 2 years ago

After an update the base hash changed to the one ending in 46vqk. In any case, I tried running urbit with the verbose flag; this is where the crash happens:

[ "|||"
  %give
  %gall
  [%unto %fact]
  i=/gall/use/azimuth/0w3.~Q7eU/out/~dozreg-toplud/eth-watcher/eth-watcher
  t=~[/dill //term/1]
]
pier: serf unexpectedly shut down

Quodss commented 2 years ago

I reproduced the same error with a fresh comet:

  1. Using the CLI in Windows 10 cmd, spawn a comet.
  2. Wait for some time; the crash happens and repeats each time I reboot the ship.

Sometimes enough time passes for the ship to update to 46vqk, but the crash does not depend on that. With the -v flag, the last message is the same as in the planet case.

W-Glenton commented 2 years ago

I have had essentially the same error message since the update with several of my planets (all of the ones I have tried; for the others I am waiting to see if there is an easy fix before doing anything too drastic). I also had a similar issue with a planet before the update, though this may well be unrelated. All of this happens on boot or during playback (the most common error message, shown during playback, is below):

newt: write failed end of file
pier: serf unexpectedly shut down

Edit:

For further specificity if it is useful, I'm having these issues with planets that are running on port

Further edit:

I have partially fixed this on my end; it looks like it was just an issue of me trying to force a playback. I doubt it fixes the actual issue raised in this thread.

marcusmiguel commented 2 years ago

Same bug here:

[ "||"
  %give
  %gall
  [%unto %fact]
  i=/gall/use/eth-watcher/0w2.PVfAJ/out/~fidwed-sipwyn/spider/running/azimuth
  t=~[/dill //term/1]
]
["|||" %give %gall [%unto %fact] i=/gall/use/azimuth/0w2.PVfAJ/out/~fidwed-sipwyn/eth-watcher/eth-watcher t=~[/dill //term/1]]
pier: serf unexpectedly shut down

ericfode commented 2 years ago

Same here; it happens on a fake ~zod as well as on live planets.

benjaminkwilliams commented 2 years ago

If you run |mass does your %gall section look like this?

  %gall:
      %foreign: KB/11.748
      %blocked:
        %azimuth-tracker: KB/5.616
        %face: KB/1.072
        %file-server: KB/8.208
        %goad: B/896
        'inet2022': B/424
        %orca: MB/1.164.264
        %pipe: KB/192.696
      --MB/1.373.176
      %active:

Then further in %active you see

        'inet2022': KB/273.204

Shouldn't it be %inet2022? Do you see the same thing?

Quodss commented 2 years ago

Hi @benjaminkwilliams

Here is the %gall section from |mass:

%gall:
      %foreign: KB/10.568
      %blocked:
        %face: B/568
        %file-server: B/576
        %rumors: B/564
      --KB/1.708
      %active:

and I do not see inet2022 in %active

benjaminkwilliams commented 2 years ago

@Quodss I did figure out that to "unblock" what was listed in %gall, I had to install Orca, Face, and Studio. I didn't uninstall Rumor, but as yours shows as %blocked, I'm going to guess you did at some point in the past.
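To see at a glance which agents are blocked, a small helper can list the names under `%blocked` in a pasted `|mass` `%gall` section. This is only a sketch under the assumption that the output looks like the excerpts in this thread; `blocked_agents` is a hypothetical name, not an Urbit command.

```python
def blocked_agents(mass_output: str) -> list:
    """Return agent names listed under %blocked in a pasted |mass %gall section."""
    agents = []
    in_blocked = False
    for line in mass_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("%blocked:"):
            in_blocked = True
            continue
        if in_blocked:
            # The subtotal line (e.g. "--KB/1.708") or the next section ends the list.
            if stripped.startswith("--") or stripped.startswith("%active:"):
                break
            name, sep, _size = stripped.partition(":")
            if sep:
                # Names appear either as %face or quoted as 'inet2022'.
                agents.append(name.strip("'%"))
    return agents
```

Each returned name is a candidate agent whose desk may need to be installed again, as described above.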

knoidy commented 2 years ago

At the risk of being redundant I'm adding to this thread as opposed to opening a new ticket.

Also experiencing this issue even after updating to base hash 0vu.fptbs.6f05p.c9ghb.qfh7e.sbhum.vfnnr.osfs7.vv1i1.qveva.dfvli

Running with -v, these are the messages I get before the bail:

[ "|"
  %give
  %iris
  %http-response
    i
  / gall
    use
    spider
    0wJM111
    ~sonseg-dolful
    thread
    eth-watcher--0v18r.pqi6b.pc5id.qu4oo.kh89b.3af82.8s47o.st0rj.2gkoo.9fnsp.mdv51.rhc8s.defds.vr098.4r8bs.tgec9.b7dru.7cs5r.a89tv.6itl6.cnav9
    request
  t=~[/dill //term/1]
]
[ "||"
  %give
  %gall
  [%unto %fact]
  i=/gall/use/eth-watcher/0wJM111/out/~sonseg-dolful/spider/running/azimuth
  t=~[/dill //term/1]
]
[ "|||"
  %give
  %gall
  [%unto %fact]
  i=/gall/use/azimuth/0wJM111/out/~sonseg-dolful/eth-watcher/eth-watcher
  t=~[/dill //term/1]
]
pier: serf unexpectedly shut down

@benjaminkwilliams I tried your suggestion as I had gall blocking a couple of desks including rumors. I reinstalled the blocking desks but I still get:

%gall:
      %foreign: KB/9.768
      %blocked:
        %file-server: B/576
      --B/576

I've breached several times already on the assumption that the issue could be due to corrupt installs, etc., but my current install is about as clean as I can imagine, starting from a factory reset performed today, and I'm still getting bailed with the "pier: serf unexpectedly shut down" error. Please help!

@MarcusMiguel did you resolve this? It seems we have the same problem with eth-watcher/spider.

marcusmiguel commented 2 years ago

Sorry for the late response; I'm still facing the same issue. Running the commands mentioned here seems to allow my ship to go longer without crashing, but eventually it does crash again.

joemfb commented 2 years ago

Sorry for the (much more egregiously) late response.

The original report in this thread was a ship running on Windows. Generally, vere is pretty good about printing an error message before a fatal error -- except on Windows, where something is buffering (and not flushing) stderr. Is everyone else in this thread also running Windows?
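As a general illustration of that failure mode (this is not vere's code, and it does not reproduce the Windows behaviour itself), the Python sketch below shows how an error message written to a buffered stderr is lost when the process exits abruptly without flushing. The names `CHILD` and `run_child` are made up for the example.

```python
import subprocess
import sys

# Child program: wrap stderr (fd 2) in a fully buffered writer, write an
# error message, then exit abruptly without running normal cleanup.
CHILD = """
import os, sys
err = open(2, "w", buffering=8192, closefd=False)
err.write("fatal: something went wrong\\n")
if sys.argv[1] == "flush":
    err.flush()
os._exit(1)  # hard exit: anything still in the buffer is discarded
"""

def run_child(mode: str) -> str:
    """Run the child and return whatever actually reached its stderr."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.stderr
```

Calling `run_child("noflush")` returns an empty string, while `run_child("flush")` returns the message. Under that assumption, a missing error before "serf unexpectedly shut down" does not mean the serf had nothing to say.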

marcusmiguel commented 2 years ago

Running Windows here.

knoidy commented 2 years ago

Ditto, Windows

GeneralGDA commented 2 years ago

Same on Windows Server 2016.

lecram2022 commented 2 years ago

I am experiencing the same exact issue with the same exact error messages (i.e. there seems to be some issue with "eth watcher") and have a Windows 10 computer.

I reset my network keys and installed from scratch without any success.

tapset commented 2 years ago

Running Port [app-1.9.1] on Windows 10 [Version 10.0.19044.2130], this is what I'm getting. Sometimes the shutdown happens almost immediately; other times it runs for 10 minutes or so:

Microsoft Windows [Version 10.0.19044.2130]
(c) Microsoft Corporation. All rights reserved.

C:\Users\REDACTED>C:\Users\REDACTED\AppData\Local\port\app-1.9.1\resources\resources\win\urbit C:\Users\REDACTED\AppData\Roaming\Port\piers\liquidation-station
~
urbit 1.10
boot: home is C:\Users\REDACTED\AppData\Roaming\Port\piers\liquidation-station
loom: mapped 2048MB
lite: arvo formula 11a9e7fe
lite: core 38d4ad4d
lite: final state 38d4ad4d
loom: mapped 2048MB
boot: protected loom
live: loaded: MB/353.566.720
boot: installed 351 jets
---------------- playback starting ----------------
pier: replaying events 157132-157362
eyre: canceling ~[//http-server/0vp.ujgkv/59/3]
eyre: canceling ~[//http-server/0vp.ujgkv/18/9]
eyre: canceling ~[//http-server/0vp.ujgkv/74/22]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
[%e %authenticated-without-cookie]
pier: (157362): play: done
---------------- playback complete ----------------
vere: checking version compatibility
ames: live on 52897
conn: listening on \\.\pipe\urbit-conn-ritdeg-havful-hansen-miptud--widdel-lorrym-dollur-litzod
eyre: canceling ~[//http-server/0v4.fa6pg/20/11]
eyre: canceling ~[//http-server/0v4.fa6pg/28/2]
http: web interface live on http://localhost:80
http: loopback live on http://localhost:12321
pier (157370): live
ames: czar zod.urbit.org: ip .35.247.119.159
ames: czar ten.urbit.org: ip .104.196.239.18
ames: czar dys.urbit.org: ip .157.90.16.237
ames: czar pub.urbit.org: ip .35.230.48.78
; ~haddef-sigwen is ok
; ~niblyx-malnus is ok
ames: czar at ned.urbit.org: not found (b)
ames: czar dev.urbit.org: ip .35.227.173.38
ames: czar feb.urbit.org: ip .34.82.25.47
ames: czar bus.urbit.org: ip .35.247.126.229
ames: czar del.urbit.org: ip .142.93.228.23
ames: czar wet.urbit.org: ip .34.121.77.1
ames: czar deg.urbit.org: ip .13.59.219.247
ames: czar rys.urbit.org: ip .23.239.12.212
ames: czar rep.urbit.org: ip .198.199.121.116
ames: czar lur.urbit.org: ip .35.233.250.88
ames: czar ref.urbit.org: ip .143.198.51.180
ames: czar nus.urbit.org: ip .34.83.26.147
ames: czar nem.urbit.org: ip .66.228.53.179
ames: czar bel.urbit.org: ip .34.69.242.152
ames: czar dem.urbit.org: ip .34.69.220.110
pier: serf unexpectedly shut down

lowgradepanic commented 2 years ago

Adding myself to the list of people experiencing this. Windows 10, using Port, but if I Start in Terminal under Manage, it looks basically the same as what tapset posted. Also the same timing variance; sometimes it crashes immediately, sometimes it takes a few minutes.

nodreb-borrus commented 2 years ago

I have also been chatting with someone on Windows who is reporting this error; they just booted a new planet.

telestew commented 2 years ago

Seconded. I'm seeing the same behavior on Windows 10 with a recently migrated pier under the following three circumstances:

  1. Using the latest Windows urbit binary in command prompt and WSL
  2. Using the latest Linux urbit binary in command prompt and WSL
  3. Booting from Port

However, my fakezod on urbit 1.8 seems stable under WSL (Ubuntu 16.04).

Stahlblau4 commented 2 years ago

I'm getting the same behavior. I'm running my ship on a Windows laptop through Port. If I go to Manage and boot it through the terminal, I get "pier: serf unexpectedly shut down" after a few seconds, sometimes even minutes, but it is still totally unusable. Any ideas?

trosel commented 1 year ago

Running Windows 10, cannot keep ship running for more than 2 seconds. Same issues here.

dietofworms commented 1 year ago

Same problem here on Windows 10 running through Port.

saunic commented 1 year ago

Same problem, Windows 10, CLI; tried with my planet and a fresh comet. It runs for about a minute, then prints "pier: serf unexpectedly shut down" and exits.

yyuyulm commented 1 year ago

Same problem, Windows 10, with both Port and CLI. Is there any update on this, or a stable version to roll back to?

philipquirk commented 1 year ago

I've given planets to two people using Windows and they both have this problem nonstop.

trosel commented 1 year ago

Based on the recent commits, it looks like Urbit just won't work on Windows anymore

zalberico commented 1 year ago

Correct - we're dropping official support for native Windows binaries. The Linux ones likely work via WSL (Windows Subsystem for Linux), which is now generally available from Microsoft, but I haven't personally tested this: https://learn.microsoft.com/en-us/windows/wsl/install