urbit / urbit

An operating function
https://urbit.org
MIT License

L2 Planet can communicate with some galaxies but no other points #6164

Open thelifeandtimes opened 1 year ago

thelifeandtimes commented 1 year ago

Describe the bug I recently booted an L2 planet. It can communicate (|hi) with a few known galaxies like ~zod and ~deg, but not with some others (including its upstream sponsor, ~tyr). It also cannot reach any stars or planets, and it can't reach moons of galaxies that are known to be running (e.g. ~lander-dister-dozzod-dozzod).

To Reproduce Steps to reproduce the behaviour:

  1. Claim L2 planet, be eager and experience %key-mismatch error when trying to boot, delete failed booted pier, be patient and then successfully boot
  2. See some successful boot printouts alongside the occasional ames: czar at XXX.urbit.org: not found (b)
  3. Try to connect with other points, only be able to connect to a few galaxies.
  4. Check +azimuth-block height. Be a little behind, so run -azimuth-load
  5. Remember that you learned to be patient, so wait ~1 hour and recheck the block height, confirming that it is in line with current eth confirmed block height.
  6. Try to connect with other points again, still be only able to connect to a few galaxies.
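The block-height comparison in steps 4 and 5 can be sketched as a small helper. This is only an illustration, not part of Urbit: the function names and the `tolerance` default are assumptions made for the example. It accepts a height in the dot-separated form the dojo prints (e.g. 16.842.931) and a chain head either as a decimal or as the 0x-prefixed hex string an Ethereum node returns from `eth_blockNumber`.

```python
def parse_block_number(value):
    """Turn an int, a 0x-prefixed hex string, or a decimal string
    with '.' or ',' separators into an int."""
    if isinstance(value, int):
        return value
    text = value.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text.replace(".", "").replace(",", ""))


def blocks_behind(local_height, chain_head):
    """Number of blocks the local Azimuth state lags the chain head (never negative)."""
    return max(0, parse_block_number(chain_head) - parse_block_number(local_height))


def is_caught_up(local_height, chain_head, tolerance=50):
    """True if the lag is within `tolerance` blocks.
    The default of 50 is an arbitrary assumption for this sketch."""
    return blocks_behind(local_height, chain_head) <= tolerance


if __name__ == "__main__":
    # Hypothetical chain head; only the local height comes from this thread.
    print(blocks_behind("16.842.931", "0x1010233"))
    print(is_caught_up("16.842.931", "0x1010233"))
```

If the lag stays above the tolerance after running -azimuth-load and waiting, the ship has likely not finished syncing; if it is within tolerance and peers are still unreachable, as described here, the cause is probably elsewhere.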

Expected behaviour I expect that once my azimuth block height is up to date, I am able to connect with points that are known to be live on the network.

System (please supply the following information, if relevant):

Additional context Maybe this is related to this issue: https://github.com/urbit/urbit/issues/5444

Notify maintainers Native Planet guys @nallux-dozryl and @yapishu might be interested in this issue, particularly if it has something to do with the firewalling woes mentioned in #5444

zalberico commented 1 year ago

Are you running this locally?

Can you try specifying a port? ./<pier>/.run -p 32123

Thanks for the clear write-up. I think what you're saying makes sense, and you should be able to communicate with more than just a handful of galaxies on the network. Any more details about your setup (local, hosted, proxies) would be helpful to know.

We can also iterate via support@urbit.org where this is less likely to get dropped.
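Before pinning a port with -p as suggested above, it can help to confirm that nothing else on the machine already holds that UDP port (Ames traffic is UDP). A minimal sketch, assuming Python is available on the host; `udp_port_is_free` is a hypothetical helper name, not an Urbit tool. It only checks local bindability and says nothing about NAT, firewalls, or port forwarding.

```python
import socket


def udp_port_is_free(port, host="0.0.0.0"):
    """Return True if a UDP socket can be bound to (host, port) right now.

    This is a local check only: a free port can still be unreachable
    from outside if a firewall or NAT drops the traffic.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # 32123 is the example port from the comment above.
    print(udp_port_is_free(32123))
```

Run it while the ship is stopped. If it reports the port as taken, pick a different port before starting the pier.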

zalberico commented 1 year ago

@baudtack tagging you in this for help

thelifeandtimes commented 1 year ago

@nallux-dozryl / @yapishu do the groundseg docker containers already specify a port when booting? If so, I can try pulling the pier and running it elsewhere instead?

yapishu commented 1 year ago

@thelifeandtimes Yes, groundseg automatically manages the port, but you can export your pier through the GUI and run it elsewhere however you'd like

yapishu commented 1 year ago

@zalberico The groundseg->anchor setup involves running a modified Tlon urbit container, attaching it to a wireguard container network, connecting that to a WG server, and forwarding the ames port from the server via the tunnel. I don't think this could be the cause of this issue, since whether it works or not ought to be binary, not contingent on the ship it's connecting to

felzix commented 1 year ago

I think I'm seeing the same issue. I can communicate with some ships but not others.

Like OP, |hi ~tyr does not return successfully but |hi ~zod and |hi ~deg do. The one that's really messing with me is being unable to communicate with ~middev because I want to find the forge group.

I am able to communicate with my friend's planet as well as ~paldev.

Info:

Tests:


Edit: I can now communicate with ~middev. Still can't ping ~tyr. I didn't do anything but wait a day, so... maybe ~middev was actually unavailable? Or maybe routing is wonky only sometimes?

zalberico commented 1 year ago

@felzix ~middev was down recently so that's probably what you saw.

felzix commented 1 year ago

> @felzix ~middev was down recently so that's probably what you saw.

I bet you're right.

thelifeandtimes commented 1 year ago

Adding into this issue as I am having the same issue with a newly booted L1 planet, ~barfen-matruc, with a twist or two.

Twist #1: this is running locally on an m2 macbook on vere 1.22

Twist #2: This is after a few failed attempts at booting with vere 1.17, where the first few lines of the boot sequence would occur and then crash and clear the terminal. No pier was created, but it is unclear if any communications went out to the network.

I can |hi a few galaxies (~zod, ~deg, ~nut) and get "is neighbor" / "is ok" responses after it finds their perm DNS. If I attempt this with a planet or a star I don't get any success, and some galaxies that I know to be operating cannot be reached (~wex, which is upstream of my star sponsor).

My +azimuth-block is sufficiently up to date (16.842.931 as of this writing), and while initially my sponsor was not fully up to date (~tocwex), I ran -azimuth-load overnight and with the azimuth block height being past the spawning of this planet I am still unable to discover a route.

Now here is the big Twist #3: I have successfully sent a DM in Landscape Talk from ~barfen-matruc to ~sarlev-sarsen, but messages will not go the other way. Similarly, I was able to |hi ~sarlev-sarsen and see the message in ~sarlev's dojo, but never got a ~sarlev-sarsen is ok/your neighbor print in ~barfen's dojo.
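When collecting |hi results like the ones above, grouping them by ship rank makes the pattern (some galaxies reachable, nothing below them) easier to see. A rough sketch, assuming well-formed names: rank is inferred only from the number of three-letter syllables in the name (1 galaxy, 2 star, up to 4 planet, up to 8 moon, more than that comet) and the syllables themselves are not validated. `ship_rank` and `summarize` are hypothetical helpers, and the sample results are hand-entered from this comment.

```python
from collections import defaultdict


def ship_rank(name):
    """Infer a ship's rank from how many 3-letter syllables its name has."""
    body = name.lstrip("~")
    # Long names use "--" as a separator in the middle; treat it like "-".
    words = [w for w in body.replace("--", "-").split("-") if w]
    syllables = sum(len(w) // 3 for w in words)
    if syllables <= 1:
        return "galaxy"
    if syllables == 2:
        return "star"
    if syllables <= 4:
        return "planet"
    if syllables <= 8:
        return "moon"
    return "comet"


def summarize(results):
    """Group {ship: reachable?} results into {rank: {"ok": [...], "failed": [...]}}."""
    summary = defaultdict(lambda: {"ok": [], "failed": []})
    for ship, reachable in results.items():
        summary[ship_rank(ship)]["ok" if reachable else "failed"].append(ship)
    return dict(summary)


if __name__ == "__main__":
    # Hand-entered from the report above, not captured output.
    observed = {"~zod": True, "~deg": True, "~nut": True, "~wex": False}
    for rank, groups in summarize(observed).items():
        print(rank, groups)
```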

ajlamarc commented 1 year ago

Booted a fresh planet to zuse 414 and vere 2.1, and I'm seeing the same problem. I can |hi ~zod and a few other galaxies, but otherwise nothing.

vvisigoth commented 1 year ago

@yosoyubik is there a way to tell if this is "normal" L2 key propagation or something more sinister?

hanfel-dovned commented 1 year ago

Using the L1 planet ~diswyn-sibdec (spawned yesterday from ~tocwex), I'm running into this same issue: I can |hi a few galaxies, but no other ships. (It's worth noting here that ~wex might be behind on OTAs.)

yosoyubik commented 1 year ago

The issue here seems to be a galaxy that is behind on OTAs, probably hitting the "hex number with leading zero digits" bug and not getting notified of breaches/new keys. The latest hotfix to master has probably helped some, but some other galaxies on older urbit-os versions could be hitting a similar one (we are looking into it).
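To illustrate the general failure mode named above (this is not Urbit's code, and the thread does not say where the parsing happens): a parser that rejects hex numbers with leading zero digits will silently skip any update carrying such a value, while a lenient parser processes all of them. All names below are hypothetical.

```python
import re

# Strict form: "0x0" or a hex number whose first digit is non-zero.
_STRICT_HEX = re.compile(r"^0x(0|[1-9a-f][0-9a-f]*)$", re.IGNORECASE)


def parse_strict(text):
    """Parse 0x-prefixed hex, rejecting leading zero digits such as '0x0a1'."""
    if not _STRICT_HEX.match(text):
        raise ValueError(f"not a canonical hex number: {text!r}")
    return int(text, 16)


def parse_lenient(text):
    """Parse 0x-prefixed hex, tolerating leading zero digits."""
    if not text.lower().startswith("0x"):
        raise ValueError(f"missing 0x prefix: {text!r}")
    return int(text[2:], 16)


def count_processed(values, parser):
    """Count how many values a consumer using `parser` would actually handle."""
    processed = 0
    for value in values:
        try:
            parser(value)
            processed += 1
        except ValueError:
            pass  # In this sketch a failed parse means the update is dropped.
    return processed


if __name__ == "__main__":
    updates = ["0x1", "0x0a1", "0x0"]  # Made-up values for illustration.
    print(count_processed(updates, parse_strict))
    print(count_processed(updates, parse_lenient))
```

A ship with the strict behaviour would miss every update that happens to be encoded with a leading zero, which matches the symptom of not hearing about some breaches or new keys while others arrive normally.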