urbit / bridge

An application for interacting with Azimuth.
MIT License

Roller is Offline #1090

Closed zalberico closed 1 year ago

zalberico commented 1 year ago

The roller we use with bridge appears to be down.

Bridge Login Failure: {"code":7979,"data":{}}

Trying to access Bridge via MetaMask and Trezor, but it fails with this error:

[Screenshot: 2022-12-31 at 3:42:46 PM]

zalberico commented 1 year ago

Looks like this is a recurrence of https://github.com/urbit/bridge/issues/973 and https://github.com/urbit/bridge/issues/1043

zalberico commented 1 year ago

[Screenshot: 2022-12-31 at 3:55:16 PM]

zalberico commented 1 year ago

This is unrelated to MetaMask; it doesn't work with master tickets either. It looks like something related to the roller.

zalberico commented 1 year ago

It may need to be restarted after I updated it to 1.15? https://developers.urbit.org/reference/azimuth/l2/roller-tutorial

zalberico commented 1 year ago

@yosoyubik putting this on your radar. I took a look, but we don't have much internal documentation about our roller (something we should change).

I'm not sure if what I did earlier today caused this or if it was already broken.

I did the following this morning:

I did a similar workflow on every infra urbit we host. The only thing I did differently for the roller was I didn't delete the caddy file in the home directory because I wasn't sure if it was important.

zalberico commented 1 year ago

@tomholford - I know you also know a lot about bridge so tagging you on this too for context.

yosoyubik commented 1 year ago

@zalberico it should work now—I tested it running curl.

e.g.

```
curl -k --location --request POST 'https://roller.urbit.org/v1/roller' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "jsonrpc": "2.0",
    "method": "getPoint",
    "params": { "ship": "~norsyr-torryn" },
    "id": "1234"
  }' | jq
```
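For repeated checks, the same request could be wrapped in a small script. This is only a sketch: the helper names `roller_payload` and `roller_check` and the `ROLLER_URL` variable are made up for illustration, and it only tests that the endpoint answers with an HTTP success status, not that the JSON-RPC result itself is valid.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON-RPC 2.0 body for a getPoint call.
# $1 is the ship name, e.g. '~norsyr-torryn'.
roller_payload() {
  printf '{"jsonrpc":"2.0","method":"getPoint","params":{"ship":"%s"},"id":"1234"}' "$1"
}

# Hypothetical helper: exit 0 if the roller answers with an HTTP success status.
# --fail makes curl return non-zero on HTTP errors; --max-time bounds a hang.
# A JSON-RPC error returned with HTTP 200 would still count as "up" here.
roller_check() {
  url="${ROLLER_URL:-https://roller.urbit.org/v1/roller}"
  curl --silent --fail --max-time 10 \
    --header 'Content-Type: application/json' \
    --data-raw "$(roller_payload "$1")" \
    "$url" >/dev/null
}
```

Usage would look something like `roller_check '~norsyr-torryn' && echo "roller up" || echo "roller down"`.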

The issue was that the new binary was not allowed to bind to port 80, so it came up on 8080 after the restart. Running:

sudo setcap 'cap_net_bind_service=+ep' ./roller-dozzod-dozzod/.run

fixed it. We were running a custom binary for the roller, to prioritize HTTP requests. I can't find if there was an issue about that—probably it was just discussed internally—or a branch with the changes for the interpreter (@philipcmonk do you remember if we had those changes somewhere?).
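Since Linux file capabilities are stored on the binary itself, replacing the binary drops them, which would be consistent with what happened here after the update. A post-upgrade check along these lines might catch that earlier. It is a sketch only: `line_has_bind_cap` and `ensure_bind_cap` are hypothetical names, and the match only looks for the capability name in the `getcap` output, not the exact flags.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a line of getcap output mentions the
# capability needed to bind ports below 1024. The exact getcap output
# format differs between libcap versions, so only the name is matched.
line_has_bind_cap() {
  case "$1" in
    *cap_net_bind_service*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: check a binary on disk and reapply the capability
# with the same setcap call used above if it is missing.
ensure_bind_cap() {
  bin="$1"
  if line_has_bind_cap "$(getcap "$bin")"; then
    echo "ok: $bin can bind privileged ports"
  else
    echo "capability missing, reapplying to $bin"
    sudo setcap 'cap_net_bind_service=+ep' "$bin"
  fi
}
```

It could be run after each binary update as `ensure_bind_cap ./roller-dozzod-dozzod/.run`.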

zalberico commented 1 year ago

Thanks - is that caddyfile in the urb home directory used for anything?

yosoyubik commented 1 year ago

We don't use it anymore (I just removed it)—we use %acme now to enable SSL.

tomholford commented 1 year ago

Closing since this is now working as expected (thanks @yosoyubik !)

zalberico commented 1 year ago

What about the custom binary @tomholford? Does that matter?

tomholford commented 1 year ago

> What about the custom binary @tomholford? Does that matter?

IIRC, this was a temporary bandaid to handle excessive load from the Bridge frontend. There was a bug that was DDOSing the roller. We eventually found the root cause and fixed it, so the custom binary is no longer necessary.