Open jakubgs opened 2 years ago
Can concur, experienced two non descriptive POST Response: 403
errors
Indeed, the 403
error can be caused by at least two issues:
--http.vhosts
in Geth configuration.Currently there is no distinction between those two in the logs. I don't know if Geth actually responds with a payload that indicates which one it is, since I haven't seen it, but if it does, then we should print it as well.
On the JWT token I had done echo "tokendata" > /tmp/jwttoken
, introducing a line break, it might be useful to strip whitespace from the read string?
contains this functionality. It works in my testing. It selects the first line, i.e. up to the line break, but does not include it, which I believe is what you're looking for here.
Some tests with a slightly instrumented version of Nimbus which prints the raw JWT secret read to the console/logs (secret len here is first the hex-encoded, i.e. it has to be 64 or 66 with 0x, and second is decoded, i.e. 32 bytes):
$ echo 0123456789abcdef0123456789abcdef0123456789abcdef0123456789 > /tmp/jwt
$ build/nimbus_beacon_node --jwt-secret=/tmp/jwt
...
JWT secret: 0123456789abcdef0123456789abcdef0123456789abcdef0123456789
JWT secret len: 58
FAT 2022-09-16 21:56:00.772+02:00 Specified a JWT secret file which couldn't be loaded err="JWT secret not 256 bits"
This is correct. The JWT secret needs to be 32 (in-memory) bytes per spec.
$ echo 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef > /tmp/jwt
$ build/nimbus_beacon_node --jwt-secret=/tmp/jwt
...
JWT secret: 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
JWT secret len: 64
secret.len == SECRET_LEN
...
this file has a newline.
$ echo -n 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef > /tmp/jwt
$ build/nimbus_beacon_node --jwt-secret=/tmp/jwt
...
JWT secret: 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
JWT secret len: 64
secret.len == SECRET_LEN
...
this file does not have a newline. Nimbus reads either identically.
instrumentation:
diff --git a/beacon_chain/spec/engine_authentication.nim b/beacon_chain/spec/engine_authentication.nim
index 5e7720d4..ca940986 100644
--- a/beacon_chain/spec/engine_authentication.nim
+++ b/beacon_chain/spec/engine_authentication.nim
@@ -90,8 +90,11 @@ proc checkJwtSecret*(
if lines.len > 0:
# Secret JWT key is parsed in constant time using nimcrypto:
# https://github.com/cheatfate/nimcrypto/pull/44
+ debugEcho "JWT secret: ", lines[0]
+ debugEcho "JWT secret len: ", len(lines[0])
let secret = utils.fromHex(lines[0])
if secret.len == SECRET_LEN:
+ debugEcho "secret.len == SECRET_LEN"
ok(secret)
else:
err("JWT secret not 256 bits")
Other tests:
$ echo -n 0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef > /tmp/jwt
...
JWT secret: 0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
JWT secret len: 66
secret.len == SECRET_LEN
which works correctly (no newline and with 0x
)
and
$ echo 0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef > /tmp/jwt
...
JWT secret: 0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
JWT secret len: 66
secret.len == SECRET_LEN
(newline and 0x
)
which also works.
Thanks for checking that. That means the 403 we were getting was actually from the incorrect vhost
setting from the start.
I have checked and the payload we get from Geth when vhost
is incorrect does not indicate the issue:
admin@linux-01.he-eu-hel1.nimbus.prater:/docker/geth-goerli-01 % c -v http://linux-01.he-eu-hel1.nimbus.prater.wg:8546
* Trying 10.14.0.69:8546...
* Connected to linux-01.he-eu-hel1.nimbus.prater.wg (10.14.0.69) port 8546 (#0)
> GET / HTTP/1.1
> Host: linux-01.he-eu-hel1.nimbus.prater.wg:8546
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 403 Forbidden
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Sat, 17 Sep 2022 06:51:45 GMT
< Content-Length: 23
* The requested URL returned error: 403
* Closing connection 0
curl: (22) The requested URL returned error: 403
At least RPC does not, I assume Auth RPC also does not. Which is shitty.
I've opened an issue:
But that's just related to the Geth 403 error, and my original issue still stands.
Based on https://github.com/ethereum/go-ethereum/issues/25802 it appears that:
host
header does not match --authrpc.vhost
flag we get 403 Forbidden
with body invalid host specified
.401 Unauthorized
with the relevant error. (ex.: invalid token
)So we can and should return this exact error to the user.
https://github.com/status-im/nim-json-rpc/commit/7c80b758566950950ab5e3b29fe7f38c3b8168b0 might have already resolved this, based on the logs you show.
It replaces
raise (ref RpcPostError)(msg: "Failed to send POST Request with JSON-RPC.", parent: e)
with
closeRefs()
raise (ref FailedHttpResponse)(msg: "Failed to read POST Response for JSON-RPC: " & e.msg, parent: e)
This nim-json-rpc
commit appears to be in the current unstable
nimbus-eth2
already.
@tersec oh, nice, but is e.msg
the whole payload? There's no JSON parsing involved? I don't see any(which is good).
I will try to test this out on unstable
today or during the weekend to confirm we are getting the errors as expected. Thanks.
It seems to just take the response directly, unparsed, yeah.
Well, I checked this, and the messages aren't very helpful:
no route to host
{
"lvl": "ERR",
"ts": "2022-09-30 12:17:37.799+00:00",
"msg": "Failed to send POST Request with JSON-RPC: Could not send request headers"
}
rejected
{
"lvl": "ERR",
"ts": "2022-09-30 12:19:35.170+00:00",
"msg": "Failed to send POST Request with JSON-RPC: Could not connect to remote host"
}
filtered
{
"lvl": "DBG",
"ts": "2022-09-30 12:23:35.542+00:00",
"msg": "Web3 request failed",
"topics": "eth1",
"req": "eth_chainId(m.dataProvider.web3.provider)",
"err": "Failed to send POST Request with JSON-RPC: Connection timed out"
}
The first two don't really explain what is wrong, the last one is a different message and DBG
level instead of ERR
for some reason, but at least it says that it's a timeout.
One more thing to note, neither of those requests actually indicate which web3 endpoint has caused this error, and considering we can use multiple --web3-url
flags it's not very helpful or even confusing if we don't know which one it is. It would probably make sense to add a url
field to those logs.
The error when using a non-existent DNS entry is correct:
{
"lvl": "ERR",
"ts": "2022-09-30 12:30:55.180+00:00",
"msg": "runQueueProcessingLoop: newPayload failed",
"err": "Failed to setup web3 connection: Could not resolve address of remote server"
}
But I could not find a Failed to send POST Request with JSON-RPC
log error related to it.
I just stubbed my toe on this again. Access to a Geth node was lost due to firewall issue and the logs showed no errors:
admin@node-01.aws-eu-central-1a.rocket.prod:~ % grep '"msg":"ERR"' nimbus.log | wc -l
0
admin@node-01.aws-eu-central-1a.rocket.prod:~ % grep 'newPayload: newPayload timed out' nimbus.log | wc -l
263
These logs which look kinda like errors but aren't look like this:
{
"lvl": "INF",
"ts": "2022-10-27 11:44:08.145+00:00",
"msg": "newPayload: newPayload timed out"
}
Which probably should be a warning, and if not, it should at least contain some additional context.
The only thing close to an error was this warning:
{
"lvl": "WRN",
"ts": "2022-10-27 11:44:00.087+00:00",
"msg": "Failed to exchange transition configuration",
"topics": "eth1",
"err": "engine_exchangeTransitionConfigurationV1(dataProvider.web3.provider,\n consensusCfg) failed 3 times"
}
But that is just a warning. The fundamental issue is that the node started and ran for half an hour without any connection to the Geth node and not a single error log message was printed.
Describe the bug Since https://github.com/status-im/nimbus-eth2/pull/3989 was merged I wanted to test out how the new errors work and configured a Prater testnet node with a wrong IP of Web3/AuthRPC/Engine endpoint to trigger some errors. What I got were some errors:
I removed the duplicates, and some of those are debug level, but this is all I got. This does indeed indicate an issue with the Web3/AuthRPC/Engine endpoint, but it does not indicate what is the issue. In this particular case, since I used an IP of a host that does not exist somewhere in these errors there should be a
no route to host
string, indicating that beacon node can't find the way to reach the target. But no such string appeared anywhere in the logs related to Web3 requests.This is a fundamental issue, since it prevents user from actually debugging what is the problem.
To Reproduce Steps to reproduce the behavior:
--web3-url
flag.Additional context The errors should at the very least help distinguish between those 4 scenarios:
no route to host
- Route to IP address cannot be found.timeout
- Port is unreachable, probably due to firewall.rejected
- Port is closed and connection was rejected.error
- Port is accessible but returns errors.Without knowing which of these scenarios is taking place it can be quite hard to figure out what's happening.