sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0
2.91k stars 738 forks source link

after updating to 1.02 it fails connecting to external eth1 nodes #2011

Closed adv0r closed 3 years ago

adv0r commented 3 years ago

Description

After updating lighthouse can't connect to eth1 nodes. This has been confirmed by another user on discord. We are both running ubuntu and tried both alchemyapi and infura.io.

No firewalls, no mistakes in the endpoint.

I can curl the endpoint with no issues.

Version

cargo 1.48.0 (65cbdd2dc 2020-10-14) Lighthouse v1.0.2-f7183098 BLS Library: blst

Present Behaviour

Command: lighthouse bn --network mainnet --datadir /mnt/ssd/stake/ --eth1-endpoint https://mainnet.infura.io/v3/6****f/

Output:

Nov 29 15:52:27.442 INFO Lighthouse started                      version: Lighthouse/v1.0.2-f7183098
Nov 29 15:52:27.442 INFO Configured for network                  name: mainnet
Nov 29 15:52:27.443 INFO Data directory initialised              datadir: /mnt/ssd/stake
Nov 29 15:52:27.458 INFO Deposit contract                        address: 0x00000000219ab540356cbb839cbe05303d7705fa, deploy_block: 11184524
Nov 29 15:52:27.526 INFO Starting beacon chain                   method: resume, service: beacon
Nov 29 15:52:39.464 INFO Block production enabled                method: json rpc via http, endpoints: ["https://mainnet.infura.io/v3/*****/"]
Nov 29 15:52:39.519 INFO Beacon chain initialized                head_slot: 0, head_block: 0x4d61…9360, head_state: 0x7e76…2c2b, service: beacon
Nov 29 15:52:39.520 INFO Timer service started                   service: node_timer
Nov 29 15:52:39.521 INFO UPnP Attempting to initialise routes    service: UPnP
Nov 29 15:52:39.524 INFO Libp2p Service                          peer_id: 16Uiu2HAmT5no4boHMrUt4cZEQNoFTkiE3ynEnVnwmcNhUcmSra9m, service: libp2p
Nov 29 15:52:39.526 INFO ENR Initialised                         tcp: Some(9000), udp: Some(9000), ip: Some(93.*****), id: 0x398d..a7ed, seq: 2, enr: enr:-LK4QP2cyGpWINr-z15hPcFIrXtKTr-wZYHg2E4rwAuyMmknGaw13EmjAM2H2EjbC5P1Z0KYqhNIHlsYeDWS-O2UL0ACh2F0dG5ldHOIAAAAAAAAAACEZXRoMpC1MD8qAAAAAP__________gmlkgnY0gmlwhF2wojOJc2VjcDI1NmsxoQPWXhI5zcP7zuaKlDQ790UuMigbUlRXX--r1R_m8blfmIN0Y3CCIyiDdWRwgiMo, service: libp2p
Nov 29 15:52:39.528 INFO Listening established                   address: /ip4/0.0.0.0/tcp/9000/p2p/16Uiu2HAmT5no4boHMrUt4cZEQNoFTkiE3ynEnVnwmcNhUcmSra9m, service: libp2p
Nov 29 15:52:39.631 INFO HTTP server is disabled
Nov 29 15:52:39.632 INFO Waiting for genesis                     wait_time: 1 day 20 hrs, peers: 0, service: slot_notifier
Nov 29 15:52:48.723 INFO Sync state updated                      new_state: Synced, old_state: Stalled, service: sync
Nov 29 15:52:48.725 INFO Subscribed to topics                    topics: [BeaconBlock, BeaconAggregateAndProof, VoluntaryExit, ProposerSlashing, AttesterSlashing], service: network
Nov 29 15:52:51.632 INFO Waiting for genesis                     wait_time: 1 day 20 hrs, peers: 3, service: slot_notifier
Nov 29 15:52:54.548 WARN Error connecting to eth1 node. Trying fallback ..., endpoint: https://mainnet.infura.io/v3/****/, service: eth1_rpc
Nov 29 15:52:54.549 CRIT Couldn't connect to any eth1 node. Please ensure that you have an eth1 http server running locally on http://localhost:8545 or specify one or more (remote) endpoints using `--eth1-endpoints <COMMA-SEPARATED-SERVER-ADDRESSES>`. Also ensure that `eth` and `net` apis are enabled on the eth1 http server, warning: BLOCK PROPOSALS WILL FAIL WITHOUT VALID, SYNCED ETH1 CONNECTION, service: eth1_rpc
Nov 29 15:52:54.550 ERRO Failed to update eth1 cache             error: Failed to update Eth1 service: "All fallback errored: https://mainnet.infura.io/v3/6****715e8f/ => EndpointError(NotReachable)", retry_millis: 7000, service: eth1_rpc
Nov 29 15:52:55.164 WARN Error connecting to eth1 node. Trying fallback ..., endpoint: https://mainnet.infura.io/v3/60****e8f/, service: eth1_rpc
Nov 29 15:52:55.165 CRIT Couldn't connect to any eth1 node. Please ensure that you have an eth1 http server running locally on http://localhost:8545 or specify one or more (remote) endpoints using `--eth1-endpoints <COMMA-SEPARATED-SERVER-ADDRESSES>`. Also ensure that `eth` and `net` apis are enabled on the eth1 http server, warning: BLOCK PROPOSALS WILL FAIL WITHOUT VALID, SYNCED ETH1 CONNECTION, service: eth1_rpc
Nov 29 15:52:55.166 ERRO Failed to update eth1 cache             error: Failed to update Eth1 service: "All fallback errored: https://mainnet.infura.io/v3/6****f/ => EndpointError(NotReachable)", retry_millis: 7000, service: eth1_rpc
Nov 29 15:52:55.760 WARN Error connecting to eth1 node. Trying fallback ..., endpoint: https://mainnet.infura.io/v3/6****f/, service: eth1_rpc
Nov 29 15:52:55.761 CRIT Couldn't connect to any eth1 node. Please ensure that you have an eth1 http server running locally on http://localhost:8545 or specify one or more (remote) endpoints using `--eth1-endpoints <COMMA-SEPARATED-SERVER-ADDRESSES>`. Also ensure that `eth` and `net` apis are enabled on the eth1 http server, warning: BLOCK PROPOSALS WILL FAIL WITHOUT VALID, SYNCED ETH1 CONNECTION, service: eth1_rpc
Nov 29 15:52:55.762 ERRO Failed to update eth1 cache             error: Failed to update Eth1 service: "All fallback errored: https://mainnet.infura.io/v3/609fc*****715e8f/ => EndpointError(NotReachable)", retry_millis: 7000, service: eth1_rpc
^CNov 29 15:52:57.568 INFO Shutting down..
Nov 29 15:52:57.570 INFO Saved DHT state                         service: network
Nov 29 15:52:57.571 INFO Network service shutdown                service: network
Nov 29 15:52:57.584 INFO Saved beacon chain to disk              service: beacon

Some other time the output it's slighly different:

Nov 29 16:48:58.285 ERRO Failed to update eth1 cache             error: Failed to update eth1 cache: "All fallback errored: https://mainnet.infura.io/v3/609fc*****39ea715e8f => GetDepositLogsFailed(\"Request failed: reqwest::Error { kind: Request, url: Url { scheme: \\\"https\\\", host: Some(Domain(\\\"mainnet.infura.io\\\")), port: None, path: \\\"/v3/60******5e8f\\\", query: None, fragment: None }, source: TimedOut }\")", retry_millis: 7000, service: eth1_rpc

I have no authentication activated on Infura.

I can curl it just fine

curl https://mainnet.infura.io/v3/60***15e8f -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params": [],"id":1}'

{"jsonrpc":"2.0","id":1,"result":"0xad413a"}

Replace all these symptoms with Alchemy, same problem

In alchemy dashboard I can see all the request arriving to the endpoint with no problem. Here is the raw request:

{
"id":1
"jsonrpc":"2.0"
"method":"eth_getLogs"
"params":[
0:{
"address":"0x00000000219a***"
"fromBlock":"0xaaebf4"
"toBlock":"0xaaefdc"
"topics":[
0:"0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"
]
}
]
}

and here is a raw response

{"id": 1, "jsonrpc": "2.0", "result": [{"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0x779b5145d45be31fe12d9346e5cc50aac62baa5ad33732de01657ec9911ea4cf", "blockNumber": "0xaaedb5", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0xac", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0x64a66740ae4cf64e8735271f94fa9c763e9a68c506292671041301196f6f2d70", "transactionIndex": "0x5b"}, {"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0x950b51346b740e50254a41e5bbdb2628a07e9cbaf9ff6ec6a8c3407c193b82fb", "blockNumber": "0xaaedb6", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0xf6", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0xccd2ddb9827a39bb30fd1a2166f40d2d571feab375e0d6469ba749d86e3f4e0b", "transactionIndex": "0xbe"}, {"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0x950b51346b740e50254a41e5bbdb2628a07e9cbaf9ff6ec6a8c3407c193b82fb", "blockNumber": "0xaaedb6", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0x100", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0x721a69c5bd3b697827eabd44d7176e95ba803a462d99e74f62299111e8c516b1", "transactionIndex": "0xc6"}, {"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0x98cf9fdf393b73e3f133282cbb511eeb4624c5f7879100dd88ea706d185084d1", "blockNumber": "0xaaedba", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0x8", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0xedf5930af6dd9efd54787d57acb250aff72ecb9a697b4abefa87274035343787", "transactionIndex": "0xd"}, {"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0x98cf9fdf393b73e3f133282cbb511eeb4624c5f7879100dd88ea706d185084d1", "blockNumber": "0xaaedba", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0x2c", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0x92f90a1271a184e1c5ee1e468f391bab5749108effb2c57a49f2cef153f9bc58", "transactionIndex": "0x19"}, {"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0x98cf9fdf393b73e3f133282cbb511eeb4624c5f7879100dd88ea706d185084d1", "blockNumber": "0xaaedba", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0x3e", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0x7c2f5f1ade5b7df58eebbe3c76b8cb038936e0f7fb4479ab5ac9df7a6f0fbd41", "transactionIndex": "0x1f"}, {"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0x98cf9fdf393b73e3f133282cbb511eeb4624c5f7879100dd88ea706d185084d1", "blockNumber": "0xaaedba", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0x4a", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0x5722e35df3387aa21c825b0d8ddea3cf168d421a31edfe63c73ffc028df2a4f1", "transactionIndex": "0x27"}, {"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0xc3ae7bf1ad372a3a31dc82db625e594c619fd70c99f530018bf1dd5541cd3a3d", "blockNumber": "0xaaee47", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0xf0", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0xaed6dce1102e4b034e2587ffcfc7ce08651362a611a397936d7fc49447edca11", "transactionIndex": "0x85"}, {"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0xc3ae7bf1ad372a3a31dc82db625e594c619fd70c99f530018bf1dd5541cd3a3d", "blockNumber": "0xaaee47", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0xf1", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0xa9ac740839af5b3d9a049202d7dc81a9ad913408736adbba7334f29e68469d77", "transactionIndex": "0x86"}, {"address": "0x00000000219ab540356cbb839cbe05303d7705fa", "blockHash": "0x6ea0faea7b3556f0e7c276519134b34c874be43750c0f4b6c0ddbc3f791f7502", "blockNumber": "0xaaef9d", "data": "0x00000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000", "logIndex": "0x3c", "removed": false, "topics": ["0x649bbc62d0e31342afea4e5cd82d4049e7e1ee912fc0889aa790803be39038c5"], "transactionHash": "0xed7637442cac9df2eb66d7ce16fccb1b676d04060fbd752563b0ed8d96c08705", "transactionIndex": "0x4b"}]}

Steps to resolve

preddy25 commented 3 years ago

Same symptoms after updating to 1.0.2. Fallback n APIs to both alchemy and infura was tested and working fine till after the update. Infura seem to work fine without any errors but there isnt seem to be any calls logged at the dashboard. Alchemy simply doesnt work , gets a 403 error

paulhauner commented 3 years ago

I'm unable to reproduce this on my end across multiple systems. @adv0r and @preddy25 could you please share some details about your hosts. E.g., operating system, CPU architecture (ARM, Intel, etc), if you compiled lighthouse locally or used cross-compiling.

paulhauner commented 3 years ago

@blacktemplar, just bringing your attention to this in case you've faced this before.

AgeManning commented 3 years ago

Can you confirm that v1.0.1 works fine but v1.0.2 does not? (Or are you upgrading from an even earlier version?)

pawanjay176 commented 3 years ago

Not able to reproduce this either with multiple syncs.

blacktemplar commented 3 years ago

@blacktemplar, just bringing your attention to this in case you've faced this before.

So during my tests I also encountered some problems with infura nodes, running into timeouts. Both errors could possibly be caused by a timeout (the first if the timeout happens during the initial network_id/chain_id checks and the second if a timeout happens afterwards during getting deposit logs).

@adv0r you said you can see the requests + responses in the Alchemy dashboard, do you also see by chance timestamps, when the requests arrived / when the responses were sent? Furthermore, are those errors happening consistently, or just sometimes?

pg3141 commented 3 years ago

Have the same issue. Running lighthouse-v1.0.2-x86_64-unknown-linux-gnu-portable

b-m-f commented 3 years ago

2 things I have noticed:

@adv0r you have a trailing slash in the endpoint URL

pg3141 commented 3 years ago

@b-m-f I have --eth1-endpoints

ExecStart=/usr/local/bin/lighthouse bn --network mainnet --staking --datadir /var/lib/lighthouse --metrics --eth1-endpoints http://192.168.1.12:8545,https://eth-mainnet.alchemyapi.io/v2/HM*******j6g,https://mainnet.infura.io/v3/16c****aef5f

The WARN appears only after I kill the local eht1 endpoint:

Nov 30 19:48:48.620 WARN Error connecting to eth1 node. Trying fallback ..., endpoint: http://192.168.1.12:8545, service: eth1_rpc

Looking at alchemy, I see requests coming in:

alchemy

b-m-f commented 3 years ago

@pg3141 That looks correct though.

After killing the local node the switch-over should occur.

Or do the remote endpoints time out as well?

pg3141 commented 3 years ago

@b-m-f I was under the impression that those WARN messages mean the switch over did not occur, but judging by the fact that requests do come in to the fallback endpoints... I guess the switch over does occur.

paulhauner commented 3 years ago

I guess the switch over does occur.

That's correct, therefore I understand that Lighthouse is performing as expected.

@adv0r you have a trailing slash in the endpoint URL

I can reproduce this issue if I add the trailing slash, leading me to think that the root cause is a malformed --eth2-endpoints flag and therefore not an issue with Lighthouse.

I'm going to close this since I believe all issues are addressed, feel free to reopen if that's not the case.

Special thanks for @b-m-f for their help here :)

adv0r commented 3 years ago

@adv0r you said you can see the requests + responses in the Alchemy dashboard, do you also see by chance timestamps, when the requests arrived / when the responses were sent? Furthermore, are those errors happening consistently, or just sometimes?

Error happens consistently...

To reply your question,as I said on discord where you can also see a screenshot.

by checking alchemy logs I can see that in less than 1 minute from starting, lighthouse makes more than 40+ queries to the method eth_getlogs ( All the calls are succesful btw, with 200 ms response) .

moreover all the queries for the same fromblock, toblock numbers.

Is this expected? Something is not right, with that many requests in 1 minute, I will have to pay for millions/request per month soon, cluttering their service

adv0r commented 3 years ago

@adv0r you have a trailing slash in the endpoint URL

Doesn't change the result

paulhauner commented 3 years ago

Is this expected? Something is not right, with that many requests in 1 minute, I will have to pay for millions/request per month soon, cluttering their service

This is expected, when it boots it needs to sync all the deposit logs. After that it will cache them and only request new ones.