stopping container completely breaks it

eyooooo commented 1 year ago

running a stock container forked from this repo. syncing against L1 local eth node or a remote alchemy node.

no problems syncing for a few days, had to bounce the container. now the container throws tons of errors:

simple-optimism-node-fault-detector-1  | 
simple-optimism-node-fault-detector-1  | > @eth-optimism/fault-detector@0.4.0 start
simple-optimism-node-fault-detector-1  | > ts-node ./src/service.ts
simple-optimism-node-fault-detector-1  | 
simple-optimism-node-healthcheck-1     | {"level":50,"time":1671298207500,"msg":"target client not connected"}
simple-optimism-node-influxdb-1        | [httpd] 172.25.0.4 - test [17/Dec/2022:17:30:08 +0000] "POST /write?consistency=&db=l2geth&precision=&rp= HTTP/1.1" 204 0 "-" "InfluxDBClient" 7389d73a-7e30-11ed-8044-000000000000 4577
simple-optimism-node-influxdb-1        | [httpd] 172.25.0.4 - test [17/Dec/2022:17:30:08 +0000] "GET /ping HTTP/1.1" 204 0 "-" "InfluxDBClient" 738a9a48-7e30-11ed-8045-000000000000 10
simple-optimism-node-fault-detector-1  | {"level":30,"time":1671298208633,"msg":"starting app server"}
simple-optimism-node-fault-detector-1  | {"level":30,"time":1671298208662,"port":7300,"hostname":"0.0.0.0","msg":"app server started"}
simple-optimism-node-fault-detector-1  | {"level":30,"time":1671298208663,"msg":"initializing service"}
simple-optimism-node-fault-detector-1  | /opt/optimism/node_modules/@ethersproject/providers/node_modules/@ethersproject/logger/src.ts/index.ts:269
simple-optimism-node-fault-detector-1  |         const error: any = new Error(message);
simple-optimism-node-fault-detector-1  |                            ^
simple-optimism-node-fault-detector-1  | Error: could not detect network (event="noNetwork", code=NETWORK_ERROR, version=providers/5.7.1)
simple-optimism-node-fault-detector-1  |     at Logger.makeError (/opt/optimism/node_modules/@ethersproject/providers/node_modules/@ethersproject/logger/src.ts/index.ts:269:28)
simple-optimism-node-fault-detector-1  |     at Logger.throwError (/opt/optimism/node_modules/@ethersproject/providers/node_modules/@ethersproject/logger/src.ts/index.ts:281:20)
simple-optimism-node-fault-detector-1  |     at JsonRpcProvider.<anonymous> (/opt/optimism/node_modules/@ethersproject/providers/src.ts/json-rpc-provider.ts:483:23)
simple-optimism-node-fault-detector-1  |     at step (/opt/optimism/node_modules/@ethersproject/providers/lib/json-rpc-provider.js:48:23)
simple-optimism-node-fault-detector-1  |     at Object.throw (/opt/optimism/node_modules/@ethersproject/providers/lib/json-rpc-provider.js:29:53)
simple-optimism-node-fault-detector-1  |     at rejected (/opt/optimism/node_modules/@ethersproject/providers/lib/json-rpc-provider.js:21:65)
simple-optimism-node-fault-detector-1  |     at processTicksAndRejections (node:internal/process/task_queues:96:5) {
simple-optimism-node-fault-detector-1  |   reason: 'could not detect network',
simple-optimism-node-fault-detector-1  |   code: 'NETWORK_ERROR',
simple-optimism-node-fault-detector-1  |   event: 'noNetwork'
simple-optimism-node-fault-detector-1  | }
simple-optimism-node-fault-detector-1  | npm ERR! Lifecycle script `start` failed with error: 
simple-optimism-node-fault-detector-1  | npm ERR! Error: command failed 
simple-optimism-node-fault-detector-1  | npm ERR!   in workspace: @eth-optimism/fault-detector@0.4.0 
simple-optimism-node-fault-detector-1  | npm ERR!   at location: /opt/optimism/packages/fault-detector 
simple-optimism-node-fault-detector-1 exited with code 1
simple-optimism-node-healthcheck-1     | {"level":50,"time":1671298212505,"msg":"target client not connected"}
simple-optimism-node-influxdb-1        | [httpd] 172.25.0.4 - test [17/Dec/2022:17:30:13 +0000] "GET /ping HTTP/1.1" 204 0 "-" "InfluxDBClient" 7684a883-7e30-11ed-8046-000000000000 24
simple-optimism-node-healthcheck-1     | {"level":30,"time":1671298213222,"log":"172.25.0.3 - GET /metrics HTTP/1.1 200 - - 3.681 ms\n","msg":"server log"}
simple-optimism-node-healthcheck-1     | {"level":50,"time":1671298217509,"msg":"target client not connected"}
simple-optimism-node-influxdb-1        | [httpd] 172.25.0.4 - test [17/Dec/2022:17:30:18 +0000] "POST /write?consistency=&db=l2geth&precision=&rp= HTTP/1.1" 204 0 "-" "InfluxDBClient" 797fa65d-7e30-11ed-8047-000000000000 3949
simple-optimism-node-influxdb-1        | [httpd] 172.25.0.4 - test [17/Dec/2022:17:30:18 +0000] "GET /ping HTTP/1.1" 204 0 "-" "InfluxDBClient" 798051fe-7e30-11ed-8048-000000000000 26
simple-optimism-node-healthcheck-1     | {"level":50,"time":1671298222517,"msg":"target client not connected"}
simple-optimism-node-influxdb-1        | [httpd] 172.25.0.4 - test [17/Dec/2022:17:30:23 +0000] "GET /ping HTTP/1.1" 204 0 "-" "InfluxDBClient" 7c7a6cc6-7e30-11ed-8049-000000000000 22
simple-optimism-node-healthcheck-1     | {"level":50,"time":1671298227523,"msg":"target client not connected"}
simple-optimism-node-influxdb-1        | [httpd] 172.25.0.4 - test [17/Dec/2022:17:30:28 +0000] "POST /write?consistency=&db=l2geth&precision=&rp= HTTP/1.1" 204 0 "-" "InfluxDBClient" 7f758509-7e30-11ed-804a-000000000000 3812
simple-optimism-node-influxdb-1        | [httpd] 172.25.0.4 - test [17/Dec/2022:17:30:28 +0000] "GET /ping HTTP/1.1" 204 0 "-" "InfluxDBClient" 7f762cfa-7e30-11ed-804b-000000000000 13

does the container not shut down cleanly when doing docker compose down?
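To separate the repeated error-level lines in a dump like the one above from the access logs and stack traces, a small filter can help. This is only a sketch: it assumes the JSON lines follow the pino/bunyan-style numeric levels (30 = info, 50 = error), which the thread does not confirm, and `error_counts` is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Iterable


def error_counts(lines: Iterable[str], min_level: int = 50) -> Counter:
    """Count JSON log messages at or above min_level, keyed by (container, msg)."""
    counts: Counter = Counter()
    for line in lines:
        # docker compose prefixes every line with "<container> | "
        prefix, sep, payload = line.partition("|")
        if not sep:
            continue
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            # Stack traces, npm output and influxdb access logs are not JSON
            continue
        # Assumption: "level" is a pino/bunyan-style number where 50 means error
        if isinstance(record, dict) and record.get("level", 0) >= min_level:
            counts[(prefix.strip(), record.get("msg", ""))] += 1
    return counts
```

Feeding it the output of `docker compose logs` line by line gives one counter entry per container and message, which makes it easier to see whether "target client not connected" is the only recurring error.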

eyooooo commented 1 year ago

i think my issue relates to https://github.com/smartcontracts/simple-optimism-node/issues/20

it seems when stopping the container with docker compose down it doesnt stop cleanly and something gets hosed with the networking. the container is unable to communicate over the network which is why we see these network related errors.

smartcontracts commented 1 year ago

I recently merged a fix to the fault-detector that should prevent it from erroring out like this when the L2 node isn't fully up. Once this gets released, it will probably fix this issue. https://github.com/ethereum-optimism/optimism/pull/4486
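As a rough illustration of the idea (this is not the code in the linked PR), a retry loop that waits for the L2 provider instead of throwing on the first failed connection might look like this in Python. `wait_until_connected` and `is_connected` are hypothetical names, and the retry count and delay are arbitrary.

```python
import time
from typing import Callable


def wait_until_connected(
    is_connected: Callable[[], bool],
    retries: int = 10,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a (hypothetical) connectivity check instead of crashing on the first failure.

    Returns True as soon as the check passes, False if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        if is_connected():
            return True
        print(f"provider not connected, retrying... (attempt {attempt}/{retries})")
        sleep(delay_s)
    return False
```

The caller decides what to do when it returns False, for example exiting with a clear message rather than an unhandled NETWORK_ERROR stack trace.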

smartcontracts commented 1 year ago

You might be able to use this canary release image temporarily: prerelease-0.0.0-sequencer-sync-bug

eyooooo commented 1 year ago

thanks @smartcontracts! can i just replace my current dir with this, replace .env and be rdy 2 rock?

eyooooo commented 1 year ago

nah i tried doing a git pull and still broken i guess i need to update one of the containers?

eyooooo commented 1 year ago

ok i updated docker-compose.yaml fault-detector with the image you suggested, did a docker compose pull, it downloaded something and i started with docker compose up

i no longer get the Error: could not detect network error but i am getting simple-optimism-node-healthcheck-1 | {"level":50,"time":1671306298220,"msg":"target client not connected"} errors.

simple-optimism-node-fault-detector-1 | {"level":30,"time":1671306453994,"msg":"[object Object] provider not connected, retrying..."}

eyooooo commented 1 year ago

seems like eventually it kicked into gear and its working. 👍

edit: not working really yet but the error above went away.

eyooooo commented 1 year ago

bouncing docker with docker compose down/up a few times got it working. ill take it.

eyooooo commented 1 year ago

simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.511] Historical transaction matches           index=15018743 hash=0xd1956787edafc6c6d9a44fd3e87ca7c66cc8a0dffef6c101f3aa6b32e1848a5e
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.511] Historical transaction matches           index=15018744 hash=0x3d7d092576a70c44ee2096ad9a775b771321f508b28464eaf0f0627c143259fe
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.511] Historical transaction matches           index=15018745 hash=0x3ff70d6672f6da11ad116c81bffd54fc08b38733165d18e502cd7d5c3cabec88
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.511] Historical transaction matches           index=15018746 hash=0xa7330b885125c9ffa45873797cc37d10ef1da0e90aea59ab10ae36483e779c3e
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.511] Historical transaction matches           index=15018747 hash=0x301c0d4f9628328092f86f8059d21cd1fa0979823e047210f603c25eff6a4eaf
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.512] Historical transaction matches           index=15018748 hash=0x0293bac40302ac8ea8fd8b0b3e599d9b1203387a06c2cee8cbaf4d3e4c5d9786
simple-optimism-node-l2geth-1          | ERROR[12-20|05:36:15.512] Mismatched transaction                   index=15018749                                    
simple-optimism-node-l2geth-1          | ERROR[12-20|05:36:15.512] Mismatched transaction                   index=15018750              
simple-optimism-node-l2geth-1          | ERROR[12-20|05:36:15.512] Mismatched transaction                   index=15018751                                 
simple-optimism-node-l2geth-1          | ERROR[12-20|05:36:15.512] Mismatched transaction                   index=15018752                                 
simple-optimism-node-l2geth-1          | ERROR[12-20|05:36:15.512] Mismatched transaction                   index=15018753                                    
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.512] Historical transaction matches           index=15018754 hash=0xcdf3f41486fbec28b949d77b01ca3f150ba815db162aaaeede600bc8eeeeaf5a
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.512] Applying transaction to tip              index=15018755 hash=0x6bfc825cfe2d3c28d38734586b69be3714637f50daac8440c811a88ea0de16eb origin=sequencer
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.512] Attempting to commit rollup transaction  hash=0x6bfc825cfe2d3c28d38734586b69be3714637f50daac8440c811a88ea0de16eb
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.512] Served eth_blockNumber                   conn=172.26.0.2:48040   reqid=372106 t=43.802µs                                          
simple-optimism-node-l2geth-1          | ERROR[12-20|05:36:15.513] Problem committing transaction           msg="nonce too high"
simple-optimism-node-l2geth-1          | ERROR[12-20|05:36:15.513] Got error waiting for transaction to be added to chain msg="nonce too high"
simple-optimism-node-l2geth-1          | ERROR[12-20|05:36:15.513] Could not verify                         error="Verifier cannot sync transaction batches to tip: Cannot sync transaction batches to tip: Cannot sync batches: cannot apply batched transaction: Cannot apply batched transaction: nonce too high"
simple-optimism-node-l2geth-1          | DEBUG[12-20|05:36:15.516] Served eth_chainId                       conn=172.26.0.2:48046   reqid=372107 t=50.735µs
simple-optimism-node-fault-detector-1  | {"level":30,"time":1671514575516,"batchEnd":48695289,"latestBlock":15018754,"msg":"node is behind, waiting for sync"}
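To get a feel for how far behind the node is, the two numbers in the last fault-detector line can be subtracted. A minimal sketch, assuming `batchEnd` and `latestBlock` are both L2 block indices and therefore comparable (the "node is behind" message suggests this, but the thread does not confirm it); `sync_gap` is a hypothetical helper name.

```python
import json


def sync_gap(log_line: str) -> int:
    """Return batchEnd - latestBlock from a fault-detector JSON log line."""
    # Drop the "<container> | " prefix that docker compose adds, if present
    payload = log_line[log_line.index("{"):]
    record = json.loads(payload)
    # Assumption: both fields count L2 blocks, so the difference is the backlog
    return record["batchEnd"] - record["latestBlock"]


line = (
    'simple-optimism-node-fault-detector-1  | '
    '{"level":30,"time":1671514575516,"batchEnd":48695289,'
    '"latestBlock":15018754,"msg":"node is behind, waiting for sync"}'
)
print(sync_gap(line))  # 33676535
```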

tappikone commented 1 year ago

Getting the same thing after shutting down the node.

eyooooo commented 1 year ago

@smartcontracts i had an idea for this but i cant test it this weekend. perhaps adding a bit of a delay to stopping the containers will be a hacky solution? something like docker compose down --timeout 300
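For anyone scripting the longer shutdown suggested above, a small Python wrapper could look like this. It is a sketch only: `compose_down_command` and `compose_down` are hypothetical helper names, the project directory is an assumption, and it has not been tested against this repo.

```python
import subprocess
from typing import List


def compose_down_command(timeout_s: int = 300) -> List[str]:
    """Build the `docker compose down` command with a shutdown timeout in seconds."""
    if timeout_s < 0:
        raise ValueError("timeout_s must be non-negative")
    return ["docker", "compose", "down", "--timeout", str(timeout_s)]


def compose_down(timeout_s: int = 300, project_dir: str = ".") -> int:
    """Run the command in project_dir (assumed to hold docker-compose.yaml)."""
    completed = subprocess.run(
        compose_down_command(timeout_s), cwd=project_dir, check=False
    )
    return completed.returncode
```

Calling `compose_down(300)` from the repo directory is equivalent to typing `docker compose down --timeout 300` by hand; a non-zero return code means the command itself failed.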

smartcontracts commented 1 year ago

Sorry for the delay here. Have been busy getting Bedrock shipped. I could increase the stop grace period. It's currently 180s but can increase to 300s.

smartcontracts commented 1 year ago

I created #44 to increase the default grace period to 5m

eyooooo commented 1 year ago

just adding a note here - after stopping the containers once, running and stopping again, here is observed behavior:

this error Error response from daemon: No such container: 3893e75e4b4ea30c8d75874bcc1c6d941957dab5ce5cde08a8fec9c90e2c7342

this output

[+] Running 14/14
 ⠿ Container simple-optimism-node-healthcheck-1             Removed                                                                                                                                                                                       10.4s
 ⠿ Container simple-optimism-node-torrent-1                 Removed                                                                                                                                                                                        4.3s
 ⠿ Container simple-optimism-node-fault-detector-bedrock-1  Removed                                                                                                                                                                                        1.1s
 ⠿ Container simple-optimism-node-dtl-1                     Removed                                                                                                                                                                                        1.0s
 ⠿ Container simple-optimism-node-op-node-1                 Removed                                                                                                                                                                                      180.3s
 ⠿ Container simple-optimism-node-prometheus-1              Removed                                                                                                                                                                                        0.7s
 ⠿ Container simple-optimism-node-influxdb-1                Removed                                                                                                                                                                                        0.8s
 ⠿ Container simple-optimism-node-bedrock-init-1            Removed                                                                                                                                                                                       10.4s
 ⠿ Container simple-optimism-node-op-geth-1                 Removed                                                                                                                                                                                      180.3s
 ⠿ Container simple-optimism-node-fault-detector-1          Removed                                                                                                                                                                                       10.5s
 ⠿ Container simple-optimism-node-healthcheck-bedrock-1     Removed                                                                                                                                                                                        1.0s
 ⠿ Container simple-optimism-node-grafana-1                 Removed                                                                                                                                                                                        0.6s
 ⠿ Container simple-optimism-node-l2geth-1                  Removed                                                                                                                                                                                        0.9s
 ⠿ Network simple-optimism-node_default                     Removed                                                                                                                                                                                        0.1s

notice simple-optimism-node-op-node-1 and simple-optimism-node-op-geth-1 take the full 180s timeout.
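The containers that ran into the timeout can also be picked out of that output automatically. A minimal sketch, assuming the `Container <name> Removed <seconds>s` line layout shown above stays stable; `slow_stops` is a hypothetical helper, and the 180s default is the grace period mentioned earlier in the thread.

```python
import re
from typing import List, Tuple

# Matches lines such as " ⠿ Container simple-optimism-node-op-geth-1   Removed   180.3s"
LINE_RE = re.compile(r"Container\s+(\S+)\s+Removed\s+([\d.]+)s")


def slow_stops(output: str, timeout_s: float = 180.0) -> List[Tuple[str, float]]:
    """Return (container, seconds) pairs whose removal took at least timeout_s."""
    slow = []
    for name, seconds in LINE_RE.findall(output):
        if float(seconds) >= timeout_s:
            slow.append((name, float(seconds)))
    return slow
```

A container that shows up here presumably did not exit on its own and was killed when the grace period ran out, which would fit the broken state seen after a restart.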

Chomtana commented 11 months ago

Stale pre-bedrock issue

smartcontracts / simple-optimism-node

stopping container completely breaks it #31