streamingfast / merger

Apache License 2.0

merger generates corrupt merged blocks when it can't read one-block files (after previously finding them) #25

Closed matthewdarwin closed 4 months ago

matthewdarwin commented 1 year ago

It seems that if the merger has a problem reading the one-block files (for example, because someone ran chmod on the directory while it was merging), it uploads an incomplete merged-blocks file:

$ rclone ls march:eth-sf-v2-merged-blocks --include '0012207*'
 17056329 0012207000.dbin.zst
 17220134 0012207100.dbin.zst
 17804196 0012207200.dbin.zst
 17838958 0012207300.dbin.zst
 18098795 0012207400.dbin.zst
   129007 0012207500.dbin.zst

Once I remove the corrupt merged-blocks file (0012207500.dbin.zst above, far smaller than its neighbours), the merger continues.
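
As a rough way to spot such files, the listing above can be checked for objects that are much smaller than their neighbours. The sketch below is not part of the merger. The helper names `parseListing` and `suspiciouslySmall` and the 10% of median threshold are assumptions for illustration, and a small file is only a hint of truncation, since merged-block sizes can vary legitimately.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// entry is one line of `rclone ls` output: a size in bytes and an object name.
type entry struct {
	Size int64
	Name string
}

// parseListing parses "size name" lines as printed by `rclone ls`.
// Assumption: object names contain no spaces, which holds for the
// merged-block names shown in this issue.
func parseListing(listing string) ([]entry, error) {
	var entries []entry
	sc := bufio.NewScanner(strings.NewReader(listing))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("unexpected line %q", sc.Text())
		}
		size, err := strconv.ParseInt(fields[0], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing size in %q: %w", sc.Text(), err)
		}
		entries = append(entries, entry{Size: size, Name: fields[1]})
	}
	return entries, sc.Err()
}

// suspiciouslySmall returns the names of entries whose size is below
// ratio times the median size. The ratio is a heuristic, not a rule.
func suspiciouslySmall(entries []entry, ratio float64) []string {
	if len(entries) == 0 {
		return nil
	}
	sizes := make([]int64, len(entries))
	for i, e := range entries {
		sizes[i] = e.Size
	}
	sort.Slice(sizes, func(i, j int) bool { return sizes[i] < sizes[j] })
	mid := len(sizes) / 2
	median := float64(sizes[mid])
	if len(sizes)%2 == 0 {
		median = (float64(sizes[mid-1]) + float64(sizes[mid])) / 2
	}
	var flagged []string
	for _, e := range entries {
		if float64(e.Size) < ratio*median {
			flagged = append(flagged, e.Name)
		}
	}
	return flagged
}

func main() {
	// Sizes and names copied from the listing in this issue.
	listing := `
 17056329 0012207000.dbin.zst
 17220134 0012207100.dbin.zst
 17804196 0012207200.dbin.zst
 17838958 0012207300.dbin.zst
 18098795 0012207400.dbin.zst
   129007 0012207500.dbin.zst
`
	entries, err := parseListing(listing)
	if err != nil {
		panic(err)
	}
	fmt.Println(suspiciouslySmall(entries, 0.1)) // prints [0012207500.dbin.zst]
}
```

A median is used rather than a mean so that one or two truncated files do not drag the baseline down.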

matthewdarwin commented 1 year ago

2022-09-26T20:13:14.810Z INFO (merger) merged and uploaded {"filename": "0012207400", "merge_time": "8.200236422s"}
2022-09-26T20:13:14.811Z INFO (merger) about to write merged blocks to storage location {"filename": "0012207500", "write_timeout": "5m0s", "lower_block_num": 12207499, "highest_block_num": 12207599}
2022-09-26T20:13:14.810Z ERRO (merger) merger returned error {"error": "lstat /var/lib/dfuse/storage/one-blocks/0012207801-4a082f97c69afab4-fbf479b6530c2d8f-12207601-x42b.dbin.zst: permission denied"}
2022-09-26T20:13:16.348Z WARN (merger) retrying after error {"error": "writing through pipe: not found"}
2022-09-26T20:13:17.962Z INFO (merger) merged and uploaded {"filename": "0012207500", "merge_time": "3.151758753s"}
2022-09-26T20:13:17.963Z ERRO (fireeth)
################################################################
Fatal error in app merger:
lstat /var/lib/dfuse/storage/one-blocks/0012207801-4a082f97c69afab4-fbf479b6530c2d8f-12207601-x42b.dbin.zst: permission denied
################################################################
2022-09-26T20:13:17.963Z INFO (dgrpc) forcing gRPC server to stop
2022-09-26T20:13:17.964Z INFO (fireeth) application merger shutdown unexpectedly, quitting
2022-09-26T20:13:17.964Z INFO (fireeth) waiting for all apps termination...
2022-09-26T20:13:17.964Z INFO (fireeth) all apps terminated gracefully
Error: lstat /var/lib/dfuse/storage/one-blocks/0012207801-4a082f97c69afab4-fbf479b6530c2d8f-12207601-x42b.dbin.zst: permission denied
2022-09-26T20:13:17.964Z ERRO (derr) dfuse {"error": "lstat /var/lib/dfuse/storage/one-blocks/0012207801-4a082f97c69afab4-fbf479b6530c2d8f-12207601-x42b.dbin.zst: permission denied"}

matthewdarwin commented 1 year ago

I have run into corrupted merged blocks a few times now. I am guessing some error handling is missing in the merging process. It would be better if the merger exited with a fatal error instead of creating a corrupted merged-blocks file.

  7898155 0006298000.dbin.zst
  6883563 0006298100.dbin.zst
   107414 0006298200.dbin.zst
        0 0006298300.dbin.zst
        0 0006298400.dbin.zst
  7125339 0006298500.dbin.zst
  7356797 0006298600.dbin.zst
  7816931 0006298700.dbin.zst
  8080654 0006298800.dbin.zst
  7457503 0006298900.dbin.zst