streamingfast / substreams-sink-files

Binary application to consume your Substreams and output its data to various file formats (JSON, CSV, etc.)
Apache License 2.0

Duplicated data sunk when stream is reaching "real-time". #4

Closed AngeloCa closed 1 year ago

AngeloCa commented 1 year ago

When not specifying any stop block in the sinking process, the process starts sinking data from the initial block provided in the manifest until it reaches 'real-time' (latest irreversible block).

For a simple transaction map module (block_number, tx_hash, timestamp, value), we observed that files sunk during 'real-time' contained duplicated transactions.

We noted 3 different patterns:

For example, on the Polygon endpoint polygon.streamingfast.io:443 we found that tx 0x70b63b55813878f52875f7e600dc705e029f67e9858c927a1c82aefb90b31da7 appeared 3 times: twice with timestamp 1680466625 and once with timestamp 1680466623.

This is not an isolated event: on average, every 10-20 blocks a batch of transactions (almost a full block) is duplicated. The same behavior also appears on the Ethereum and BSC endpoints.
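The duplication described above can be confirmed with a quick post-processing check over the sunk rows. Below is a minimal sketch (not part of substreams-sink-files; the row shape and the truncated hashes are illustrative) that counts how many times each tx_hash appears:

```python
from collections import Counter

def find_duplicate_txs(rows):
    """Return {tx_hash: count} for every tx_hash seen more than once."""
    counts = Counter(row["tx_hash"] for row in rows)
    return {tx: n for tx, n in counts.items() if n > 1}

# Hypothetical sample mirroring the reported pattern: the same tx appears
# three times, under two different timestamps.
rows = [
    {"tx_hash": "0x70b6...1da7", "timestamp": 1680466625},
    {"tx_hash": "0x70b6...1da7", "timestamp": 1680466625},
    {"tx_hash": "0x70b6...1da7", "timestamp": 1680466623},
    {"tx_hash": "0xaaaa...0001", "timestamp": 1680466623},
]

print(find_duplicate_txs(rows))  # {'0x70b6...1da7': 3}
```

Running a check like this over files produced in real-time versus files produced from historical blocks is how the pattern was isolated to the real-time segment.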

Could it be possible that reversible blocks have been inserted?

Note: This behavior never happens for files that weren't sunk in 'real-time' (the whole chain has been sunk and analyzed).

Note: Even though the stream is configured to stream only irreversible blocks, when watching the substreams-sink-files logs in real-time we can see that the last_block value advances at the same pace as Etherscan or Polygonscan (or even faster). There is no safety delay.
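Until the server-side cause is resolved, duplicates like these can be filtered out after sinking. A minimal sketch, assuming the same illustrative row shape as above (keep the first occurrence of each tx_hash, drop later repeats):

```python
def dedupe_rows(rows):
    """Keep the first row seen for each tx_hash; drop subsequent duplicates."""
    seen = set()
    deduped = []
    for row in rows:
        if row["tx_hash"] not in seen:
            seen.add(row["tx_hash"])
            deduped.append(row)
    return deduped

rows = [
    {"tx_hash": "0x70b6...1da7", "timestamp": 1680466625},
    {"tx_hash": "0x70b6...1da7", "timestamp": 1680466623},
    {"tx_hash": "0xaaaa...0001", "timestamp": 1680466623},
]

print(len(dedupe_rows(rows)))  # 2
```

Note that this keeps whichever timestamp came first, so it masks rather than explains the inconsistent timestamps reported above.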

maoueh commented 1 year ago

Seems this was a server-side problem where the final-blocks-only setting was not honored. Fixed if you use https://github.com/streamingfast/substreams-sink-files/releases/tag/v2.0.0