jrpc2.Client structs are shared between Shovel tasks and are therefore used concurrently. There was a race condition in the window after a header was downloaded (possibly served from the bcache cache) and before the logs were downloaded. The race could result in duplicate transactions being added to the block, and each duplicated transaction would contain only the logs requested by that task (rather than the transaction's entire set of logs).
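A minimal sketch of the unsynchronized path, using hypothetical Block/Tx/Log types (the real Shovel eth types and method names differ):

```go
package eth // hypothetical; the real Shovel package layout differs

// Minimal types for illustration only.
type Log struct {
	Idx  uint64
	Data []byte
}

type Tx struct {
	Idx  uint64
	Logs []Log
}

type Block struct {
	Txs []Tx
}

// Before the fix, each concurrent log download was merged into the
// shared (possibly cached) block roughly like this: scan for the tx
// and append a new one if it's missing. Two tasks racing through this
// can both miss and both append, leaving duplicate txs that each carry
// only the logs that task requested.
func (b *Block) tx(idx uint64) *Tx {
	for i := range b.Txs {
		if b.Txs[i].Idx == idx {
			return &b.Txs[i]
		}
	}
	b.Txs = append(b.Txs, Tx{Idx: idx})
	return &b.Txs[len(b.Txs)-1]
}
```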
In some cases this duplicated data is not a problem because the integration process removes logs that don't match the integration's configured event signature. However, if you have multiple integrations sharing an event signature, the integration process will process the same log more than once. In that case, an integration will encounter an error when copying the data to the PG table because the table contains a unique constraint. This causes N-1 of the integrations that share the event signature to fail. Eventually those N-1 integrations may succeed because the block/header is not cached forever; if other integrations are making progress, or if the Shovel process is restarted, the problem may go away.
The end result of this bug is that an integration may not be able to make progress, but there should be no duplicated or missing data in the database because of the unique constraint.
Users hitting this bug will see errors in the logs that report a unique constraint violation on one of the N integrations that share the event signature.
The fix for this bug was to introduce concurrency control around adding txs to an in-memory (or cached) block. Once the jrpc2 client downloads the block (or header), the multiple concurrent requests to download logs (or receipts) synchronize their access when adding (or reading) a transaction on the block. The client asks the block for a tx identified by its tx index; the block either returns the existing tx or creates a new one and adds it to its txs slice. A similar process ensures we don't add duplicate logs.
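A minimal sketch of the synchronized version, again with hypothetical types and names rather than the actual Shovel code:

```go
package eth // hypothetical; the real Shovel package layout differs

import "sync"

type Log struct {
	Idx  uint64
	Data []byte
}

type Tx struct {
	mu   sync.Mutex
	Idx  uint64
	Logs []Log
}

type Block struct {
	mu  sync.Mutex
	Txs []*Tx
}

// Tx returns the existing tx with the given index or creates, stores,
// and returns a new one. Holding the mutex makes the check-then-append
// atomic, so concurrent log/receipt downloads can no longer insert
// duplicate transactions.
func (b *Block) Tx(idx uint64) *Tx {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, t := range b.Txs {
		if t.Idx == idx {
			return t
		}
	}
	t := &Tx{Idx: idx}
	b.Txs = append(b.Txs, t)
	return t
}

// The same pattern applies one level down: a tx de-duplicates logs by
// log index before appending.
func (t *Tx) AddLog(l Log) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, existing := range t.Logs {
		if existing.Idx == l.Idx {
			return
		}
	}
	t.Logs = append(t.Logs, l)
}
```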
Given that we now have synchronized access to a block's transactions and a transaction's logs, I think we have all the footguns accounted for.
It was nice that the PG unique constraint prevented data corruption.