lightninglabs / taproot-assets

A layer 1 daemon, for the Taproot Assets Protocol specification, written in Go (golang)
MIT License
464 stars 111 forks source link

[bug]/tapdb: proof storage clobber error for postgres backend #951

Closed Roasbeef closed 4 months ago

Roasbeef commented 5 months ago

Background

This failure has started to pop up on itests for the postgres backend:

2024-06-12 16:42:33.793 [INF] GRDN: Received new proof file for asset ID 9bc313ea555c28227c86fa4a2b90f54dff46b14db4bbc56bd5d5e2e4dc3af34f, version=0,num_proofs=3
2024-06-12 16:42:33.807 [INF] FRTR: Marking parcel (txid=80bf98b4af11c552c95557806afa79468edec7b0a947f2c7b58306b0d4df4211) as confirmed!
2024-06-12 16:42:33.814 [ERR] FRTR: unable to transfer receiver proof: unable to log parcel delivery confirmation: failed to confirm transfer: unknown postgres error: ERROR: more than one row returned by a subquery used as an expression (SQLSTATE 21000)

It can be seen as a test failure in this itest run: https://github.com/lightninglabs/taproot-assets/actions/runs/9486428727/job/26140728303

    --- FAIL: TestTaprootAssetsDaemon/multi_address (13.48s)

The issue appears to be with this query: https://github.com/lightninglabs/taproot-assets/blob/51e3512464e1efbed2abb6b3ea50f26fa41802f5/tapdb/sqlc/queries/assets.sql#L671-L691

If the CTE for target_asset returns more than one asset_id, then this issue will be triggered. AFAICT, sqlite will just take the very first asset ID.

If we look at the table, we see that asset_id here is intended to be unique: https://github.com/lightninglabs/taproot-assets/blob/51e3512464e1efbed2abb6b3ea50f26fa41802f5/tapdb/sqlc/migrations/000002_assets.up.sql#L245-L256

This appears to be the root of the issue, as if you insert proofs for two diff script keys, but the same asset ID, then behavior is undefined across the two DB backends. However, this is only an issue for the db based proof file storage. The file based storage uses a full path of the outpoint+assetID+scriptKey (prev ID 3-tuple). The Universe based storage uses something similar.

Most sub-systems end up using a MultiArchive that's based on some or all the proof storage backends, which my answer why this issue has been hidden for some time.

Suggested Fix

First we need to get a better feel of the impact. When we send out using the ChainPorter, the file based proofs are updated first, then later the DB proofs once proofs are delivered.

Ultimately, I think we need a DB migration here to add the outpoint+script key as a composite 3-tuple primary key, similar to what we've done elsewhere.

Steps to reproduce

Run make itest icase=multi_address on postgres.

Expected behavior

Should pass.

Actual behavior

Fails.

gijswijs commented 4 months ago

It doesn't fail consistently for me. About 50% of the time the test passes.

Roasbeef commented 4 months ago

It doesn't fail consistently for me. About 50% of the time the test passes.

Weird...

guggero commented 4 months ago

Current suspicion is that the CTE (WITH clause) somehow behaves differently when it comes to transaction serialization.