lightninglabs / taproot-assets

A layer 1 daemon, for the Taproot Assets Protocol specification, written in Go (golang)
MIT License

[bug]: DB TX retries exceeded during global universe sync #675

Open jharveyb opened 12 months ago

jharveyb commented 12 months ago

Background

Describe your issue here.

When running with tapd v0.3.0 and a global sync of all issuances on, I get numerous log messages about DB TX retries due to serialization errors. In some cases, these operations hit the 10 retry limit and cause sync to fail.

I know that from 0.3.1 onwards global sync is no longer the default, but IMO this is an underlying issue that could appear even without global sync.

Your environment

tapcli getinfo                       # version of `tapd`, `lnd`, and network

tapd v0.3.0, lnd v0.17.0, mainnet

uname -mrsv                          # operating system 

debian testing
16-core CPU, NVMe SSD

bitcoind --version || btcd --version # version of `btcd`, `bitcoind`, or other backend

default neutrino backend with a local bitcoind v0.25 peer

Steps to reproduce

Tell us how to reproduce this issue. Please provide stacktraces and links to code in question.

Perform global issuance sync with the Lightning Labs default universe, with a fresh tapd.

From the log lines preceding these DB issues, I'm pretty sure this is triggered when multiple large (3000+ leaf) asset groups are being synced and those proofs are being written to disk. The default batch size here is 200 elements, which can add up to at least 200 MiB per batch. Since the sync is parallel, we can have multiple goroutines attempting these writes at the same time.

The log lines aren't tagged with a goroutine ID or any other way to distinguish writers, but based on the attempt counter there appear to be 10+ concurrent writers.
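
For readers unfamiliar with the retry behavior behind these log lines, here is a minimal, hypothetical sketch of retrying a DB transaction with randomized exponential backoff on serialization errors, roughly matching what the logs show (delay roughly doubling per attempt, capped at a few seconds, giving up after a fixed number of attempts). This is not the actual tapd code; the function and error names are placeholders.

package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errSerialization stands in for the "tx serialization error" returned by the
// DB driver when concurrent write transactions conflict.
var errSerialization = errors.New("tx serialization error")

// retryTx runs txFn, retrying on serialization errors with randomized
// exponential backoff capped at maxDelay, and gives up after maxRetries.
func retryTx(txFn func() error, maxRetries int, maxDelay time.Duration) error {
	delay := 25 * time.Millisecond
	for attempt := 0; ; attempt++ {
		err := txFn()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errSerialization) {
			return err
		}
		if attempt >= maxRetries {
			return fmt.Errorf("db tx retries exceeded: %w", err)
		}

		// Randomize the delay so concurrent writers don't retry in
		// lockstep, then double it for the next attempt.
		wait := delay/2 + time.Duration(rand.Int63n(int64(delay)))
		if wait > maxDelay {
			wait = maxDelay
		}
		fmt.Printf("Retrying transaction, attempt_number=%d, delay=%v\n",
			attempt, wait)
		time.Sleep(wait)
		delay *= 2
	}
}

func main() {
	// Simulate a transaction that conflicts three times before succeeding.
	attemptsLeft := 3
	err := retryTx(func() error {
		if attemptsLeft > 0 {
			attemptsLeft--
			return errSerialization
		}
		return nil
	}, 10, 3*time.Second)
	fmt.Println("result:", err)
}

With many concurrent writers all hitting the same backoff schedule, it is easy to see how some of them exhaust the 10-attempt budget and surface the "db tx retries exceeded" error shown further down.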

Partial logs, generated with cat tapd.log | rg serializ -C20:

2023-11-06 20:19:47.207 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
2023-11-06 20:19:47.207 [DBG] UNIV: UniverseRoot(issuance-766ca5fe4347d5e2218e90387d813c8959e9eaa20e7d777896d1326bad5581f8): Inserting 1 new leaves (1 of 1)
2023-11-06 20:19:47.207 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
2023-11-06 20:19:47.207 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
2023-11-06 20:19:47.207 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
2023-11-06 20:19:47.280 [DBG] UNIV: UniverseRoot(issuance-b20b6dc4436dcbe7b3f4b76ce5210fd1e04accae7d9c7ae47cb1ff91b9bbc3dd): Inserting 1 new leaves (1 of 1)
2023-11-06 20:19:47.280 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
2023-11-06 20:19:47.280 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
2023-11-06 20:19:47.427 [DBG] UNIV: UniverseRoot(issuance-4187c0032ca29783ec3b6441f1f717192d89f787f4219a58e0e7fd53d959785e): Inserting 1 new leaves (1 of 1)
2023-11-06 20:19:47.427 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
2023-11-06 20:19:47.428 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
2023-11-06 20:19:47.446 [DBG] UNIV: UniverseRoot(issuance-ed45f2cb29254488c7ccf93231f8937564bd8832d40fea2bd619163b357605db): Inserting 1 new leaves (1 of 1)
2023-11-06 20:19:47.446 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
2023-11-06 20:19:47.446 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
2023-11-06 20:19:48.184 [DBG] UNIV: UniverseRoot(issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e): Inserting 200 new leaves (200 of 3061)
2023-11-06 20:19:48.184 [INF] UNIV: Verifying 200 new proofs for insertion into Universe
2023-11-06 20:19:48.219 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
2023-11-06 20:19:49.958 [DBG] UNIV: UniverseRoot(issuance-997e28940f643b41a3d937bcc842b01c3ba863d66dbd365db015d62a79dd32fc): Inserting 200 new leaves (200 of 3563)
2023-11-06 20:19:49.958 [INF] UNIV: Verifying 200 new proofs for insertion into Universe
2023-11-06 20:19:50.136 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
2023-11-06 20:19:51.271 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=29.907727ms
2023-11-06 20:19:51.337 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=36.433343ms
2023-11-06 20:19:51.337 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=32.289439ms
2023-11-06 20:19:51.337 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=56.356503ms
2023-11-06 20:19:51.337 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=37.433905ms
2023-11-06 20:19:51.338 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=53.891183ms
2023-11-06 20:19:51.372 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=58.653632ms
2023-11-06 20:19:52.227 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=45.018695ms
2023-11-06 20:19:52.227 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=20.920387ms
2023-11-06 20:19:52.301 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=43.033733ms
2023-11-06 20:19:52.399 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-4a0268c6dc8ffe86a441712880eb2089d8564d2bb0026b45a0a7b52a103a9219 (asset_id=0000000000000000000000000000000000000000000000000000000000000000, group_key=bd3f2a315259c9e50e63d5b36daba1c89bf9891707a8b792d34eb4b5bbe177eb, proof_type=issuance)
2023-11-06 20:19:52.399 [TRC] TADB: read leaf keys for issuance-4a0268c6dc8ffe86a441712880eb2089d8564d2bb0026b45a0a7b52a103a9219 (asset_id=0000000000000000000000000000000000000000000000000000000000000000, group_key=bd3f2a315259c9e50e63d5b36daba1c89bf9891707a8b792d34eb4b5bbe177eb, proof_type=issuance) from cache
2023-11-06 20:19:52.399 [TRC] TADB: read leaf keys for issuance-4a0268c6dc8ffe86a441712880eb2089d8564d2bb0026b45a0a7b52a103a9219 (asset_id=0000000000000000000000000000000000000000000000000000000000000000, group_key=bd3f2a315259c9e50e63d5b36daba1c89bf9891707a8b792d34eb4b5bbe177eb, proof_type=issuance) from cache
2023-11-06 20:19:52.401 [DBG] TADB: storing leaf keys for issuance-4a0268c6dc8ffe86a441712880eb2089d8564d2bb0026b45a0a7b52a103a9219 (asset_id=0000000000000000000000000000000000000000000000000000000000000000, group_key=bd3f2a315259c9e50e63d5b36daba1c89bf9891707a8b792d34eb4b5bbe177eb, proof_type=issuance) in cache
2023-11-06 20:19:52.408 [INF] UNIV: UniverseRoot(issuance-4a0268c6dc8ffe86a441712880eb2089d8564d2bb0026b45a0a7b52a103a9219): diff_size=7009
2023-11-06 20:19:52.447 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=21.372471ms
2023-11-06 20:19:52.469 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=56.932272ms
2023-11-06 20:19:52.477 [TRC] UNIV: UniverseRoot(issuance-4a0268c6dc8ffe86a441712880eb2089d8564d2bb0026b45a0a7b52a103a9219): diff_size=7009, diff=([]universe.LeafKey) (len=7009 cap=7009) {
 (universe.LeafKey) {
  OutPoint: (wire.OutPoint) 46882e61657947e8b8ea9873032cb49c2752b90a146503bb66708bd5482912e0:0,
  ScriptKey: (*asset.ScriptKey)(0xc0fc4e7450)({
   PubKey: (*secp256k1.PublicKey)(0xc0fce061e0)({
    x: (secp256k1.FieldVal) d0ce99ab6b43d36582d203fa756c0dfd5e465bfd929417894e1df3ff297f993f,
    y: (secp256k1.FieldVal) 8655261b9d023c13c74963f4335201e3e223921bd06602b5dcb15790e83766fe
   }),
   TweakedScriptKey: (*asset.TweakedScriptKey)(<nil>)
  })
 },
 (universe.LeafKey) {
  OutPoint: (wire.OutPoint) 46882e61657947e8b8ea9873032cb49c2752b90a146503bb66708bd5482912e0:0,
  ScriptKey: (*asset.ScriptKey)(0xc0e87e6250)({
   PubKey: (*secp256k1.PublicKey)(0xc0e87e51d0)({
    x: (secp256k1.FieldVal) 5e946023dfb19a19a6093d3616542fafa1ce94416eaf1e9e73893d5f14219b5f,
    y: (secp256k1.FieldVal) c0da7c834b376b45b4312a3c477bc29daf7d335db29ddea20496f568376a7b4e
   }),
   TweakedScriptKey: (*asset.TweakedScriptKey)(<nil>)
  })
--
  ScriptKey: (*asset.ScriptKey)(0xc0f4e6b440)({
   PubKey: (*secp256k1.PublicKey)(0xc0f4e77d10)({
    x: (secp256k1.FieldVal) 12b3798c5359e6f5cb0363c2414ace6b45a98a973c0f08d075e5234297558f63,
    y: (secp256k1.FieldVal) 7212665e5321059577bb3267a6d10a6f39108eb756a40e4170acac9de61cd27e
   }),
   TweakedScriptKey: (*asset.TweakedScriptKey)(<nil>)
  })
 },
 (universe.LeafKey) {
  OutPoint: (wire.OutPoint) 46882e61657947e8b8ea9873032cb49c2752b90a146503bb66708bd5482912e0:0,
  ScriptKey: (*asset.ScriptKey)(0xc0da3b7450)({
   PubKey: (*secp256k1.PublicKey)(0xc0da6d9b80)({
    x: (secp256k1.FieldVal) 105f982871162c42acab71846ced75ea9b09d16c9244593a83b98b0e842ff64e,
    y: (secp256k1.FieldVal) 2b491834718381bfd0c5f6221d808cdb50b1c8320a0bac896010166fcb162aa0
   }),
   TweakedScriptKey: (*asset.TweakedScriptKey)(<nil>)
  })
 }
}

2023-11-06 20:19:53.239 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=34.58017ms
2023-11-06 20:19:55.160 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=55.545068ms
2023-11-06 20:19:56.321 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=44.909096ms
2023-11-06 20:19:56.391 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=105.512892ms
2023-11-06 20:19:56.397 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=69.279688ms
2023-11-06 20:19:56.397 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=56.755296ms
2023-11-06 20:19:56.417 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=88.2829ms
2023-11-06 20:19:56.417 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=117.399398ms
2023-11-06 20:19:56.448 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=107.067598ms
2023-11-06 20:19:57.270 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=106.008438ms
2023-11-06 20:19:57.297 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=118.682018ms
2023-11-06 20:19:57.370 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=81.688782ms
2023-11-06 20:19:57.487 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=99.339766ms
2023-11-06 20:19:57.549 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=113.995154ms
2023-11-06 20:19:58.301 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=68.57312ms
2023-11-06 20:20:00.238 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=47.548954ms
2023-11-06 20:20:01.387 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=179.370136ms
2023-11-06 20:20:01.474 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=156.946432ms
2023-11-06 20:20:01.490 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=232.77266ms
2023-11-06 20:20:01.519 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=166.662352ms
2023-11-06 20:20:01.526 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=183.145452ms
2023-11-06 20:20:01.552 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=100.413336ms
2023-11-06 20:20:01.575 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=212.815388ms
2023-11-06 20:20:02.400 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=167.879132ms
2023-11-06 20:20:02.437 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=114.905652ms
2023-11-06 20:20:02.476 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=123.79096ms
2023-11-06 20:20:02.614 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=179.375468ms
2023-11-06 20:20:02.687 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=219.525108ms
2023-11-06 20:20:03.388 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=235.781132ms
2023-11-06 20:20:05.310 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=218.823388ms
2023-11-06 20:20:06.590 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=356.327024ms
2023-11-06 20:20:06.651 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=230.183ms
2023-11-06 20:20:06.672 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=406.828168ms
2023-11-06 20:20:06.708 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=213.49792ms
2023-11-06 20:20:06.732 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=358.046248ms
2023-11-06 20:20:06.744 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=240.036744ms
2023-11-06 20:20:06.810 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=279.398936ms
2023-11-06 20:20:07.575 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=333.480688ms
2023-11-06 20:20:07.593 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=317.983704ms
2023-11-06 20:20:07.621 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=242.748576ms
2023-11-06 20:20:07.812 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=430.281808ms
2023-11-06 20:20:07.928 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=182.745536ms
2023-11-06 20:20:08.642 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=221.837296ms
2023-11-06 20:20:10.548 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=3, delay=194.558296ms
2023-11-06 20:20:11.901 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=757.435872ms
2023-11-06 20:20:11.943 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=485.486208ms
2023-11-06 20:20:11.965 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=381.838208ms
2023-11-06 20:20:12.001 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=811.882912ms
2023-11-06 20:20:12.101 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=321.2524ms
2023-11-06 20:20:12.110 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=632.23112ms
2023-11-06 20:20:12.110 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=791.195968ms
2023-11-06 20:20:12.887 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=867.200096ms
2023-11-06 20:20:12.930 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=591.273376ms
2023-11-06 20:20:12.930 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=936.084ms
2023-11-06 20:20:13.132 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=812.571488ms
2023-11-06 20:20:13.263 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=770.6852ms
2023-11-06 20:20:13.882 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=476.062464ms
2023-11-06 20:20:15.765 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=4, delay=502.599744ms
2023-11-06 20:20:17.370 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=801.898112ms
2023-11-06 20:20:17.444 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=1.359726144s
2023-11-06 20:20:17.447 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=1.440657024s
2023-11-06 20:20:17.678 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=1.531987936s
2023-11-06 20:20:17.763 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=1.520366688s
2023-11-06 20:20:17.831 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=724.990592ms
2023-11-06 20:20:17.925 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=1.034333952s
2023-11-06 20:20:18.543 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=1.440540992s
2023-11-06 20:20:18.775 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=973.886432ms
2023-11-06 20:20:18.885 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=1.7360704s
2023-11-06 20:20:18.968 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=1.297001632s
2023-11-06 20:20:19.056 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=794.894272ms
2023-11-06 20:20:19.381 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=1.46788304s
2023-11-06 20:20:21.286 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=5, delay=947.981536ms
2023-11-06 20:20:23.190 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=2.68142144s
2023-11-06 20:20:23.577 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=3s
2023-11-06 20:20:23.824 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=1.721666304s
2023-11-06 20:20:23.908 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=3s
2023-11-06 20:20:23.978 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=3s
2023-11-06 20:20:24.230 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=1.602506752s
2023-11-06 20:20:24.299 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=2.175833536s
2023-11-06 20:20:24.771 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=2.308415936s
2023-11-06 20:20:24.871 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=1.74248992s
2023-11-06 20:20:25.002 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=3s
2023-11-06 20:20:25.284 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=2.255631872s
2023-11-06 20:20:25.638 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=2.45642432s
2023-11-06 20:20:25.867 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=3s
2023-11-06 20:20:27.254 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=6, delay=1.605380928s
2023-11-06 20:20:30.561 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:30.850 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:30.892 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:31.495 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:31.598 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:31.635 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:31.926 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:31.997 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:32.100 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:32.556 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:33.020 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=2.839809408s
2023-11-06 20:20:33.114 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:33.879 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:33.885 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=7, delay=3s
2023-11-06 20:20:34.500 [DBG] UNIV: UniverseRoot(issuance-4a0268c6dc8ffe86a441712880eb2089d8564d2bb0026b45a0a7b52a103a9219): Inserting 200 new leaves (200 of 7009)
2023-11-06 20:20:34.500 [INF] UNIV: Verifying 200 new proofs for insertion into Universe
2023-11-06 20:20:34.535 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
2023-11-06 20:20:38.582 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:38.876 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:38.912 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:39.516 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:39.554 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=0, delay=28.870458ms
2023-11-06 20:20:39.620 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:39.654 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:39.954 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:40.019 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:40.120 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:40.581 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:40.882 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:41.130 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:41.907 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:41.907 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=8, delay=3s
2023-11-06 20:20:44.603 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=1, delay=80.4357ms
2023-11-06 20:20:46.604 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:46.900 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:46.940 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:47.543 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:47.643 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:47.673 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:47.974 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:48.039 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:48.145 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:48.605 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:48.907 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:49.153 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:49.605 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-5066057bb5b507aff4a5710c19b9394ebff9b039305ba7cc5489f3c769667f7f

2023-11-06 20:20:49.605 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-9a7af6a2f6783877d11a39ff3dfeaa04f3d6bc965dc9ad04d4e59cbc09af9276

2023-11-06 20:20:49.605 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-286f01a99855d7bc5dd4d875bce185d6acb1e9ba3acb9eec0085ce13620b4086

2023-11-06 20:20:49.605 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-183247544f121f9d834b8be25bb02bedd62dbb1b43f541b447f6838dc1043719

2023-11-06 20:20:49.605 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-f0a17e8c81f27097a95a7ea591072abba67260d3d0c5631876915b9666c40630

2023-11-06 20:20:49.605 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-1142a81e275bbfe7e43c3b271dff190b9eb3760f4564a35d9b772186230c3747

2023-11-06 20:20:49.605 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-6ff765342e57468496594cf8f5220c400ef2a0b6633b8b09728dec1d93389201

2023-11-06 20:20:49.606 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-8ecbab5714b3559cf72a63d4ba61767aeff64e3587a8fa8350af9d1200f699cf

2023-11-06 20:20:49.606 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-42557cc5f390f5029f4adf586406117097b02de513c8041adfc6bced3b17a8cf

2023-11-06 20:20:49.606 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-b38246163381af3f08091d7fa9b59a28d96b3c3ed05ddcded40000f6558b3736

--

2023-11-06 20:20:49.656 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-85eedf70ce32b038ce16a0a8df202d68445fe27d81465949618a582a963be320

2023-11-06 20:20:49.656 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-7e3ccdf93124041ea6ee3a0d57da789ae90668942d95edf3d1d1a57824eae43a

2023-11-06 20:20:49.656 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-7651bbb2ae97822d1de2669371cc03684713945f0d0211404217f261511a91f6

2023-11-06 20:20:49.656 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-09ee8de973e2a6a02b39f46be0b388100ce791115a4076847a679a6992de977b

2023-11-06 20:20:49.656 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-766e6b5243401c08285749ec625672ffb1215168e8c626ee3c8bee9933899437

2023-11-06 20:20:49.656 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-cc32551a1c20071ccaa76155b1718cb6d991a116f05b76c279a2476fc9ecb8f7

2023-11-06 20:20:49.656 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-f0896a50b538937618cdfeb9836d638b4d404c3df43af2d5d93a524582e93ac0

2023-11-06 20:20:49.656 [INF] TADB: db cache universe_roots: 10112 hits, 678588 misses, 1.47% hit ratio
2023-11-06 20:20:49.656 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-351cb88f7a7595d152bcfb5babe0424bb0537162d97f9f2b01d2e87252d1f6c0

2023-11-06 20:20:49.656 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-3cf95e407d5ec74a0a8635a6e385cdcf1e3ca921a209cdcc292dec969c698a7a

2023-11-06 20:20:49.709 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=2, delay=86.1926ms
2023-11-06 20:20:49.926 [TRC] TADB: Retrying transaction due to tx serialization error, attempt_number=9, delay=3s
2023-11-06 20:20:52.154 [WRN] UNIV: encountered an error whilst syncing with server=(universe.ServerAddr) {
 ID: (int64) 1,
 addrStr: (string) (len=32) "universe.lightning.finance:10029",
 addr: (net.Addr) <nil>
}
: %!w(*fmt.wrapError=&{unable to register proofs: unable to register proofs: unable to register new group anchor issuance proofs: db tx retries exceeded 0xc0fa49a480})
2023-11-06 20:21:03.368 [INF] TAPD: Received shutdown request.
2023-11-06 20:21:03.368 [INF] TAPD: Shutting down...
2023-11-06 20:21:03.368 [INF] TAPD: Gracefully shutting down.
2023-11-06 20:21:03.368 [INF] SRVR: Received SIGINT (Ctrl+C). Shutting down...
2023-11-06 20:21:03.368 [INF] RPCS: Stopping RPC Server
2023-11-06 20:21:03.369 [INF] SRVR: Shutdown complete

Possible fixes

I expect most end users with smaller wallets (not universe operators or companies running larger nodes) will run with the defaults and on an SQLite backend.

We could decrease the default DB write batch size, or the # of parallel goroutines during sync.

Alternatively, we could add another synchronization layer between concurrent DB writers, such as a counting semaphore, to rate-limit DB access while still using many goroutines for the other work performed during sync.
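
To make that last idea concrete, below is a minimal, hypothetical sketch of a counting semaphore (a buffered channel) that caps how many goroutines may be inside a DB write transaction at once, while the rest of the sync work stays fully parallel. Names and the limit value are illustrative, not taken from the tapd code.

package main

import (
	"fmt"
	"sync"
)

// dbWriteSem is a counting semaphore: at most cap(dbWriteSem) goroutines can
// hold a slot at the same time, which bounds concurrent DB write transactions.
var dbWriteSem = make(chan struct{}, 2)

// writeProofBatch stands in for the per-batch DB write performed during sync.
func writeProofBatch(batchID int) {
	dbWriteSem <- struct{}{}        // acquire a slot (blocks if all are taken)
	defer func() { <-dbWriteSem }() // release the slot

	fmt.Printf("writing batch %d\n", batchID)
	// ... perform the actual DB transaction here ...
}

func main() {
	var wg sync.WaitGroup
	// Many goroutines can still verify proofs, fetch leaves, etc. in
	// parallel; only the DB write section is rate-limited.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			writeProofBatch(id)
		}(i)
	}
	wg.Wait()
}

Bounding only the write section keeps the serialization-error rate low without giving up parallelism for verification and network fetches.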

freerko commented 3 months ago

Hi!

I've set up tapd 0.4.1-alpha with a local bitcoind 27.1 and am trying to sync the global default universe. I use the default SQLite backend, and after about a week of syncing my tapd.db is only about 150 MB and I've only managed to sync a few assets; I think the stats haven't changed for several days:

root@server:~# tapcli universe stats
{
    "num_total_assets":  "6440",
    "num_total_groups":  "37",
    "num_total_syncs":  "0",
    "num_total_proofs":  "6930"
}

I get the "db tx retries exceeded" error as well:

2024-08-15 13:17:09.938 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-5cef8faf4ce2d667bb31ada76ff6d0a3c1d7944cf36a39f4af4d4e55084257a7

[... about 895 similar lines in the same second... ]

2024-08-15 13:17:09.941 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-8ab9ad693975feaabb386772753217a8d0122570610ca8a6aeec8ca900c745a0

2024-08-15 13:17:09.942 [WRN] UNIV: encountered an error whilst syncing with server=(universe.ServerAddr) {
 ID: (int64) 1,
 addrStr: (string) (len=32) "universe.lightning.finance:10029",
 addr: (net.Addr) <nil>
}
: unable to register proofs: unable to register proofs: unable to register new group anchor issuance proofs: db tx retries exceeded

There is no high I/O on the local disk, nor high CPU load. Is a global universe sync only supported/working with a Postgres DB? I've used SQLite for other projects with multi-gigabyte databases without issues. I have this issue on both testnet3 and mainnet.

guggero commented 3 months ago

Hi @freerko. The sync should work for SQLite users. Not sure why you run into this particular issue consistently. I assume you're running with the --universe.sync-all-assets flag (or universe.sync-all-assets=true config option)?

If you're able (and comfortable) building from source, could you try increasing this number to 50 and seeing if that changes things?
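
For reference, the number being referred to here is the DefaultNumTxRetries constant mentioned in the next comment (the retry limit whose shipped default is 10). A hedged sketch of the kind of one-line change being suggested, assuming the constant lives in the tapdb package; the exact file and surrounding code may differ between versions:

package tapdb

// DefaultNumTxRetries is the maximum number of times a DB transaction is
// retried after a serialization error before tapd gives up with the
// "db tx retries exceeded" error. The shipped default is 10; bumped to 50
// here for testing.
const DefaultNumTxRetries = 50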

freerko commented 2 months ago

Thanks @guggero! I hadn't used the universe.sync-all-assets=true config option yet, but have set it now since it sounds like what I want :) I've changed DefaultNumTxRetries to 50, compiled tapd, and let it run on the same DB as before. Now it adds new assets; I see the numbers from "tapcli universe stats" increasing. But after running the universe sync command I still run into the same error, after half an hour or so:

root@server:~# tapcli universe sync --universe_host universe.lightning.finance
[tapcli] rpc error: code = Unknown desc = unable to sync universe: unable to register proofs: unable to register proofs: unable to register new group anchor issuance proofs: db tx retries exceeded

But it continues to sync in the background, so I'll let it run over the next days and will report what happens.

On the other node, on testnet3, after setting the retries to 50 I got this error every 10 minutes or so, with different hashes, and the numbers in the stats command didn't increase:

2024-08-16 09:44:02.763 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-bda186d39d9707ec78a639659de8fe067d0163a42a0850f75c224cb9dbe00ae3

2024-08-16 09:44:02.763 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-4d248e3497ebfc47f01eee146aded9b1d321b5a33ebb3f8e4f5a145b8cc7fb43

2024-08-16 09:44:02.763 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-b6e4311901cadc3d11286d231a5e4c4585041934bdfd3261d64a37937c1330de

2024-08-16 09:44:02.764 [WRN] UNIV: encountered an error whilst syncing with server=(universe.ServerAddr) { ID: (int64) 1, addrStr: (string) (len=40) "testnet.universe.lightning.finance:10029", addr: (net.Addr) <nil> } : unable to register proofs: unable to register proofs: unable to verify issuance proofs: unable to fetch previous asset snapshot: unable to fetch previous proof: no universe proof found, id=(universe.Identifier) issuance-a823a0096bd93c6683fb0189272521a4f7a2b57baec1558c60aa22a3afe446c5 , leaf_key=(universe.LeafKey) { OutPoint: (wire.OutPoint) 2dcc508b05efd21f594f1632a63a68d81d157e76fb22f64c9a15a61f2356148b:0, ScriptKey: (*asset.ScriptKey)(0xc001004de0)({ PubKey: (*secp256k1.PublicKey)(0xc00101e320)({ x: (secp256k1.FieldVal) 9c75753189074f4fa8cd322c9ca409d8d472a5e7ae0f7dd466226c93cc79a5c6, y: (secp256k1.FieldVal) d4501219d06825c66444e82b367d06eb8f84a61a4b97e1a1f715cfa4fa320258 }), TweakedScriptKey: (*asset.TweakedScriptKey)(<nil>) }) } , new_script_key=02540e4eff752cbd7cd9e963e69bea5ba45c73909928dd171ff98e1b4adc03fc24

So I've removed the data folder with the db and starting from scratch on testnet3, I got some assets synced but then I got another error:

2024-08-16 10:57:37.117 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-dfa1a1bf7bfee242ba635bcd966393d8df48beccdc4be9fecd386728b563edcd

2024-08-16 10:57:37.117 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-3a8e10323168e7ebf19fbc59bcb607071f7cb4f195f041522e9ec576f49624ee

2024-08-16 10:57:37.165 [WRN] UNIV: encountered an error whilst syncing with server=(universe.ServerAddr) { ID: (int64) 1, addrStr: (string) (len=40) "testnet.universe.lightning.finance:10029", addr: (net.Addr) <nil> } : unable to register proofs: unable to register proofs: unable to verify issuance proofs: unable to verify proof: failed to validate proof block header: block hash and block height mismatch; (height: 2867248, hashAtHeight: 00000000d1e5841c9873c053fafc6972da94549198268d1b9d73ddfa184cb6bf, expectedHash: 0000000075f28b79d00152765fcf6feb4d28b4270f8000fd396b675797826d3a)
2024-08-16 10:57:47.284 [INF] TADB: Refreshing stats cache, duration=30m0s
2024-08-16 10:57:47.310 [DBG] TADB: Refreshed stats cache, interval=30m0s, took=26.134612ms

I get the error with the same hashes every 10 minutes now and the numbers in the stats command are not increasing anymore. The hashAtHeight seems to be fine for that block: https://mempool.space/testnet/block/00000000d1e5841c9873c053fafc6972da94549198268d1b9d73ddfa184cb6bf I don't know where that expectedHash comes from.

guggero commented 2 months ago

block hash and block height mismatch; (height: 2867248, hashAtHeight: 00000000d1e5841c9873c053fafc6972da94549198268d1b9d73ddfa184cb6bf, expectedHash: 0000000075f28b79d00152765fcf6feb4d28b4270f8000fd396b675797826d3a)

I think someone uploaded a proof referencing a block that was later re-organized (testnet3 is a bit of a mess when it comes to re-orgs). We have since increased the number of blocks we watch for re-orgs on testnet. But it seems like we probably have to remove that proof to fix the sync. I'll look into it.

guggero commented 2 months ago

@freerko I couldn't reproduce the block height mismatch on testnet... Do you have more information from the log about which asset caused the problem?

freerko commented 2 months ago

Hmm, how can I see in the logs which asset caused the problem? Does the group key in the error of the sync command help?

I'm trying to sync everything (with universe.sync-all-assets=true) like this and get this error:

root@server:~# tapcli universe sync --universe_host testnet.universe.lightning.finance
[tapcli] rpc error: code = Unknown desc = unable to sync universe: unable to register proofs: unable to register proofs: unable to verify issuance proofs: unable to verify proof: group key not known: 02d8394c926be907c30af34e7b998afd09cb2638c69950a604d5a54711142bc97d: group verifier: no matching asset group: sql: no rows in result set: group key not known

In the tapd.log it looks like this; I always get different "Looking up root node for base Universe..." lines before the sync error.

2024-08-26 12:04:38.276 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-012b09f5ce832145b2cc4302bca5d23f6beb2b7125826120621561f94c59957f

2024-08-26 12:04:38.276 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-a7c4183f78cf3c09e7d85d49844c5dec36610c4feae02524368fb1d17da73d7a

2024-08-26 12:04:38.276 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-6362b41fd0201dabe36cd02c5c02d7b8d6b76062d8b6a7d2f1ae0fd7c4cc493d

2024-08-26 12:04:38.276 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-5a1afa5667055ea76951cc913065606c8f4d539d9d35176a27559e31822f509e

2024-08-26 12:04:38.276 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-d421c41e5cccc7a46853a67a77f9bfbff0e46b636be45080b70d6228430c2675

2024-08-26 12:04:38.276 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) transfer-4d248e3497ebfc47f01eee146aded9b1d321b5a33ebb3f8e4f5a145b8cc7fb43

2024-08-26 12:04:38.276 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-8e2feef6d0ef9f0b544140b8f74ceb327a84b0442a1d2e2a15bef8686755efd3

2024-08-26 12:04:38.310 [WRN] UNIV: encountered an error whilst syncing with server=(universe.ServerAddr) { ID: (int64) 1, addrStr: (string) (len=40) "testnet.universe.lightning.finance:10029", addr: (net.Addr) <nil> } : unable to register proofs: unable to register proofs: unable to verify issuance proofs: unable to verify proof: failed to validate proof block header: block hash and block height mismatch; (height: 2867248, hashAtHeight: 00000000d1e5841c9873c053fafc6972da94549198268d1b9d73ddfa184cb6bf, expectedHash: 0000000075f28b79d00152765fcf6feb4d28b4270f8000fd396b675797826d3a)
2024-08-26 12:04:40.658 [INF] GRDN: New block at height 2875486

My local stats look similar to the ones from the Lightning Labs universe; at least it is not missing a lot:

root@server:# tapcli universe stats
{
    "num_total_assets": "1240",
    "num_total_groups": "189",
    "num_total_syncs": "0",
    "num_total_proofs": "1703"
}
root@server:# curl -s https://testnet.universe.lightning.finance/v1/taproot-assets/universe/stats | jq
{
  "num_total_assets": "1244",
  "num_total_groups": "190",
  "num_total_syncs": "2538359",
  "num_total_proofs": "6081"
}
root@server:#

freerko commented 2 months ago

Now looking at my mainnet node again, it seems like the sync is stuck; the stats haven't changed for a few days:

root@server:# tapcli universe stats
{
    "num_total_assets": "16189",
    "num_total_groups": "65",
    "num_total_syncs": "0",
    "num_total_proofs": "21606"
}

The data directory looks weird as well: apparently tapd hasn't written to tapd.db for more than a week, and last wrote to the big "tapd.db-wal" file two days ago:

root@server:# ls -alh /root/.tapd/data/mainnet/
total 2.9G
drwx------ 3 root root 4.0K Aug 19 14:49 .
drwx------ 4 root root 4.0K Aug 7 13:48 ..
-rw-r--r-- 1 root root 248 Aug 7 13:35 admin.macaroon
drwxr-x--- 2 root root 4.0K Aug 7 13:35 proofs
-rw-r--r-- 1 root root 200M Aug 19 15:14 tapd.db
-rw-r--r-- 1 root root 12K Aug 7 13:35 tapd.db.1723030556431609440.backup
-rw-r--r-- 1 root root 148M Aug 16 11:08 tapd.db.1723799284664422550.backup
-rw-r--r-- 1 root root 5.1M Aug 25 20:51 tapd.db-shm
-rw-r--r-- 1 root root 2.6G Aug 25 20:51 tapd.db-wal
root@server:~# date
Tue Aug 27 03:39:14 PM CEST 2024

But when I look at the "db cache universe_roots hits/misses hit ratio" in the tapd.log, I see a very slowly increasing hit ratio, currently 2.46% (yesterday it was about 1.6%). The node has been syncing from the default universe for about two weeks now. In the tapd.log I still see some new "Verifying 200 new proofs for insertion into Universe" lines, but the stats are not increasing. The system is a VM with 8 vCores of AMD EPYC 7282, 24 GB RAM, and an SSD disk, running several other services as well, but in my monitoring I don't see the system limited by CPU, RAM, network, or disk I/O.

Is this to be expected? Does it take months to sync? Or is my system too slow? Should I restart tapd regularly?

Logs look ok for me:

Logs
> 2024-08-27 15:26:58.844 [DBG] UNIV: Root for issuance-3d4dadc581d5f017f2e06b2ed9ee064369a78aca8603fb128b2694d768727d64 matches, no sync needed
> 2024-08-27 15:26:58.844 [DBG] UNIV: Looking up root node for base Universe (universe.Identifier) issuance-bc3d3df62ba1857418c8575cfb77761d4118465536a7e59a7459f873e8506953
>
> 2024-08-27 15:26:58.845 [INF] UNIV: UniverseRoot(issuance-9759e54aa14c9a8f0327a18e602d04fce49057b2c782f0a6aa56377c87c8c3f2) diverges, performing leaf diff...
> 2024-08-27 15:26:58.845 [INF] UNIV: UniverseRoot(issuance-b701d8748f47cd49c3167c68e6f26b20bb6ccd527443fbbd947a1c17356ea064) diverges, performing leaf diff...
> 2024-08-27 15:26:58.846 [INF] UNIV: UniverseRoot(issuance-f804b9fd46d82cf14cd1f4ad21ad4c77c8a242609cdf862499d10e350a39a87b) diverges, performing leaf diff...
> 2024-08-27 15:26:58.846 [INF] UNIV: UniverseRoot(issuance-bc3d3df62ba1857418c8575cfb77761d4118465536a7e59a7459f873e8506953) diverges, performing leaf diff...
> 2024-08-27 15:26:59.146 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e (asset_id=0000000000000000000000000000000000000000000000000000000000000000, group_key=76ea2858a4e28240c046083cc4246bba2969880371e550733bde3393076660b6, proof_type=issuance)
> 2024-08-27 15:26:59.146 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e (asset_id=0000000000000000000000000000000000000000000000000000000000000000, group_key=76ea2858a4e28240c046083cc4246bba2969880371e550733bde3393076660b6, proof_type=issuance)
> 2024-08-27 15:26:59.146 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e (asset_id=0000000000000000000000000000000000000000000000000000000000000000, group_key=76ea2858a4e28240c046083cc4246bba2969880371e550733bde3393076660b6, proof_type=issuance)
> 2024-08-27 15:26:59.146 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e (asset_id=0000000000000000000000000000000000000000000000000000000000000000, group_key=76ea2858a4e28240c046083cc4246bba2969880371e550733bde3393076660b6, proof_type=issuance)
> 2024-08-27 15:26:59.156 [DBG] TADB: storing leaf keys for issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e (asset_id=0000000000000000000000000000000000000000000000000000000000000000, group_key=76ea2858a4e28240c046083cc4246bba2969880371e550733bde3393076660b6, proof_type=issuance) in cache
> 2024-08-27 15:26:59.160 [INF] UNIV: UniverseRoot(issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e): diff_size=4452
> 2024-08-27 15:26:59.546 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-b701d8748f47cd49c3167c68e6f26b20bb6ccd527443fbbd947a1c17356ea064 (asset_id=b701d8748f47cd49c3167c68e6f26b20bb6ccd527443fbbd947a1c17356ea064, group_key=, proof_type=issuance)
> 2024-08-27 15:26:59.547 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-612deb2d2cb1c857044778ac223ff50e3f3798203a52a11a64a936ae0cdd526f (asset_id=612deb2d2cb1c857044778ac223ff50e3f3798203a52a11a64a936ae0cdd526f, group_key=, proof_type=issuance)
> 2024-08-27 15:26:59.548 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-9759e54aa14c9a8f0327a18e602d04fce49057b2c782f0a6aa56377c87c8c3f2 (asset_id=9759e54aa14c9a8f0327a18e602d04fce49057b2c782f0a6aa56377c87c8c3f2, group_key=, proof_type=issuance)
> 2024-08-27 15:26:59.548 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-f804b9fd46d82cf14cd1f4ad21ad4c77c8a242609cdf862499d10e350a39a87b (asset_id=f804b9fd46d82cf14cd1f4ad21ad4c77c8a242609cdf862499d10e350a39a87b, group_key=, proof_type=issuance)
> 2024-08-27 15:26:59.548 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-bc3d3df62ba1857418c8575cfb77761d4118465536a7e59a7459f873e8506953 (asset_id=bc3d3df62ba1857418c8575cfb77761d4118465536a7e59a7459f873e8506953, group_key=, proof_type=issuance)
> 2024-08-27 15:26:59.546 [DBG] UNIV: Retrieving all keys for Universe: id=issuance-93fc3a4e5a8f2189bbb020a4c264da20bc158a925be9ae5bfef99951886f32a0 (asset_id=93fc3a4e5a8f2189bbb020a4c264da20bc158a925be9ae5bfef99951886f32a0, group_key=, proof_type=issuance)
> 2024-08-27 15:26:59.549 [DBG] TADB: storing leaf keys for issuance-b701d8748f47cd49c3167c68e6f26b20bb6ccd527443fbbd947a1c17356ea064 (asset_id=b701d8748f47cd49c3167c68e6f26b20bb6ccd527443fbbd947a1c17356ea064, group_key=, proof_type=issuance) in cache
> 2024-08-27 15:26:59.549 [INF] UNIV: UniverseRoot(issuance-b701d8748f47cd49c3167c68e6f26b20bb6ccd527443fbbd947a1c17356ea064): diff_size=1
> 2024-08-27 15:26:59.550 [DBG] TADB: storing leaf keys for issuance-612deb2d2cb1c857044778ac223ff50e3f3798203a52a11a64a936ae0cdd526f (asset_id=612deb2d2cb1c857044778ac223ff50e3f3798203a52a11a64a936ae0cdd526f, group_key=, proof_type=issuance) in cache
> 2024-08-27 15:26:59.550 [INF] UNIV: UniverseRoot(issuance-612deb2d2cb1c857044778ac223ff50e3f3798203a52a11a64a936ae0cdd526f): diff_size=1
> 2024-08-27 15:26:59.552 [DBG] TADB: storing leaf keys for issuance-9759e54aa14c9a8f0327a18e602d04fce49057b2c782f0a6aa56377c87c8c3f2 (asset_id=9759e54aa14c9a8f0327a18e602d04fce49057b2c782f0a6aa56377c87c8c3f2, group_key=, proof_type=issuance) in cache
> 2024-08-27 15:26:59.552 [INF] UNIV: UniverseRoot(issuance-9759e54aa14c9a8f0327a18e602d04fce49057b2c782f0a6aa56377c87c8c3f2): diff_size=1
> 2024-08-27 15:26:59.553 [DBG] TADB: storing leaf keys for issuance-f804b9fd46d82cf14cd1f4ad21ad4c77c8a242609cdf862499d10e350a39a87b (asset_id=f804b9fd46d82cf14cd1f4ad21ad4c77c8a242609cdf862499d10e350a39a87b, group_key=, proof_type=issuance) in cache
> 2024-08-27 15:26:59.553 [INF] UNIV: UniverseRoot(issuance-f804b9fd46d82cf14cd1f4ad21ad4c77c8a242609cdf862499d10e350a39a87b): diff_size=1
> 2024-08-27 15:26:59.554 [DBG] TADB: storing leaf keys for issuance-bc3d3df62ba1857418c8575cfb77761d4118465536a7e59a7459f873e8506953 (asset_id=bc3d3df62ba1857418c8575cfb77761d4118465536a7e59a7459f873e8506953, group_key=, proof_type=issuance) in cache
> 2024-08-27 15:26:59.554 [INF] UNIV: UniverseRoot(issuance-bc3d3df62ba1857418c8575cfb77761d4118465536a7e59a7459f873e8506953): diff_size=1
> 2024-08-27 15:26:59.555 [DBG] TADB: storing leaf keys for issuance-93fc3a4e5a8f2189bbb020a4c264da20bc158a925be9ae5bfef99951886f32a0 (asset_id=93fc3a4e5a8f2189bbb020a4c264da20bc158a925be9ae5bfef99951886f32a0, group_key=, proof_type=issuance) in cache
> 2024-08-27 15:26:59.555 [INF] UNIV: UniverseRoot(issuance-93fc3a4e5a8f2189bbb020a4c264da20bc158a925be9ae5bfef99951886f32a0): diff_size=1
> 2024-08-27 15:26:59.950 [DBG] UNIV: UniverseRoot(issuance-9759e54aa14c9a8f0327a18e602d04fce49057b2c782f0a6aa56377c87c8c3f2): Inserting 1 new leaves (1 of 1)
> 2024-08-27 15:26:59.950 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
> 2024-08-27 15:26:59.950 [DBG] UNIV: UniverseRoot(issuance-b701d8748f47cd49c3167c68e6f26b20bb6ccd527443fbbd947a1c17356ea064): Inserting 1 new leaves (1 of 1)
> 2024-08-27 15:26:59.950 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
> 2024-08-27 15:26:59.951 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
> 2024-08-27 15:26:59.951 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
> 2024-08-27 15:26:59.951 [DBG] UNIV: UniverseRoot(issuance-612deb2d2cb1c857044778ac223ff50e3f3798203a52a11a64a936ae0cdd526f): Inserting 1 new leaves (1 of 1)
> 2024-08-27 15:26:59.951 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
> 2024-08-27 15:26:59.951 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
> 2024-08-27 15:26:59.953 [DBG] UNIV: UniverseRoot(issuance-f804b9fd46d82cf14cd1f4ad21ad4c77c8a242609cdf862499d10e350a39a87b): Inserting 1 new leaves (1 of 1)
> 2024-08-27 15:26:59.953 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
> 2024-08-27 15:26:59.953 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
> 2024-08-27 15:26:59.955 [DBG] UNIV: UniverseRoot(issuance-bc3d3df62ba1857418c8575cfb77761d4118465536a7e59a7459f873e8506953): Inserting 1 new leaves (1 of 1)
> 2024-08-27 15:26:59.955 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
> 2024-08-27 15:26:59.955 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
> 2024-08-27 15:26:59.955 [DBG] UNIV: UniverseRoot(issuance-93fc3a4e5a8f2189bbb020a4c264da20bc158a925be9ae5bfef99951886f32a0): Inserting 1 new leaves (1 of 1)
> 2024-08-27 15:26:59.955 [INF] UNIV: Verifying 1 new proofs for insertion into Universe
> 2024-08-27 15:26:59.956 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
> 2024-08-27 15:27:09.389 [DBG] UNIV: UniverseRoot(issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e): Inserting 200 new leaves (200 of 4452)
> 2024-08-27 15:27:09.389 [INF] UNIV: Verifying 200 new proofs for insertion into Universe
> 2024-08-27 15:27:09.408 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe
> 2024-08-27 15:27:13.482 [DBG] UNIV: UniverseRoot(issuance-997e28940f643b41a3d937bcc842b01c3ba863d66dbd365db015d62a79dd32fc): Inserting 200 new leaves (200 of 3563)
> 2024-08-27 15:27:13.483 [INF] UNIV: Verifying 200 new proofs for insertion into Universe
> 2024-08-27 15:27:13.802 [INF] UNIV: Inserting 0 verified group anchor proofs into Universe

jharveyb commented 1 month ago

That is a bit weird; I would expect the sync to at least make progress after each run. For context, if I pull the stats from the mainnet Universe server:

curl -s https://universe.lightning.finance/v1/taproot-assets/universe/stats | jq

{
  "num_total_assets": "108011",
  "num_total_groups": "176",
  "num_total_syncs": "80016341",
  "num_total_proofs": "174907"
}

The system is a VM with 8 vCores of AMD EPYC 7282 and 24GB RAM and SSD disk

That should be totally fine; the sync should not be particularly resource intensive, and I think the total storage used for all proofs is still relatively small.

IIRC there is some server-side rate limiting that may slow syncing progress if you're trying to sync all assets, but it isn't intended to prevent that.

dstadulis commented 1 month ago

Do all full universe servers encounter this issue when initiating?

jharveyb commented 1 month ago

Do all full universe servers encounter this issue when initiating?

Working on repro for mainnet now.

freerko commented 1 month ago

With DefaultNumTxRetries set to 999 it now syncs again, albeit very slowly at about 1000 assets per day, which would mean about 90 days left. Let's see :)

jharveyb commented 1 month ago

Can you try the default number of retries, with GOMAXPROCS=1, running this PR or later?

https://github.com/lightninglabs/taproot-assets/pull/1123

We're also checking to see if some traffic rate limiting may be affecting this / syncing a new full universe from scratch.

freerko commented 1 month ago

With the default number of retries and GOMAXPROCS=1 the proofs are increasing but the assets are not.

This was about 20 hours ago:

root@server:~# tapcli universe stats
{
    "num_total_assets": "20842",
    "num_total_groups": "67",
    "num_total_syncs": "0",
    "num_total_proofs": "40238"
}

and this is now:

root@server:~# tapcli universe stats
{
    "num_total_assets": "20842",
    "num_total_groups": "67",
    "num_total_syncs": "0",
    "num_total_proofs": "44889"
}

There are no "DB TX retries exceeded" errors in the log. After the restart tapd inserts 200 new leaves about every hour, beginning with 2024-09-26 14:36:34.803 [DBG] UNIV: UniverseRoot(issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e): Inserting 200 new leaves (200 of 4452) up to 2024-09-27 05:32:35.791 [DBG] UNIV: UniverseRoot(issuance-87225111b8571dcd09e00116238f61c71091485538801b2d6b6457a75119861e): Inserting 51 new leaves (4451 of 4452). And then continuing with a new root?: 2024-09-27 06:03:08.000 [DBG] UNIV: UniverseRoot(issuance-997e28940f643b41a3d937bcc842b01c3ba863d66dbd365db015d62a79dd32fc): Inserting 200 new leaves (200 of 3563).

Should I increase the DefaultNumTxRetries again?

jharveyb commented 1 month ago

Hmm, that sounds like an issue with how we are passing work to the worker pool then; that single worker should end up receiving all the leaves anyway, not just batches of 200.

Will have to look into this deeper; I don't think NumRetries is the core issue.
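
For readers following along, here is a rough, hypothetical illustration of the batch-splitting-plus-worker-pool pattern being discussed (not the actual tapd code; names and structure are placeholders): the leaves are cut into batches of 200 and handed to a bounded pool, so with a limit of one the single worker simply processes the batches one after another.

package main

import (
	"fmt"

	"golang.org/x/sync/errgroup"
)

// insertBatch stands in for verifying and writing one batch of proof leaves.
func insertBatch(batch []int) error {
	fmt.Printf("inserting batch of %d leaves\n", len(batch))
	return nil
}

func main() {
	// 4452 leaves split into batches of 200, as in the logs above.
	leaves := make([]int, 4452)
	const batchSize = 200

	var g errgroup.Group
	g.SetLimit(1) // a single worker still receives every batch, in order

	for start := 0; start < len(leaves); start += batchSize {
		end := start + batchSize
		if end > len(leaves) {
			end = len(leaves)
		}
		batch := leaves[start:end]
		g.Go(func() error { return insertBatch(batch) })
	}
	if err := g.Wait(); err != nil {
		fmt.Println("sync failed:", err)
	}
}

If the real dispatcher instead waits on something else between batches (network round trips, re-diffing the tree, etc.), that would explain the roughly one batch per hour that freerko observed even with GOMAXPROCS=1.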