lbryio / hub

MIT License
16 stars 15 forks source link

scribe "error while processing txs: int too large to convert" at mainnet block 1176266 #124

Open m0n5t3r opened 1 year ago

m0n5t3r commented 1 year ago

Using the latest published docker image for lbry/hub:master (lbry/hub@sha256:7ad1a2f570f061c9a8e2db65a9fe18f978c12df913f66ff93b3a49d87e674f6e) the mainnet sync stops with this backtrace:

scribe_1               | 2023-03-26 07:12:43,994 INFO hub.service.BlockchainProcessorService:81: software version: 1.0.0
scribe_1               | 2023-03-26 07:12:43,994 INFO hub.service.BlockchainProcessorService:83: reorg limit is 200 blocks
scribe_1               | 2023-03-26 07:12:45,936 INFO hub.db.db:1036: opened db for writing: lbry-rocksdb (/database/lbry-rocksdb)
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.db.db:1046: DB version: 12
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.db.db:1047: coin: LBRY
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.db.db:1048: network: mainnet
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.db.db:1049: height: 1,176,266
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.db.db:1050: tip: e5dbd99e5b626eac6e1f64cfc98c3030a9634b1a08c53e250ca33599d57ca015
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.db.db:1051: tx count: 80,612,560
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.db.db:1053: last indexed address statuses at block 1176266
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.db.db:1054: using address status index: True
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.db.db:1064: flush count: 1,176,267
scribe_1               | 2023-03-26 07:12:45,937 INFO hub.service.BlockchainProcessorService:86: initializing caches
scribe_1               | 2023-03-26 07:12:48,397 INFO hub.db.db:1168: populating header merkle cache...
scribe_1               | 2023-03-26 07:12:50,401 INFO hub.db.db:1173: header merkle cache populated in 2.0s
scribe_1               | 2023-03-26 07:12:50,401 INFO hub.service.BlockchainProcessorService:89: opened db at block 1176266
scribe_1               | 2023-03-26 07:12:50,420 INFO hub.scribe.prefetcher:82: catching up to daemon height 1,331,765 (155,499 blocks behind)
scribe_1               | 2023-03-26 07:12:53,447 ERROR hub.service.BlockchainProcessorService:255: advance blocks failed
scribe_1               | Traceback (most recent call last):
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 242, in check_and_advance_blocks
scribe_1               |     txi_count, txo_count, c_added, c_spent, s_added, s_spent, abandoned, abandoned_chan = await self.run_in_thread_with_lock(
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 157, in run_in_thread_with_lock
scribe_1               |     return await asyncio.shield(run_in_thread_locked())
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 156, in run_in_thread_locked
scribe_1               |     return await asyncio.get_event_loop().run_in_executor(self._executor, func, *args)
scribe_1               |   File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
scribe_1               |     result = self.fn(*self.args, **self.kwargs)
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 1726, in advance_block
scribe_1               |     self._get_cumulative_update_ops(height)
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 1588, in _get_cumulative_update_ops
scribe_1               |     self.db.prefix_db.future_effective_amount.stash_multi_put([
scribe_1               |   File "/home/lbry/hub/db/interface.py", line 128, in stash_multi_put
scribe_1               |     self._op_stack.stash_ops([RevertablePut(self.pack_key(*k), self.pack_value(*v)) for k, v in items])
scribe_1               |   File "/home/lbry/hub/db/interface.py", line 128, in <listcomp>
scribe_1               |     self._op_stack.stash_ops([RevertablePut(self.pack_key(*k), self.pack_value(*v)) for k, v in items])
scribe_1               |   File "/home/lbry/hub/db/prefixes.py", line 1844, in pack_value
scribe_1               |     return super().pack_value(future_effective_amount)
scribe_1               |   File "/home/lbry/hub/db/interface.py", line 160, in pack_value
scribe_1               |     return cls.value_struct.pack(*args)
scribe_1               | struct.error: int too large to convert
scribe_1               | 2023-03-26 07:12:53,449 ERROR hub.service.BlockchainProcessorService:2106: error while processing txs: int too large to convert
scribe_1               | Traceback (most recent call last):
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 2102, in process_blocks_and_mempool_forever
scribe_1               |     await self.check_and_advance_blocks(blocks)
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 242, in check_and_advance_blocks
scribe_1               |     txi_count, txo_count, c_added, c_spent, s_added, s_spent, abandoned, abandoned_chan = await self.run_in_thread_with_lock(
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 157, in run_in_thread_with_lock
scribe_1               |     return await asyncio.shield(run_in_thread_locked())
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 156, in run_in_thread_locked
scribe_1               |     return await asyncio.get_event_loop().run_in_executor(self._executor, func, *args)
scribe_1               |   File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
scribe_1               |     result = self.fn(*self.args, **self.kwargs)
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 1726, in advance_block
scribe_1               |     self._get_cumulative_update_ops(height)
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 1588, in _get_cumulative_update_ops
scribe_1               |     self.db.prefix_db.future_effective_amount.stash_multi_put([
scribe_1               |   File "/home/lbry/hub/db/interface.py", line 128, in stash_multi_put
scribe_1               |     self._op_stack.stash_ops([RevertablePut(self.pack_key(*k), self.pack_value(*v)) for k, v in items])
scribe_1               |   File "/home/lbry/hub/db/interface.py", line 128, in <listcomp>
scribe_1               |     self._op_stack.stash_ops([RevertablePut(self.pack_key(*k), self.pack_value(*v)) for k, v in items])
scribe_1               |   File "/home/lbry/hub/db/prefixes.py", line 1844, in pack_value
scribe_1               |     return super().pack_value(future_effective_amount)
scribe_1               |   File "/home/lbry/hub/db/interface.py", line 160, in pack_value
scribe_1               |     return cls.value_struct.pack(*args)
scribe_1               | struct.error: int too large to convert
scribe_1               | 2023-03-26 07:12:53,450 ERROR hub.service.BlockchainProcessorService:2114: fatal error in block processor loop: int too large to convert
scribe_1               | Traceback (most recent call last):
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 2107, in process_blocks_and_mempool_forever
scribe_1               |     raise err
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 2102, in process_blocks_and_mempool_forever
scribe_1               |     await self.check_and_advance_blocks(blocks)
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 242, in check_and_advance_blocks
scribe_1               |     txi_count, txo_count, c_added, c_spent, s_added, s_spent, abandoned, abandoned_chan = await self.run_in_thread_with_lock(
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 157, in run_in_thread_with_lock
scribe_1               |     return await asyncio.shield(run_in_thread_locked())
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 156, in run_in_thread_locked
scribe_1               |     return await asyncio.get_event_loop().run_in_executor(self._executor, func, *args)
scribe_1               |   File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
scribe_1               |     result = self.fn(*self.args, **self.kwargs)
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 1726, in advance_block
scribe_1               |     self._get_cumulative_update_ops(height)
scribe_1               |   File "/home/lbry/hub/scribe/service.py", line 1588, in _get_cumulative_update_ops
scribe_1               |     self.db.prefix_db.future_effective_amount.stash_multi_put([
scribe_1               |   File "/home/lbry/hub/db/interface.py", line 128, in stash_multi_put
scribe_1               |     self._op_stack.stash_ops([RevertablePut(self.pack_key(*k), self.pack_value(*v)) for k, v in items])
scribe_1               |   File "/home/lbry/hub/db/interface.py", line 128, in <listcomp>
scribe_1               |     self._op_stack.stash_ops([RevertablePut(self.pack_key(*k), self.pack_value(*v)) for k, v in items])
scribe_1               |   File "/home/lbry/hub/db/prefixes.py", line 1844, in pack_value
scribe_1               |     return super().pack_value(future_effective_amount)
scribe_1               |   File "/home/lbry/hub/db/interface.py", line 160, in pack_value
scribe_1               |     return cls.value_struct.pack(*args)
scribe_1               | struct.error: int too large to convert

Please let me know what other info I can provide (I'm reasonably experienced with Python and docker, I could set up a local build to dig into it if needed).

m0n5t3r commented 1 year ago

is this project abandoned?

moodyjon commented 1 year ago

Any progress is on a volunteer unpaid basis at this point so much slower.

@jackrobison

I can't locate 7ad1a2f570f... in the history of the hub repo. Also, are you starting from a snapshot obtained from somewhere, or starting from zero blocks? It might be a flaw in the snapshot OR the code.

m0n5t3r commented 1 year ago
       "Id": "sha256:c4ebce1d42cc140cbe0ae17270fccda62b1758a8593b520cceed641f33fea217",
        "RepoTags": [
            "lbry/hub:master"
        ],
        "RepoDigests": [
            "lbry/hub@sha256:7ad1a2f570f061c9a8e2db65a9fe18f978c12df913f66ff93b3a49d87e674f6e"
        ],

I suppose it's the other one then? c4ebce1d42cc (I'm not well versed in repository internals, I presumed the repo digest is the one I should provide)

Is it worth setting up a local build and adding some prints / logging in there to see what exactly is too large?

moodyjon commented 1 year ago

Yes, you could gather logging about the claim hash involved and the other values. But be aware that debugging these things requires tracking the history of updates on the claim, perhaps going back 1000s of blocks. The first step is to identify the claim. Then add logging focusing on the history of that claim. The LBRY block explorer can also help you understand the history of the claim and locate the range of block numbers to focus on.

Another route would be to try to reproduce the problem starting from block 0 without any snapshot. This would be an attempt to eliminate the possibility that the snapshot is incompatible with the current code. This will take 2-3 weeks of runtime.

m0n5t3r commented 1 year ago

I had it print all things it's trying to put into future_effective_amount, and I have a strong suspicion that this is the culprit (the rest are all positive numbers):

scribe_1               | 2023-04-08 17:41:09,610 INFO hub.service.BlockchainProcessorService:1594:      12c3847a9155b524cd9cc199eb406e79e90b761a: -2463572

(the rest are all positive numbers); I don't know enough about the system to figure out if https://explorer.lbry.com/claims/12c3847a9155b524cd9cc199eb406e79e90b761a has anything wrong about it (but the rest of the publisher's stuff looks odd nevermind, I was confused by the explorer's presentation, they have readable titles and all)

m0n5t3r commented 1 year ago

as an update, that negative number is the result of a pre-existing amount and a negative delta that's larger than the amount; I took a snapshot (hopefully - did a cp -al on both es and rocksdb because I don't have space to waste on that machine) and forced the value to be positive in https://github.com/lbryio/hub/blob/master/hub/scribe/service.py#L1589 with max(<expression>,0)

this hub is for personal use and experimentation, I'll restart it without a snapshot after I manage to make it work like this and get lbry-desktop to talk to it

I'll report back if it does the same when started from scratch

moodyjon commented 1 year ago

Claim created height 1174719 with .01 amount attached: https://explorer.lbry.com/tx/a99e454a0ff41a62e89ece450d0aed4fd78d368a066423c1e4ba04e0039986d8#output-0

The original .01 amount spent at height 1174802: https://explorer.lbry.com/tx/bbabe06c83647746781b72a31de181dbba4d1c246f74245bf93bb71c5de6b5a0#input-127637079

I'm not sure what's special about height 1176266. But there could be support amounts being added/removed pertaining to the claim 12c3847a9155b524cd9cc199eb406e79e90b761a which lie in the range 1174719..1176266. It's just not obvious in the block explorer.

but the rest of the publisher's stuff looks odd

The channel @SkyDomeAtlantis has a lot of videos named according to a scheme "trim.UUID". So the LBRY URL on these will be something like: https://odysee.com/@SkyDomeAtlantis:4/trim.AB556F9F-D56F-41F7-9DE0-8C05839AC3CF:8. The videos have normal looking titles/descriptions but the claim name involving a UUID is a little odd. One can choose a meaningful claim name to appear in the URL (like the title), but in this case it's machine-generated.

The channel seems to frequently be shifting amounts between claims, and possibly revising/re-uploading their videos. Consider the name "trim.AB556F9F-D56F-41F7-9DE0-8C05839AC3CF:8". It looks like it starts with a UUID, yet the suffix ":8" mean that the base name "trim.AB556F9F-D56F-41F7-9DE0-8C05839AC3CF" has been seen 7 times before. So maybe this one was the 8th revision based off the same file with name "trim.AB556F9F-D56F-41F7-9DE0-8C05839AC3CF".

It's similar to scenarios where underflow problems were seen before -- frequent updates to claim amounts with multiple claims under the same base name. There's a feature of the LBRY blockchain called Activation Delay that complicates calculation of effective amount and future effective amount. The activation delay comes into play precisely in these competing claim scenarios. It's just that the channel owner seems to be competing with themselves for control over the name.

@jackrobison

m0n5t3r commented 1 year ago

as an update: after it finished syncing with my crude fix, it kept following the chain for a while, then scribe crashed on another block; I started scribe from scratch (and with the original code instead of my modifications), and it seems it's up to date now, so that snapshot was broken somehow

moodyjon commented 1 year ago

I started scribe from scratch (and with the original code instead of my modifications), and it seems it's up to date now, so that snapshot was broken somehow

Good to know... Do you know which snapshot you were starting from? The latest ones appear to be in subdirs named block_NNNNNNN. They were uploaded in Nov 2022, Jan 2023, and Feb 2023.

https://snapshots.lbry.com/hub/

Based on the fact that the crash was observed at block 1176266, you must have been using an earlier one.