polkascan / polkascan-os

Polkascan Open Source
https://polkascan.io
GNU General Public License v3.0

Harvester raises an error while processing blocks on Polkadot CC1 #21

Open · togran opened this issue 4 years ago

togran commented 4 years ago

Occasionally the harvester_worker raises an error and then refuses to process any further blocks:

harvester-worker_1   | Traceback (most recent call last):
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/types.py", line 1043, in process
harvester-worker_1   |     return self.value_list[self.index]
harvester-worker_1   | IndexError: list index out of range
harvester-worker_1   | 
harvester-worker_1   | During handling of the above exception, another exception occurred:
harvester-worker_1   | 
harvester-worker_1   | Traceback (most recent call last):
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 375, in trace_task
harvester-worker_1   |     R = retval = fun(*args, **kwargs)
harvester-worker_1   |   File "/usr/src/app/app/tasks.py", line 70, in __call__
harvester-worker_1   |     return super().__call__(*args, **kwargs)
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 632, in __protected_call__
harvester-worker_1   |     return self.run(*args, **kwargs)
harvester-worker_1   |   File "/usr/src/app/app/tasks.py", line 162, in start_sequencer
harvester-worker_1   |     result = harvester.start_sequencer()
harvester-worker_1   |   File "/usr/src/app/app/processors/converters.py", line 843, in start_sequencer
harvester-worker_1   |     integrity_status = self.integrity_checks()
harvester-worker_1   |   File "/usr/src/app/app/processors/converters.py", line 815, in integrity_checks
harvester-worker_1   |     self.add_block(substrate.get_block_hash(block.id))
harvester-worker_1   |   File "/usr/src/app/app/processors/converters.py", line 507, in add_block
harvester-worker_1   |     events_decoder = self.substrate.get_block_events(block_hash, self.metadata_store[parent_spec_version])
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/substrateinterface/__init__.py", line 543, in get_block_events
harvester-worker_1   |     events_decoder.decode()
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/base.py", line 278, in decode
harvester-worker_1   |     self.value = self.process()
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/block.py", line 313, in process
harvester-worker_1   |     element = self.process_type('EventRecord', metadata=self.metadata)
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/base.py", line 359, in process_type
harvester-worker_1   |     obj.decode(check_remaining=False)
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/base.py", line 278, in decode
harvester-worker_1   |     self.value = self.process()
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/block.py", line 354, in process
harvester-worker_1   |     arg_type_obj = self.process_type(arg_type)
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/base.py", line 359, in process_type
harvester-worker_1   |     obj.decode(check_remaining=False)
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/base.py", line 278, in decode
harvester-worker_1   |     self.value = self.process()
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/types.py", line 566, in process
harvester-worker_1   |     result[key] = self.process_type(data_type, metadata=self.metadata).value
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/base.py", line 359, in process_type
harvester-worker_1   |     obj.decode(check_remaining=False)
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/base.py", line 278, in decode
harvester-worker_1   |     self.value = self.process()
harvester-worker_1   |   File "/usr/local/lib/python3.6/site-packages/scalecodec/types.py", line 1045, in process
harvester-worker_1   |     raise ValueError("Index '{}' not present in Enum value list".format(self.index))
harvester-worker_1   | ValueError: Index '202' not present in Enum value list

It might be related to Block #125428, but that seems unlikely: the failing block is different every time the harvester is restarted.

Also, not sure if it's related, but the aggregate tables (data_block_total, data_account, etc.) never get filled, and this is reflected in the GUI.
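
For context on what the traceback is complaining about: scalecodec decodes each event's module/event as an index into the list of variants known from the metadata/type registry it was handed, and an index past the end of that list raises exactly this error. Below is a minimal standalone illustration; the helper name and the sample variant list are made up for the example, this is not the actual scalecodec code.

```python
# Minimal illustration of the check that fails in scalecodec/types.py: an enum
# is decoded as an index into the list of variants known to the loaded
# metadata/type registry, and an index outside that list raises ValueError.
# decode_enum_index and the sample variant list are made up for this example.

def decode_enum_index(index, value_list):
    if index >= len(value_list):
        # Mirrors "Index '...' not present in Enum value list": the runtime that
        # produced the block knows more (or different) variants than the
        # metadata the harvester is decoding with.
        raise ValueError("Index '{}' not present in Enum value list".format(index))
    return value_list[index]


if __name__ == "__main__":
    known_variants = ["System", "Balances", "Staking"]   # stale registry: 3 variants
    print(decode_enum_index(1, known_variants))          # -> 'Balances'
    print(decode_enum_index(202, known_variants))        # raises ValueError
```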

arjanz commented 4 years ago

We've just added the Polkadot CC1 configuration files that reflect the settings on Polkascan.io, which I'm running at the moment as a test run. You could pull the changes from the repo and try them out.

It could also be that you need to force a type-registry update; the easiest way to do this is to rebuild the Docker images with: docker-compose -p polkadot-cc1 -f docker-compose.polkadot-cc1-full.yml build --no-cache
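
If the issues persist after the rebuild, one way to check whether a stale type registry or metadata cache is the likely cause is to compare the runtime spec_version at, and just before, a failing block. A rough sketch with py-substrate-interface; the node URL below is an assumption, point it at whatever endpoint your harvester is configured against.

```python
# Compare the runtime spec_version at (and just before) a block that fails to
# decode, to see whether the harvester's cached metadata is simply out of date.
# Assumes py-substrate-interface is installed; adjust the node URL as needed.

from substrateinterface import SubstrateInterface

substrate = SubstrateInterface(url="ws://127.0.0.1:9944")

block_number = 125428  # use whichever block fails for you

for n in (block_number - 1, block_number):
    block_hash = substrate.rpc_request("chain_getBlockHash", [n])["result"]
    runtime = substrate.rpc_request("state_getRuntimeVersion", [block_hash])["result"]
    print(n, block_hash, "specVersion:", runtime["specVersion"])
```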

togran commented 4 years ago

Unfortunately, pulling the fresh version and rebuilding the containers doesn't solve the issue.

vshvsh commented 4 years ago

Here's what happens (tried it a few times today with the same result):

  1. Full image rebuild with --no-cache and volume cleanup, starting from scratch using your docker-compose files with no modifications.
  2. Start mysqld as per the instructions and wait a couple of minutes.
  3. Start everything.
  4. Polkadot blocks start getting harvested, with holes in the chain.
  5. app.tasks.start_sequencer jobs fail (though Flower shows them as successful): first with 'chain not at genesis', which I don't understand but can accept, and then, understandably, because there are gaps in the blocks and the integrity check fails (Block #3457 is missing.. stopping check); the block number differs from run to run (see the gap-check sketch after this list).
  6. Eventually all blocks are collected and the sequencer starts working.
  7. After a minute or so there's another gap in the new blocks: always exactly two blocks, but different ones each time, usually quite ordinary empty ones. In my last run those were blocks 146002 and 146003.
  8. In the harvester logs there's an error like the one in the top post of this issue:
    File "/usr/local/lib/python3.6/site-packages/scalecodec/types.py", line 1045, in process
    raise ValueError("Index '{}' not present in Enum value list".format(self.index))
    ValueError: Index '202' not present in Enum value list
  9. app.tasks.start_sequencer fails again with Block #146002 is missing.. stopping check
  10. Further blocks continue to be collected
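
A small standalone sketch of the gap check referred to in steps 5 and 7, assuming the harvested block numbers are available as a plain list (for example, pulled from the data_block table); find_gaps is a made-up helper, not harvester code.

```python
# Standalone sketch of the gap check described above: given the block numbers
# the harvester has stored (e.g. read from the data_block table), report any
# missing ranges. find_gaps is a made-up helper, not harvester code.

def find_gaps(block_numbers):
    present = sorted(set(block_numbers))
    gaps = []
    for prev, nxt in zip(present, present[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps


if __name__ == "__main__":
    stored = list(range(145990, 146002)) + list(range(146004, 146010))
    print(find_gaps(stored))  # -> [(146002, 146003)], the two-block hole above
```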

Hope that helps to identify the problem, because I'm at a loss for now.

vshvsh commented 4 years ago

My best guess is that the remove_block at https://github.com/polkascan/polkascan-pre-harvester/blob/40003e2d3d2c5ca45b3e015e097851a53590da63/app/processors/converters.py#L811 happens, but the add_block at https://github.com/polkascan/polkascan-pre-harvester/blob/40003e2d3d2c5ca45b3e015e097851a53590da63/app/processors/converters.py#L815 doesn't, for some reason.
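
If that guess is right, the kind of guard that would help is to treat remove_block and add_block as a single transaction, so a decode failure in add_block rolls the removal back instead of leaving a hole. A rough sketch only; the names (sequencer, db_session, etc.) are illustrative stand-ins, not the actual converters.py code.

```python
# Rough sketch of the suggested guard: don't let a failed add_block leave a
# permanent hole after remove_block. The names (sequencer, db_session,
# remove_block, add_block) are illustrative stand-ins, not the actual
# converters.py implementation.

def replace_block(sequencer, db_session, block_id, new_block_hash):
    try:
        sequencer.remove_block(block_id)      # drop the fork's version of the block
        sequencer.add_block(new_block_hash)   # decode and store the canonical one
        db_session.commit()
    except ValueError:
        # If decoding the replacement fails (e.g. the Enum error above), roll the
        # whole transaction back so the removal is not committed on its own.
        db_session.rollback()
        raise
```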

vshvsh commented 4 years ago

Blocks that can't be added by the integrity checks are processed fine when calling the harvester API's process endpoint later.

arjanz commented 4 years ago

I suspect there are indeed some issues with processing the reorgs. As a workaround, I modified the docker-compose configs in master to set FINALIZATION_ONLY=1, so only finalized blocks will be processed.

This is actually the setting we also use at polkascan.io, so I'm pretty confident this should work, but let me know if the issues remain.
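
For anyone wondering what the flag boils down to: with FINALIZATION_ONLY=1 the harvester follows the finalized head rather than the best head, and finalized blocks can no longer be re-orged away. A simplified sketch of the idea using plain Substrate RPC calls via py-substrate-interface; this is not the actual harvester code, and the node URL is an assumption to adjust.

```python
# Simplified sketch of what FINALIZATION_ONLY amounts to: follow the finalized
# head rather than the (re-org-prone) best head. Not the actual harvester code;
# adjust the node URL to your own endpoint.

import os
from substrateinterface import SubstrateInterface

substrate = SubstrateInterface(url="ws://127.0.0.1:9944")

if os.environ.get("FINALIZATION_ONLY") == "1":
    # Finalized blocks cannot be re-orged away, so no holes appear afterwards.
    head_hash = substrate.rpc_request("chain_getFinalizedHead", [])["result"]
else:
    # Best head: a few blocks ahead, but may be replaced by a re-org later.
    head_hash = substrate.rpc_request("chain_getBlockHash", [])["result"]

header = substrate.rpc_request("chain_getHeader", [head_hash])["result"]
print("harvest up to block", int(header["number"], 16), head_hash)
```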

vshvsh commented 4 years ago

I'm pretty sure this will fix the problem; reorgs were obviously the culprit. Thanks!

vshvsh commented 4 years ago

With FINALIZATION_ONLY=1 it works fine, but I think the issue should stay open, because without that option Polkascan still fails to sync.

arjanz commented 4 years ago

Is this still the case with the latest version? Something you could also try is to monitor http://127.0.0.1:8080/polkadot-cc1/harvester/admin to see whether blocks are being processed, and to restart by pressing "Process blocks in harvester queue" if the process is interrupted.