nspcc-dev / neo-go

Go Node and SDK for the Neo blockchain
MIT License
123 stars 79 forks source link

`upload-bin` hangs on index files construction if at least one SEARCH result is incomplete #3647

Closed AnnaShaleva closed 2 weeks ago

AnnaShaleva commented 2 weeks ago

Current Behavior

Here are logs of upload-bin command for neogo-block-uploader-fschain-testnet.service instance:

2024-10-28 06:32:48.810 Successfully uploaded batch of blocks: from 4440000 to 4449999  
2024-10-28 06:41:36.505 Processing batch from 4460000 to 4461912    
2024-10-28 06:41:36.505 Successfully uploaded batch of blocks: from 4450000 to 4459999  
2024-10-28 06:43:28.713 Updating index files... 
2024-10-28 06:43:28.713 Successfully uploaded batch of blocks: from 4460000 to 4461912

Index file update started at 06:43:28.713 and already 3 hours passed since its start. No index files are uploaded yet, and it's a bug in the program.

The problem is in the waiting code:

https://github.com/nspcc-dev/neo-go/blob/b8a65d3c37cfa429b8a5e135422bb1fa78762056/cli/util/uploader.go#L416-L428

This code waits for exactly indexFileSize number of OIDs to be returned from SEARCH routines. However, due to #3645 a incomplete batch may be returned, and we don't have any check for this in the code: https://github.com/nspcc-dev/neo-go/blob/b8a65d3c37cfa429b8a5e135422bb1fa78762056/cli/util/uploader.go#L396-L400 https://github.com/nspcc-dev/neo-go/blob/b8a65d3c37cfa429b8a5e135422bb1fa78762056/cli/util/uploader.go#L410-L412

Expected Behavior

Index files must be uploaded without deadlocks and endless cycles.

Possible Solution

Add retry to SEARCH if the number of returned objects is less than expected indexFileSize. Retry until successful SEARCH attempt.

AnnaShaleva commented 2 weeks ago

Here are some more recent logs, we need this issue to be fixed:

2024-10-29 04:04:40.526 Successfully uploaded batch of blocks: from 6590000 to 6597865  
2024-10-29 04:33:47.787 failed to update index files after upload: empty oid found in index file 0 at position 16556 (block index 16556)    
2024-10-29 04:34:51.317 Chain block height: 6598013 
2024-10-29 04:38:49.004 Processing batch from 6590000 to 6598013    
2024-10-29 04:38:49.004 First block of latest incomplete batch uploaded to NeoFS container: 6590000 
2024-10-29 04:42:29.264 Updating index files... 
2024-10-29 04:42:29.264 Successfully uploaded batch of blocks: from 6590000 to 6598013  
2024-10-29 05:09:51.357 failed to update index files after upload: empty oid found in index file 0 at position 1976 (block index 1976)  
2024-10-29 05:10:52.966 Chain block height: 6598150 
2024-10-29 05:13:47.332 Processing batch from 6590000 to 6598150    
2024-10-29 05:13:47.332 First block of latest incomplete batch uploaded to NeoFS container: 6590000 
2024-10-29 05:17:27.529 Updating index files... 
2024-10-29 05:17:27.529 Successfully uploaded batch of blocks: from 6590000 to 6598150  
2024-10-29 05:45:49.080 failed to update index files after upload: empty oid found in index file 0 at position 4257 (block index 4257)  
2024-10-29 05:46:51.451 Chain block height: 6598287
2024-10-29 05:51:47.574 Processing batch from 6590000 to 6598287    
2024-10-29 05:51:47.574 First block of latest incomplete batch uploaded to NeoFS container: 6590000 
2024-10-29 05:56:13.871 Updating index files... 
2024-10-29 05:56:13.871 Successfully uploaded batch of blocks: from 6590000 to 6598287  
2024-10-29 06:28:38.586 failed to update index files after upload: empty oid found in index file 0 at position 91 (block index 91)
AnnaShaleva commented 2 weeks ago

BTW, these gaps in the resulting index file (empty oid found in index file 0) may still be connected not only with incomplete SEARCH result, but also with some duplicating blocks, similar to https://github.com/nspcc-dev/neo-go/issues/3630#issuecomment-2426476212. @AliceInHunterland, please also check if the uploaded sequence of blocks contains duplicating entries for all chains, at least within the bounds of the first index file.

AnnaShaleva commented 2 weeks ago

BTW, all these logs about "gaps" belong to a single neogo-block-uploader-fschain-mainnet.service instance, and the numbers 91, 1976, 16556, 4257 are repeated, here's a full log tail: Explore-logs-2024-10-29 08_26_04.txt

AnnaShaleva commented 2 weeks ago

To summarize, the reason is that there are duplicating block objects uploaded to the container (@AliceInHunterland you may attach the example). NeoFS session token and block signatures set affect the resulting object ID, hence within the scope of this issue we need to prepare index file constructor to the fact that there might be duplicating entries in the container.

AliceInHunterland commented 2 weeks ago

./bin/neo-go util upload-bin --cid VCVr6RKg1hfJR7v36xvjbaXTncj1waPi94u9EFyEttW --wallet-config ./wallet-config.yml --block-attribute Block --index-attribute Index --rpc-endpoint https://rpc1.morph.fs.neo.org:40341 --fs-rpc-endpoint st1.storage.fs.neo.org:8080 --timeout 3m --skip-blocks-uploading --index-file-size 10000

Warning: Missing block index 1889
Warning: Missing block index 1976
Warning: Multiple oids: [4vMdkXbQEgnSSiEi7xsywwoqpbdS6mgqPFSSouyqXAEj 5uiRgBiVmcPrRPjjvJ7tqj3q6pbEb5tP4yynMMbmHHXf] for block index 2171
Warning: Multiple oids: [HvAc5MmjmQk94uJq6yCC9uxKLd87D8iGHJqQb3dKhw99 32RL6F6zf7r7ueShgPFi4SHDAWjF1U7p4PdFKsfdR6Qk] for block index 3714
Warning: Multiple oids: [5YGcGQ3XQKKxDYDTc7vjxkpSAxZaFFreeGSEpcqG4aWF 5qqheyM2Uoisjfuof4P1JpkzAR7vh7ih4uQaw97cYUDN] for block index 4001
Warning: Missing block index 4257