MaxMaximus opened 4 weeks ago
Update: Strangely, I tried to reproduce the problem following my own sequence from the 1st post, and on the 2nd (as well as the 3rd) restart of the node, GC successfully continued working from the saved position!
I haven't changed anything in the config or setup between these restarts. Only three differences between the first restart and the subsequent ones come to mind so far:
1. At the time of the first restart the node had been working for a long time without restarts (more than a week; if necessary, I can later find the exact uptime by scanning the log), while the 2nd and 3rd restarts came after only a few dozen minutes of work.
2. Judging by the number of sub-directories already created in `\trash\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa\2024-07-06\`, GC progress was almost finished at the first restart (845 of 1024 prefixes done), while at the 2nd and 3rd restarts it had managed to process only a few prefixes (a counting sketch is shown after this list).
3. Also, because the progress was reset, on the 2nd and 3rd passes GC did NOT move any files from `\blobs\` to `\trash\`: all files matching the Bloom filter had already been moved during the first pass.
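For reference, counting the already-created prefix sub-directories can be done with a small helper like the one below (just a sketch, not part of storagenode; the path is an example and must be adjusted to your node's storage location):

```go
// countprefixes.go - counts how many 2-character prefix sub-directories a GC
// run has already created in a trash date folder (out of 1024 possible
// prefixes). Sketch only; the path below is an example and must be adjusted.
package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	dateDir := `D:\Storj_Data\Storage Node\trash\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa\2024-07-06`

	entries, err := os.ReadDir(dateDir)
	if err != nil {
		log.Fatal(err)
	}

	prefixes := 0
	for _, e := range entries {
		if e.IsDir() {
			prefixes++
		}
	}
	fmt.Printf("%d of 1024 prefixes created in %s\n", prefixes, dateDir)
}
```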
P.S. This is not the first time I have noticed GC losing progress when restarting a node, just the first time I decided to document it. And it is the first time I have seen the progress saving actually work.
UPDATE2: I wanted to catch the moment near the end of a GC run and restart the node there, to check whether that was the trigger. But I missed the right moment: the re-run of GC completed significantly faster than I expected (below it will be clear why it finished so "quickly", in ~1.5 hours). Still, I managed to catch something very interesting: in addition to losing progress on the first restart, the lazy GC also reported incorrect information to the parent process about the amount of garbage collected:
2024-07-07T09:44:17+03:00 INFO lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Process": "storagenode", "piecesCount": 7408653, "trashPiecesCount": 0, "piecesTrashed": 0, "piecesSkippedCount": 0}
2024-07-07T09:44:17+03:00 INFO lazyfilewalker.gc-filewalker subprocess finished successfully {"satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-07-07T09:44:17+03:00 INFO retain Moved pieces to trash during retain {"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 0, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 7408653, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "1h32m50.0489607s", "Retain Status": "enabled"}
So it reports ZERO deleted/trashed pieces, while the actual count is 1 684 268 pieces / 270 465 252 064 bytes moved from `\blobs\` to `\trash\` (I counted this by the size of the `\trash\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa\2024-07-06\` folder). Wow, that's pretty extreme: 1 684 268 out of the 7 408 653 pieces reported in the log is more than 20% of all files stored on the node for this satellite, deleted in a single GC pass.
And it also gives a hint to the source of the wrong stats: the `\2024-07-06\` folder contains only 845 sub-folders/prefixes, so GC actually collected ZERO pieces in the second pass; all of these ~1.7M pieces were collected in the first pass, which it simply "forgot" about. I think this is one of the important reasons for the continuing large discrepancies in disk space usage statistics that a lot of SNOs keep complaining about. Such or similar loss of stats during garbage collection, or during the subsequent trash deletion, could easily be the reason.
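For completeness, this is roughly how I obtained the piece and byte counts above: a small sketch that walks the trash date folder and sums the piece files it finds (again, not part of storagenode; the path is an example and must be adjusted):

```go
// counttrash.go - sums the number of piece files and their total size under a
// trash date folder, to compare against the "piecesTrashed" value in the log.
// Sketch only; the path below is an example and must be adjusted.
package main

import (
	"fmt"
	"io/fs"
	"log"
	"path/filepath"
)

func main() {
	dateDir := `D:\Storj_Data\Storage Node\trash\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa\2024-07-06`

	var pieces, totalBytes int64
	err := filepath.WalkDir(dateDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		pieces++
		totalBytes += info.Size()
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("pieces: %d, bytes: %d\n", pieces, totalBytes)
}
```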
So we have a whole package of problems here at once, caused by incorrect operation (in exactly which cases is not yet clear) of the save-state-resume function of the storj filewalkers.
This issue has been mentioned on Storj Community Forum (official). There might be relevant details there:
https://forum.storj.io/t/two-weeks-working-for-free-in-the-waste-storage-business/26854/61
Hello.
A very useful save-state-resume feature for the GC filewalker was marked as done some time ago: https://github.com/storj/storj/issues/6708 https://review.dev.storj.io/plugins/gitiles/storj/storj/+/0f90f061b028a9c877dbed3c01d8c3d95e4bc518
And it is listed as merged and in production since storagenode v1.102: https://github.com/storj/storj/commit/0f90f06
But tests on my nodes (v1.104.5 and v1.105.4, running as a Windows service) show that this feature is not working properly. GC (running in "lazy mode" as a separate process) DOES save its current progress to the db (`garbage_collection_filewalker_progress.db`), but it does not use this information after a node restart and begins the GC process from scratch instead.

Steps to reproduce the issue:
1. I took a node currently in the process of GC for satelliteID 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S.
2. I checked which prefix it was processing at the moment (by monitoring the disk I/O of the GC process). It was `\blobs\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa\2m\`
3. I also opened `garbage_collection_filewalker_progress.db` (I use the sqlite3 tool to view it). It looks like the progress is saved. But here is an important note: I noticed that lazy GC writes a prefix to this database immediately when it STARTS working on it, whereas the name of the field in the db table, `last_checked_prefix` (as well as the description of the changes on github), implies that the last prefix already processed should be stored there. Either the name and description of the field are incorrect, or this could be another bug in the code, recording the current (just started) prefix instead of the previous (last completed) one. (A query sketch is shown after this list.)
4. I stopped the node correctly: `sc stop storagenode`, and waited until all storj processes finished and exited.
5. I started the node again: `sc start storagenode`
6. I checked which prefix GC is processing now. It was `\blobs\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa\aa\` That is, it started its work from the beginning; the prefix stored in the database (`\2m`), which was being processed during the previous run, was ignored.
7. After some time, the progress in `garbage_collection_filewalker_progress.db` was also reset.
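For step 3, this is roughly how the saved progress can be inspected. I use the sqlite3 CLI myself; the Go sketch below (using the github.com/mattn/go-sqlite3 driver) shows the same kind of query. The table name `progress` is my assumption about the schema, so verify it with `.schema` in sqlite3 if your version differs, and run it only against a copy of the db or while the node is stopped:

```go
// readprogress.go - dumps the rows of the GC filewalker progress database.
// Assumes the table is named "progress" (verify with ".schema" in the sqlite3
// CLI if your storagenode version differs). The db path below is an example.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // sqlite driver
)

func main() {
	dbPath := `D:\Storj_Data\Storage Node\garbage_collection_filewalker_progress.db`

	db, err := sql.Open("sqlite3", dbPath)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// satellite_id is stored as a BLOB, so print it hex-encoded.
	rows, err := db.Query(`SELECT hex(satellite_id), last_checked_prefix FROM progress`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var satelliteID, prefix string
		if err := rows.Scan(&satelliteID, &prefix); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("satellite %s: last_checked_prefix=%q\n", satelliteID, prefix)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```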
Logs: storagenode.log around restart
I also recorded a full trace of the disk I/O of the GC processes (two PIDs, before and after the restart) using Process Monitor. Attaching it as a zipped .csv since it is large (a log of ~5k IOPs):
Garbage_Collector_IO.zip