spacemeshos / post

Spacemesh POST protocol implementation
MIT License
20 stars 21 forks source link

Don't delete redundant postdata files #239

Closed lrettig closed 8 months ago

lrettig commented 1 year ago

Closes #236

Don't remove "redundant" files. It's too easy to accidentally remove already-initialized postdata files, and I don't see any value in doing so.

Remove the debug output for unrecognized files too (https://github.com/spacemeshos/go-spacemesh/issues/4789).

PengIoT commented 1 year ago

I lost a terabyte of data, and it took a graphics card many days to generate it. I put read-only attributes on the data, and it was still deleted, how much do the developers hate the data?

fasmat commented 1 year ago

With that change we should also update the Initializer::verifyMetadata method to return an error if NumUnits doesn't match the expected value (instead of only if it is larger than the expected value), update tests accordingly and probably add a test case for the changed behavior.

Otherwise larger PoSTs than expected aren't handled correctly, i.e. the node thinks that it has x numUnits but generates proofs for x+n numUnits that will fail to verify after generating.

fasmat commented 1 year ago

TestInitialize_RedundantFiles also needs to be updated: you can just add a t.Skip similar to TestInitialize_NumUnits_Increase with the message that resizing PoST isn't supported at the moment.

lrettig commented 1 year ago

@fasmat could you please explain this part?

the node thinks that it has x numUnits but generates proofs for x+n numUnits that will fail to verify after generating.

fasmat commented 1 year ago

Imagine the following situation:

  1. The node calls proving.Generate on a directory containing 16 space units (1 TB) of data.
  2. After ~ 3 hours the proof is finished generating
  3. The node configured for 10 space units verifies the proof.
  4. Proof verification fails, the node waits 5 minutes and starts generating a new proof (that will again fail to verify).

Before such a setup would have silently deleted PoST data, now it will result in an infinite loop.

Preventing the node from deleting data solves one issue but introduces others. Right now the change in this PR also makes it impossible for a user to willingly reduce their PoST size, a use case that is currently covered from end to end:

  1. user changes their post size (via postcli or node config)
  2. smaller post size will cause the node to publish an ATX in the next epoch with the new correct size.
lrettig commented 1 year ago

Thanks, that helps.

The node calls proving.Generate on a directory containing 16 space units (1 TB) of data.

Why isn't the node passing the numunits into proving.Generate? And/or why isn't numunits being read from the metadata? In other words, the number of files on disk should never be taken as an authoritative source of information.

Proof verification fails

Why would a proof of a greater number of units fail a check for a lesser numunits? Can't we say that a proof of numunits >= n is always valid for numunits=n?

this PR also makes it impossible for a user to willingly reduce their PoST size

In this case the user is responsible for manually deleting the extra files. We should print a warning if we find them, but otherwise everything should work fine.

fasmat commented 1 year ago

And/or why isn't numunits being read from the metadata?

It is, proof generation reads the value from postdata_metadata.json. The node however reads it from the node configuration. Afaik these values are only checked to match in the initializer and the node config atm has precedence (as do parameters passed to postcli).

In other words, the number of files on disk should never be taken as an authoritative source of information.

That's why if they don't match we delete files 🙂, the authoritative source is the node config or the parameters passed to postcli. If a wrong value is passed files will be deleted.

Can't we say that a proof of numunits >= n is always valid for numunits=n?

We could, I can't think of a reason why this shouldn't be allowed - I would need to think on that. However such a proof isn't considered valid at the moment, so additional changes are necessary.

In this case the user is responsible for manually deleting the extra files.

Then this should be made explicit and documented in the README.md and manual. Right now if the user is manually deleting files they will just be regenerated, also after this PR is merged.

fasmat commented 1 year ago

Regarding allowing proofs for bigger PoST sizes: if we allow that we essentially have to finish implementing this epic: https://github.com/spacemeshos/pm/issues/209

If we don't it would be unfair that users only receive rewards for x Space Units if they were successfully able to publish a proof for x+n Space Units.

codecov[bot] commented 1 year ago

Codecov Report

Merging #239 (2517665) into develop (6fc593a) will increase coverage by 0.3%. The diff coverage is 100.0%.

@@            Coverage Diff            @@
##           develop    #239     +/-   ##
=========================================
+ Coverage     68.6%   69.0%   +0.3%     
=========================================
  Files           28      28             
  Lines         1862    1840     -22     
=========================================
- Hits          1279    1270      -9     
+ Misses         435     426      -9     
+ Partials       148     144      -4     
Files Changed Coverage Δ
initialization/initialization.go 77.7% <100.0%> (+1.6%) :arrow_up:
lrettig commented 1 year ago

@fasmat I made the requested changes and updated the tests, please take another look.

If we don't it would be unfair that users only receive rewards for x Space Units if they were successfully able to publish a proof for x+n Space Units.

I think it's fine if a proof of x+n SU gets created and the user gets credit for it, even if NumUnits says something smaller. The main issue that this PR is trying to address is that the system should never automatically delete files regardless of what the metadata says.

fasmat commented 1 year ago

I think it's fine if a proof of x+n SU gets created and the user gets credit for it

The user only gets credit for x SUs then but not x+n.

I made the requested changes and updated the tests, please take another look.

Could you add a test that initialises with x SUs, then initialises again with x-n space units, then generates a proof and then tries to verify the proof with x-n space units? I would assume such a test would currently fail in post-rs

fasmat commented 8 months ago

With https://github.com/spacemeshos/post/pull/270 users now have to confirm if the provide flags to postcli that will cause the tool to delete existing post data. Additionally I'm currently working on removing initialization from the node itself -smapp and CLI users both will have to use postcli in the future.