ipfs / go-datastore

key-value datastore interfaces
MIT License

DiskUsage and approximate values #85

Open hsanjuan opened 6 years ago

hsanjuan commented 6 years ago

Per https://github.com/ipfs/go-ds-flatfs/pull/27, @kevina and others have raised concerns that there is no way to tell whether the values returned by DiskUsage() are approximate. It is also not possible to correct such values programmatically. Interesting bits:

I think it would be better to keep track of whether the disk usage is an estimate or an accurate count, and inform the user which one it is.

Then in that case I think a better solution is to return an undefined value for the DiskUsage, possibly informing the user how to rectify it for large repos, rather than return an estimate that can be 5-10% off without informing the user of the inaccuracy.

This information should at least be stored in the datastore somehow. Providing an interface to determine whether the stored value is an approximation or an exact value can be a separate PR.
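One way to make the "estimate vs. exact" distinction visible is an optional extension interface that datastores can implement next to DiskUsage(). The sketch below is illustrative only: Accuracy, AccurateDiskUsager, DiskUsageReport, Describe and the two example stores are hypothetical names, not part of go-datastore, and the DiskUsage() (uint64, error) shape is an assumption that may not match every version of the real interface. Datastores that do not implement the extension report AccuracyUnknown, so existing implementations keep working.

```go
package main

import "fmt"

// Accuracy describes how trustworthy a reported disk-usage figure is.
// Hypothetical type, not part of go-datastore.
type Accuracy int

const (
	AccuracyUnknown  Accuracy = iota // the datastore cannot tell
	AccuracyEstimate                 // the value is an approximation
	AccuracyExact                    // the value was counted exactly
)

func (a Accuracy) String() string {
	switch a {
	case AccuracyEstimate:
		return "estimate"
	case AccuracyExact:
		return "exact"
	default:
		return "unknown"
	}
}

// DiskUsager mirrors an assumed DiskUsage method shape (bytes, error).
type DiskUsager interface {
	DiskUsage() (uint64, error)
}

// AccurateDiskUsager is a hypothetical opt-in extension that lets a
// datastore say whether its DiskUsage figure is exact.
type AccurateDiskUsager interface {
	DiskUsager
	DiskUsageAccuracy() Accuracy
}

// DiskUsageReport returns the usage together with its accuracy, falling
// back to AccuracyUnknown for datastores without the extension.
func DiskUsageReport(d DiskUsager) (uint64, Accuracy, error) {
	n, err := d.DiskUsage()
	if err != nil {
		return 0, AccuracyUnknown, err
	}
	if a, ok := d.(AccurateDiskUsager); ok {
		return n, a.DiskUsageAccuracy(), nil
	}
	return n, AccuracyUnknown, nil
}

// Describe formats a figure for users and flags inexact values.
func Describe(n uint64, a Accuracy) string {
	if a == AccuracyExact {
		return fmt.Sprintf("%d bytes", n)
	}
	return fmt.Sprintf("~%d bytes (%s; run a scrub to recount)", n, a)
}

// plainStore only implements DiskUsage, like a datastore unaware of accuracy.
type plainStore struct{ size uint64 }

func (p plainStore) DiskUsage() (uint64, error) { return p.size, nil }

// estimatingStore additionally admits that its figure is an estimate.
type estimatingStore struct{ plainStore }

func (estimatingStore) DiskUsageAccuracy() Accuracy { return AccuracyEstimate }
```

With this shape, a command such as a repo stat could call DiskUsageReport and print Describe's output, so users see "~N bytes (estimate; ...)" instead of a bare number. The wording of that hint is likewise only an illustration.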

Regarding re-calculating size:

Check from CheckedDatastore is meant to be triggered on ipfs repo fsck. I'm not sure where Scrub should get invoked; probably a separate command such as ipfs repo scrub.

ScrubbedDatastore.Scrub is probably where we want to wire this in (Check / repo fsck should be fast).

This issue is to discuss the best approach to exposing that DiskUsage values are inaccurate.

Note that badger only updates Size() once a minute. flatfs may have estimated the size during its first launch, and that estimate might be off from the real size on big repositories or very slow disks. It also doesn't account for the growth of directory sizes as the datastore fills with data.