holepunchto / hyperdrive

Hyperdrive is a secure, real-time distributed file system
Apache License 2.0

Fast lookup of folder sizes #273

Closed martinheidegger closed 1 year ago

martinheidegger commented 4 years ago

Even though looking up the proper size of a directory seems broken at the moment (#272), a fast lookup should be possible. One way I could see this working is to store the cumulative size of every parent folder inside the stats of each new add/delete file entry:

e.g. you add foo/bar/baz with a size of 5 to an empty drive:

a = { parentSize: [5, 5, 5] } // cumulative sizes of /, foo and foo/bar

Then you add foo/boz with a size of 7. It reads the latest entry under foo (entry a above) and adds 7 to the size of every ancestor the two paths share (/ and foo):

b = { parentSize: [12, 12] }

If we now look up the latest entry under foo/bar, it returns entry a with size 5, and under foo it returns entry b with size 12. Next, when the file foo/bar/buz with size 9 is created, it looks at the latest entry under each parent folder (foo and foo/bar), takes the size at the matching depth, and stores:

c = { parentSize: b.parentSize.map(size => size + 9).concat([a.parentSize[2] + 9]) }

or

c = { parentSize: [21, 21, 14] }
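A minimal sketch of this addition step, assuming a hypothetical `latestEntries` map from folder path (`''` for the root) to its most recently written entry; how such a lookup would actually be backed by the drive is out of scope here:

```javascript
// Hypothetical sketch, not hyperdrive API: compute the parentSize array for a
// newly added file. latestEntries maps each folder path ('' = root) to its
// most recently written entry; parentSize[i] is the cumulative size of the
// ancestor folder at depth i.
function parentSizeForAdd (path, size, latestEntries) {
  const folders = path.split('/').slice(0, -1) // ancestor folder names
  const result = []
  let prefix = '' // start at the root
  for (let depth = 0; depth <= folders.length; depth++) {
    const latest = latestEntries.get(prefix)
    // No previous entry under this folder means its size was 0 so far
    const base = latest ? latest.parentSize[depth] : 0
    result.push(base + size)
    if (depth < folders.length) {
      prefix = prefix ? prefix + '/' + folders[depth] : folders[depth]
    }
  }
  return result
}
```

Replaying the example: adding foo/bar/baz (size 5) to an empty map yields [5, 5, 5], and adding foo/bar/buz (size 9) once foo/boz exists yields [21, 21, 14].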

And with the deletion of the first file foo/bar/baz, a new "latest" entry is written, containing the new parent sizes:

d = { parentSize: c.parentSize.map(size => size - 5) }

or

d = { parentSize: [16, 16, 9] }
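The deletion step above can be sketched the same way. Since every ancestor loses exactly the deleted size, the new entry just subtracts it from the current totals (again a hypothetical helper, not hyperdrive API):

```javascript
// Hypothetical sketch: build the parentSize array written alongside a delete.
// currentParentSize comes from the newest entry covering the deleted file's
// ancestors; every ancestor folder shrinks by the deleted file's size.
function parentSizeForDelete (currentParentSize, size) {
  return currentParentSize.map(total => total - size)
}
```

Deleting the 5-byte foo/bar/baz from the state [21, 21, 14] gives [16, 16, 9], matching the entry above.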

An optimization step could be added: first look at the latest entry under the deepest path (foo/bar) and compare it to the latest entry under the root /; as soon as those happen to be the same entry, stop iterating and use it as the base for every remaining ancestor.
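One way the short-circuit could be realized, assuming entries can be compared for identity (e.g. by feed sequence number; plain object identity stands in for that here):

```javascript
// Hypothetical sketch of the short-circuit: entries holds the latest entry
// under each ancestor, ordered deepest-first and ending with the root. Once an
// entry is identical to the root's latest, every shallower ancestor must share
// it too, so the walk can stop there and use that entry's sizes as the base.
function stopIndex (entries) {
  const rootLatest = entries[entries.length - 1]
  for (let i = 0; i < entries.length; i++) {
    if (entries[i] === rootLatest) return i
  }
}
```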

With parentSize added to every entry, we could list the size of any folder, no matter how many subdirectories it contains, just by looking at its latest entry.
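Under these assumptions the lookup itself becomes a single read, again using a hypothetical `latestEntries` map from folder path to its most recently written entry:

```javascript
// Hypothetical sketch: read a folder's total size straight off the latest
// entry beneath it. The folder's depth ('' = root = 0) indexes into parentSize.
function folderSize (folder, latestEntries) {
  const latest = latestEntries.get(folder)
  if (!latest) return 0 // no entries yet: empty folder
  const depth = folder === '' ? 0 : folder.split('/').length
  return latest.parentSize[depth]
}
```

After the three writes in the example, folderSize('foo', …) returns 21 and folderSize('foo/bar', …) returns 14, with no directory traversal at all.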