project-machine / puzzlefs

A next-generation container filesystem
Apache License 2.0

better streaming implementation of fastcdc #15

Closed tych0 closed 1 year ago

tych0 commented 3 years ago

Right now the fastcdc implementation leaves a little to be desired. What we really want is a callback each time a chunk is created, so we can write the chunk out and purge its buffer. Since we don't have that, we end up holding all of a file's chunks in memory until we hit a file boundary. This means that if, e.g., a file is 5 GB, we'll allocate 5 GB of memory to chunk it when we really don't need to.

However, with the current design the most we'll ever allocate is the size of the largest file, so maybe it's OK for now. For "normal" sized files, the max allocation is the size of the largest possible chunk.
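
To make the callback idea above concrete, here is a minimal, self-contained sketch of a streaming chunker whose buffer never grows past the maximum chunk size. It is not the fastcdc crate's API and not what puzzlefs does: the function name `chunk_stream`, the size constants, and the toy shift-and-add rolling hash (standing in for FastCDC's gear hash) are all assumptions made for illustration.

```rust
use std::io::{self, Read};

// Hypothetical limits for this sketch; not puzzlefs's actual parameters.
const MIN_SIZE: usize = 16 * 1024;
const MAX_SIZE: usize = 64 * 1024;
// A cut point is declared when the low 14 bits of the rolling hash are zero.
const MASK: u32 = (1 << 14) - 1;

/// Reads `source` incrementally and calls `on_chunk` once per chunk.
/// Only the chunk currently being built is kept in memory, so the
/// buffer holds at most MAX_SIZE bytes regardless of the input size.
pub fn chunk_stream<R, F>(mut source: R, mut on_chunk: F) -> io::Result<()>
where
    R: Read,
    F: FnMut(&[u8]) -> io::Result<()>,
{
    let mut chunk: Vec<u8> = Vec::with_capacity(MAX_SIZE);
    let mut hash: u32 = 0;
    let mut buf = [0u8; 8192];

    loop {
        let n = match source.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };

        for &byte in &buf[..n] {
            chunk.push(byte);
            // Toy rolling hash (an assumption, NOT the real FastCDC gear hash):
            // shifting left lets old bytes age out of the 32-bit state.
            hash = (hash << 1).wrapping_add((byte as u32).wrapping_mul(0x9E37_79B1));

            let at_cut_point = chunk.len() >= MIN_SIZE && (hash & MASK) == 0;
            if at_cut_point || chunk.len() >= MAX_SIZE {
                // Hand the chunk to the caller, then reuse the buffer.
                on_chunk(&chunk)?;
                chunk.clear();
                hash = 0;
            }
        }
    }

    // Flush whatever is left at end of input (may be shorter than MIN_SIZE).
    if !chunk.is_empty() {
        on_chunk(&chunk)?;
    }
    Ok(())
}
```

A caller would pass any `Read` (a file, a tar stream) plus a closure that writes each chunk to the chunk store. Because the closure returns `io::Result<()>`, a failed write aborts chunking early instead of being silently dropped.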

ariel-miculas commented 1 year ago

Version 3.0 of fastcdc supports streaming: https://docs.rs/fastcdc/latest/fastcdc/v2020/struct.StreamCDC.html