Xib1uvXi opened this issue 1 month ago
Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue is labeled, as labels give an overview of priorities, assignments, and additional actions requested by the maintainers.
Finally, remember to use https://discuss.ipfs.io if you just need general support.
Will using a pool result in abnormal memory consumption when the file size is much smaller than the chunk size?
The pool hands out buffers rounded up to the power of two nearest the chunk size. When a file is smaller than the chunk size, the chunker makes a separate allocation for the short read, copies the data over, and returns the pool buffer:
small := make([]byte, n) // fresh allocation for the short final read
copy(small, full)
pool.Put(full)           // chunk-sized buffer goes back to the pool, not freed
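The power-of-two rounding described above can be sketched as follows. This is a minimal illustration: `nextPow2` is a hypothetical helper, and go-buffer-pool's exact bucketing may differ in detail.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextPow2 returns the smallest power of two >= n, mirroring how a
// bucketed buffer pool typically rounds requested allocation sizes.
func nextPow2(n int) int {
	if n <= 1 {
		return 1
	}
	return 1 << bits.Len(uint(n-1))
}

func main() {
	// A 256 KiB chunk size is already a power of two, so it maps to a
	// 262144-byte bucket; a 40 KB file still pulls a 65536-byte buffer.
	fmt.Println(nextPow2(256 * 1024)) // 262144
	fmt.Println(nextPow2(40_000))     // 65536
}
```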
Returning the buffer to the pool does not necessarily free it; the pool may retain it for reuse. With a large number of files, this extra allocation happens for every file when reading the last partial chunk. So I do not think it is specifically about small files, but about the number of files.
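A toy model of this retention, using the standard library's sync.Pool in place of the real bucketed pool (names and sizes here are illustrative, not the actual chunker code):

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool stands in for the chunker's bucketed pool: Put does not free
// the buffer, it parks it for reuse, so every small file still pays for
// a fresh `small` allocation while the big buffer lingers in the pool.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 262144) }, // 256 KiB chunk buffer
}

// readSmallFile mimics the allocate-copy-put pattern quoted above.
func readSmallFile(data []byte) []byte {
	full := bufPool.Get().([]byte)
	n := copy(full, data)    // pretend this is the short read of a small file
	small := make([]byte, n) // extra allocation, once per file
	copy(small, full)
	bufPool.Put(full) // retained by the pool, not freed
	return small
}

func main() {
	out := readSmallFile([]byte("tiny"))
	fmt.Println(len(out), cap(out)) // 4 4
}
```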
It may be better to keep the pool-allocated buffer, and avoid the extra allocation and copy:
small := full[:n]
or use the pool to allocate the smaller buffer:
small := pool.Get(n)
copy(small, full)
pool.Put(full)
Having fewer distinct buffer sizes might also help the GC. I will test to see whether this makes any difference.
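For the first alternative, `small := full[:n]`, a quick sketch of the trade-off (illustrative sizes only; `shrink` is a hypothetical helper): reslicing avoids the allocation and copy, but the small slice shares the pool buffer's backing array, so the full chunk-sized allocation stays reachable until the caller releases it, and it cannot go back to the pool in the meantime.

```go
package main

import "fmt"

// shrink shows the zero-copy option: it reslices the pool buffer
// instead of allocating a fresh small one.
func shrink(full []byte, n int) []byte {
	return full[:n]
}

func main() {
	full := make([]byte, 262144) // stands in for a pool.Get(chunkSize) buffer
	small := shrink(full, 40_000)

	// small shares full's backing array: length shrinks to the data
	// actually read, but capacity still pins the whole 256 KiB.
	fmt.Println(len(small), cap(small)) // 40000 262144
}
```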
accidental close - reopened
I have reimplemented a Splitter and modified some kubo-related code. Under scenarios with a large number of files, memory usage has stabilized and no abnormal growth has been observed.
chunk
func NewAutoSplitter(r io.Reader, size int64, fileSize int64) chunk.Splitter {
	ss := &AutoSplitter{
		r:        r,
		size:     size,
		usePool:  true,
		fileSize: fileSize,
	}
	if fileSize >= size {
		ss.mode = normal
	} else { // was `if fileSize < fileSize`, which is always false
		ss.mode = small
		ss.buffer = make([]byte, fileSize)
	}
	return ss
}
kubo, core/coreapi/unixfs.go
// Constructs a node from reader's data, and adds it. Doesn't pin.
func (adder *Adder) add(reader io.Reader, fileSize int64) (ipld.Node, error) {
	chnk := chunker.NewAutoSplitterV2(reader, chunkSize, fileSize)
	params := ihelper.DagBuilderParams{
		Dagserv:    adder.bufferedDS,
		RawLeaves:  adder.RawLeaves,
		Maxlinks:   ihelper.DefaultLinksPerBlock,
		NoCopy:     adder.NoCopy,
		CidBuilder: adder.CidBuilder,
	}

	db, err := params.New(chnk)
	if err != nil {
		return nil, err
	}

	var nd ipld.Node
	if adder.Trickle {
		nd, err = trickle.Layout(db)
	} else {
		nd, err = balanced.Layout(db)
	}
	if err != nil {
		return nil, err
	}

	return nd, adder.bufferedDS.Commit()
}
@Xib1uvXi I would be very interested to see what this looks like for you after running a while: https://github.com/ipfs/boxo/pull/649
Otherwise, avoiding the pool entirely may end up being more efficient. I think that may be the case for many files, whether large or small, since the extra allocation is caused by the partially filled chunk at the end of each file.
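The "partially filled final chunk" condition is simply a file size that is not a multiple of the chunk size, which is why the extra allocation hits nearly every file regardless of its size (illustrative helper, not real chunker code):

```go
package main

import "fmt"

// partialTail reports whether a file leaves a partially filled final
// chunk, which is the case that triggers the extra allocate-and-copy.
func partialTail(fileSize, chunkSize int64) bool {
	return fileSize%chunkSize != 0
}

func main() {
	const chunk = 256 * 1024
	// A small file and a large file with a ragged tail both hit the
	// slow path; only exact multiples of the chunk size avoid it.
	fmt.Println(partialTail(40_000, chunk))     // true
	fmt.Println(partialTail(10*chunk, chunk))   // false
	fmt.Println(partialTail(10*chunk+1, chunk)) // true
}
```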
That PR includes a benchmark comparing a chunker that uses the go-buffer-pool pool against one that does not. The benchmark simulates your use case of 10000 files with sizes varying between 20K and 60K bytes.
When I add a folder with many small files to kubo, memory usage always spikes abnormally.
I located the anomaly by analyzing the heap profile.