pmem / pmdk

Persistent Memory Development Kit
https://pmem.io

FEAT: Run tracking improvements #4183

Closed pbalcer closed 1 year ago

pbalcer commented 6 years ago

Run tracking improvements

Rationale

The current worst-case cost of run recycling is O(n), because we have to lazily walk over all runs in the heap looking for free space. This works fine in practice, where the recalculation of runs is amortized over many allocations, but run recycling is currently the primary scalability bottleneck (the recalculation happens under a lock), and improving it might yield significant performance gains on systems with a large number of threads.

Description

The idea is to move all runtime tracking information into a large transient sparse array, and to keep an accurate count of the free space in each run by doing the calculation at dealloc time instead of at alloc time. The recycler still needs to be able to give out chunks in a lowest-first, best-fit fashion, so the tree must remain, but it will be updated immediately after the calculation instead of on demand.
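To make the proposal concrete, here is a minimal sketch of dealloc-time tracking. It is not PMDK code: `struct run_info`, `struct recycler`, `run_on_dealloc`, `recycler_get` and `MAX_RUNS` are hypothetical names, a dense fixed-size array stands in for the sparse array, and a sorted array stands in for the tree. Locking and persistence are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RUNS 1024 /* hypothetical: one slot per run id */

/* Transient (volatile) per-run tracking data, indexed by run id. */
struct run_info {
	uint32_t free_units; /* exact count, updated on every dealloc */
	int in_recycler;     /* nonzero if the run has an index entry */
};

/* Zero-initialize before use. The sorted array stands in for the tree. */
struct recycler {
	struct run_info runs[MAX_RUNS];
	uint32_t index[MAX_RUNS]; /* run ids ordered by (free_units, id) */
	size_t nindex;
};

/* Ordering used by the index: fewest free units first, then lowest id. */
static int
key_less(const struct recycler *r, uint32_t a, uint32_t b)
{
	if (r->runs[a].free_units != r->runs[b].free_units)
		return r->runs[a].free_units < r->runs[b].free_units;
	return a < b;
}

static void
index_remove(struct recycler *r, uint32_t id)
{
	for (size_t i = 0; i < r->nindex; i++) {
		if (r->index[i] != id)
			continue;
		memmove(&r->index[i], &r->index[i + 1],
			(r->nindex - i - 1) * sizeof(r->index[0]));
		r->nindex--;
		break;
	}
	r->runs[id].in_recycler = 0;
}

static void
index_insert(struct recycler *r, uint32_t id)
{
	size_t pos = 0;
	while (pos < r->nindex && key_less(r, r->index[pos], id))
		pos++;
	memmove(&r->index[pos + 1], &r->index[pos],
		(r->nindex - pos) * sizeof(r->index[0]));
	r->index[pos] = id;
	r->nindex++;
	r->runs[id].in_recycler = 1;
}

/* Dealloc path: update the exact count and reposition the run right away. */
static void
run_on_dealloc(struct recycler *r, uint32_t id, uint32_t units)
{
	assert(id < MAX_RUNS);
	if (r->runs[id].in_recycler)
		index_remove(r, id);
	r->runs[id].free_units += units;
	index_insert(r, id);
}

/* Alloc path: best fit, lowest id on ties; returns -1 if nothing fits. */
static int64_t
recycler_get(struct recycler *r, uint32_t units)
{
	size_t lo = 0, hi = r->nindex;
	while (lo < hi) { /* lower bound on free_units >= units */
		size_t mid = lo + (hi - lo) / 2;
		if (r->runs[r->index[mid]].free_units < units)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == r->nindex)
		return -1;
	uint32_t id = r->index[lo];
	index_remove(r, id); /* the caller now owns the run */
	r->runs[id].free_units -= units;
	return id;
}
```

In this sketch the alloc path never walks runs to recount free space; it only does an ordered lookup. The sorted array keeps the example short, but its inserts and removals are linear, so a balanced tree keyed the same way would be needed to make updates logarithmic as well.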

API Changes

No changes to any APIs

Implementation details

pbalcer commented 5 years ago

This was done halfway in 1.7, and I still have some work on branches that I want to push out. Hopefully in the 1.8 time frame.

seghcder commented 3 years ago

Just checking in on this topic: would this allow some level of runtime free-space indication through an application-level API? Or is there a plan to make pmempool usable at runtime via an API, for example? I'm aware there are issues with reporting overall free space due to fragmentation, but at present we don't easily know (?) whether the pool is 10% or 90% allocated after starting the app without, I assume, manually iterating through a low-level API.

This seems partly related to https://github.com/pmem/pmdk/issues/4186 re more info for space/fragmentation and https://github.com/pmem/pmdk/issues/4116 re pmempool; however, pmempool still requires the pool to be offline to get stats**. Alternatively, could pmempool run in a read-only mode while the pool is online and allow export of data in JSON format, for example?

** I seem to be able to run pmempool on an open pool file on Windows but not on Linux (resource temporarily unavailable)

pbalcer commented 3 years ago

No, this is an internal allocator optimization that's trying to decouple metadata and data of persistent allocations.

The fact that you can run pmempool info with an open pool on Windows is an inadequacy of our file locking code for that platform :)

janekmi commented 1 year ago

If you consider this question still important to you, please reopen the issue and provide more context for your request so we can reassess its priority.