nickna / Neighborly

An open-source vector database
MIT License

Implement defragmentation of `MemoryMappedList` #51

Closed: hangy closed this issue 1 week ago

hangy commented 2 weeks ago

Description:

For performance reasons, `MemoryMappedList.Remove(vector)` does not physically delete a vector; it only marks the vector as removed with a tombstone Guid. The aim is to implement a method that removes all tombstoned entries from the index, together with their associated data, and reorganizes the index file and the data file so that no empty blocks remain between live records.
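A minimal sketch of the tombstone-then-compact idea, under simplifying assumptions: an in-memory `bytearray` stands in for the memory-mapped data file, a Python list stands in for the index file, and index entries are assumed to be stored in ascending offset order. `ToyList`, `Entry` and `TOMBSTONE` (an all-zero UUID) are hypothetical names for illustration, not Neighborly's API; the sentinel Guid Neighborly actually uses may differ.

```python
import uuid
from dataclasses import dataclass

# Hypothetical tombstone sentinel; Neighborly's actual value may differ.
TOMBSTONE = uuid.UUID(int=0)


@dataclass
class Entry:
    id: uuid.UUID   # vector id, or TOMBSTONE once removed
    offset: int     # start of the record in the data area
    length: int     # record size in bytes


class ToyList:
    """In-memory stand-in for the index file and data file (hypothetical)."""

    def __init__(self) -> None:
        self.index: list[Entry] = []
        self.data = bytearray()  # plays the role of the data file
        self.end = 0             # logical end of the used data

    def add(self, vid: uuid.UUID, payload: bytes) -> None:
        # Append after the last record; grow the "file" only when it is full.
        missing = self.end + len(payload) - len(self.data)
        if missing > 0:
            self.data.extend(bytes(missing))
        self.data[self.end:self.end + len(payload)] = payload
        self.index.append(Entry(vid, self.end, len(payload)))
        self.end += len(payload)

    def remove(self, vid: uuid.UUID) -> None:
        # Fast path: tombstone the index entry and leave the bytes in place.
        for e in self.index:
            if e.id == vid:
                e.id = TOMBSTONE
                return

    def defrag(self) -> None:
        # Drop tombstoned entries and slide each live record down to the next
        # free offset, so live records are contiguous from offset 0.
        write = 0
        live: list[Entry] = []
        for e in self.index:  # assumes entries are in ascending offset order
            if e.id == TOMBSTONE:
                continue
            if e.offset != write:
                src = self.data[e.offset:e.offset + e.length]
                self.data[write:write + e.length] = src
                e.offset = write
            write += e.length
            live.append(e)
        self.index = live
        self.end = write  # bytes past `end` are now free for reuse


# Usage: remove the middle record, defragment, then append a new one.
store = ToyList()
a, b, c, d = (uuid.uuid4() for _ in range(4))
store.add(a, b"AAAA")
store.add(b, b"BBBB")
store.add(c, b"CCCC")
store.remove(b)
store.defrag()
store.add(d, b"DDDD")  # lands in the freed tail; the data area stays 12 bytes
print(bytes(store.data))  # b'AAAACCCCDDDD'
```

Because the new record reuses the space freed by compaction instead of being appended past it, repeated remove/add cycles do not keep growing the data area in this toy model.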

Tasks:

Resources:

Impact:

The memory-mapped file will no longer grow indefinitely as vectors are removed or updated.

How to Contribute:

  1. Fork the repository and create a new branch for your changes.
  2. Implement the defragmentation method.
  3. Ensure all new and existing tests pass.
  4. Submit a pull request with a detailed description of the changes.

nickna commented 1 week ago

Merged the branch into master (feat: Defrag() and DefragBatch(), along with tests).

Defrag() does a whole-file defragmentation. DefragBatch() performs defragmentation in batches of 100 vectors.
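One way to picture the difference between a whole-file pass and a batched pass is a resumable compaction that keeps its read and write cursors between calls. The sketch below is hypothetical (`BatchDefragger`, `step()` and `TOMBSTONE` are not Neighborly's API); it assumes index entries are sorted by ascending offset and that no other writes happen between batches, because between calls the index temporarily contains stale slots behind the write cursor. The batch size of 100 mirrors the comment above.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical tombstone sentinel; Neighborly's actual value may differ.
TOMBSTONE = uuid.UUID(int=0)


@dataclass
class Entry:
    id: uuid.UUID
    offset: int
    length: int


@dataclass
class BatchDefragger:
    index: list[Entry]            # assumed sorted by ascending offset
    data: bytearray
    batch_size: int = 100         # entries examined per step() call
    _read: int = field(default=0, init=False)          # next entry to examine
    _write_slot: int = field(default=0, init=False)    # next index slot to fill
    _write_offset: int = field(default=0, init=False)  # next free data byte

    def step(self) -> bool:
        """Compact up to batch_size index entries; return True when done."""
        stop = min(self._read + self.batch_size, len(self.index))
        while self._read < stop:
            e = self.index[self._read]
            self._read += 1
            if e.id == TOMBSTONE:
                continue  # skip removed vectors
            if e.offset != self._write_offset:
                # Slide the record down; the source lies at or beyond the
                # write position, so it has not been overwritten yet.
                src = self.data[e.offset:e.offset + e.length]
                self.data[self._write_offset:self._write_offset + e.length] = src
                e.offset = self._write_offset
            self.index[self._write_slot] = e
            self._write_slot += 1
            self._write_offset += e.length
        if self._read == len(self.index):
            del self.index[self._write_slot:]  # drop leftover stale slots
            return True
        return False


# Usage: 250 vectors of 4 bytes each; every other one is tombstoned.
ids = [uuid.uuid4() for _ in range(250)]
index, data = [], bytearray()
for i, vid in enumerate(ids):
    index.append(Entry(vid, len(data), 4))
    data += bytes([i]) * 4
for e in index[::2]:
    e.id = TOMBSTONE

defragger = BatchDefragger(index, data)
steps = 1
while not defragger.step():
    steps += 1
print(steps, len(defragger.index))  # 3 125
```

Each `step()` call does a bounded amount of work, which is the usual reason to batch: the caller can interleave other operations between batches, at the cost of having to guard the intermediate state.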

Empty space is not reclaimed: the file is not truncated after compaction, because I didn't want it to shrink when deletions occur. I may need to revisit this after testing on different OSes.
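The "compact but don't shrink" behavior can be illustrated with Python's standard `mmap` module. The helper below, `compact_in_place`, is hypothetical and assumes a flat layout described by (offset, length, is_live) tuples in ascending offset order; it moves live records forward with `mmap.move` and leaves the file size unchanged. If shrinking were wanted later, the file could be truncated to the returned logical end after unmapping (for example with `os.truncate`), which is exactly the step skipped here.

```python
import mmap
import os
import tempfile


def compact_in_place(path: str, records: list[tuple[int, int, bool]]) -> int:
    """Slide live records to the front of a memory-mapped file.

    `records` holds hypothetical (offset, length, is_live) tuples in ascending
    offset order. Returns the new logical end of the data. The file size is
    left unchanged, so the tail becomes unused space instead of being freed.
    """
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        write = 0
        for offset, length, is_live in records:
            if not is_live:
                continue  # skip tombstoned records
            if offset != write:
                mm.move(write, offset, length)  # memmove inside the mapping
            write += length
        mm.flush()  # push the changes to the file on disk
    return write


# Usage: three 4-byte records; the middle one has been removed.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"AAAABBBBCCCC")

end = compact_in_place(path, [(0, 4, True), (4, 4, False), (8, 4, True)])
with open(path, "rb") as f:
    content = f.read()
size_after = os.path.getsize(path)
os.remove(path)

print(end, content[:end], size_after)  # 8 b'AAAACCCC' 12
```

The bytes after the logical end still hold stale data (here a second copy of `CCCC`), but they are available for the next append, so the file stops growing without ever being shrunk.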