nickna / Neighborly

An open-source vector database
MIT License

fix: MemoryMappedList for Windows #52

Closed · nickna closed this 2 weeks ago

nickna commented 2 weeks ago

I noticed a few issues when I ran the tests with MemoryMappedLists on Windows.

  1. File lock exceptions were thrown because _deleteHandle (a FileStream) was holding a lock on the file before MemoryMappedFile.CreateFromFile was called. As an alternative, MemoryMappedList.Dispose() now deletes the file directly with File.Delete(_fileName) (see the sketch after this list).

  2. Files are left behind if an exception is thrown during creation. When something goes wrong (e.g., insufficient disk space), the backing file can remain on disk. I added an exception handler in Reset() that deletes the temporary file, and I switched to Path.GetTempFileName() so the files are created in the OS-designated temporary location (also covered in the sketch after this list).

  3. MemoryMappedFile allocation consumes the entire requested capacity on disk at instantiation. This appears to come down to a difference in file allocation between Win32 and Unix. Windows (NTFS) supports sparse files (as Unix does), and I initially assumed a MemoryMappedFile would be backed by one, but it is not, which leads to an immediate disk I/O failure in all tests. The workaround is to pre-allocate the two files as sparse on Windows using Win32 API calls. I still need to test this on both NTFS and ReFS, the newer Windows file system.
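Roughly the shape of the change for items 1 and 2 (a minimal sketch, not the actual MemoryMappedList code; _fileName matches the field named above, everything else is illustrative):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Sketch only: shows the shape of the fix for items 1 and 2,
// not the real MemoryMappedList implementation.
public sealed class MemoryMappedList : IDisposable
{
    private string _fileName = string.Empty;
    private MemoryMappedFile? _file;

    public MemoryMappedList(long capacityInBytes) => Reset(capacityInBytes);

    private void Reset(long capacityInBytes)
    {
        // Item 2: create the backing file in the OS-designated temp location.
        _fileName = Path.GetTempFileName();
        try
        {
            _file = MemoryMappedFile.CreateFromFile(
                _fileName, FileMode.OpenOrCreate, mapName: null, capacityInBytes);
        }
        catch
        {
            // Item 2: if creation fails (e.g., out of disk space),
            // don't leave the temporary file behind.
            if (File.Exists(_fileName))
                File.Delete(_fileName);
            throw;
        }
    }

    public void Dispose()
    {
        _file?.Dispose();
        // Item 1: delete the backing file directly instead of relying on a
        // FileStream (_deleteHandle) that held a lock before CreateFromFile ran.
        if (File.Exists(_fileName))
            File.Delete(_fileName);
    }
}
```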

I started putting changes here: https://github.com/nickna/Neighborly/tree/hotfix-memorymappedfiles CC: @hangy

hangy commented 2 weeks ago

Oh, that's an insane platform difference. 😥 I initially wanted to use a MemoryMappedFile without a filename for persistence, but that wasn't available on Linux (Ubuntu in WSL2 on Windows 11, to be precise). Maybe MemoryMappedFile.CreateNew for a non-persisted memory-mapped file would be a better fit on Windows?
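Something along these lines is what I had in mind (just a sketch; the capacity value is a placeholder):

```csharp
using System.IO.MemoryMappedFiles;

// Non-persisted memory-mapped file: no backing file name on disk.
// The capacity here is only a placeholder.
using var mmf = MemoryMappedFile.CreateNew(mapName: null, capacity: 1L << 30);
using var accessor = mmf.CreateViewAccessor();
accessor.Write(0, 42); // write an int at offset 0
```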

nickna commented 2 weeks ago

Added a function, _WinFileAlloc(). Tested on NTFS under Windows 11 x64: tests pass when the file's capacity is set to int.MaxValue, and I can see the resulting sparse files on disk. [screenshot: sparse files]
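Roughly what the allocation does (a trimmed-down sketch rather than the exact _WinFileAlloc() body; the FSCTL_SET_SPARSE constant and DeviceIoControl P/Invoke are standard Win32, but the helper name and error handling are illustrative):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

internal static class SparseFileHelper
{
    private const uint FSCTL_SET_SPARSE = 0x000900C4;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool DeviceIoControl(
        SafeHandle hDevice, uint dwIoControlCode,
        IntPtr lpInBuffer, uint nInBufferSize,
        IntPtr lpOutBuffer, uint nOutBufferSize,
        out uint lpBytesReturned, IntPtr lpOverlapped);

    // Marks the file as sparse (NTFS/ReFS) and sets its logical length,
    // so the full capacity is not physically allocated up front.
    public static void AllocateSparse(string path, long capacity)
    {
        using var fs = new FileStream(path, FileMode.OpenOrCreate,
            FileAccess.ReadWrite, FileShare.None);
        if (OperatingSystem.IsWindows())
        {
            DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_SPARSE,
                IntPtr.Zero, 0, IntPtr.Zero, 0, out _, IntPtr.Zero);
        }
        fs.SetLength(capacity);
    }
}
```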

I tried CreateNew and unfortunately didn't see a difference. When I set the capacity any larger (e.g., int.MaxValue * 2L), I get an unhelpful System.IO.IOException: "Insufficient system resources exist to complete the requested service." That would make sense if the .NET CLR were running as a 32-bit process. I'll check on this.

nickna commented 2 weeks ago

The sparse file API call on Windows works reliably. When trying to go above ~2 billion records, though, the files get obscenely large, which in turn causes Windows to return a generic System.IO exception.

2,147,483,648 records × 4,096 bytes (max vector size) = 8,796,093,022,208 bytes ≈ 8,192 GiB (8 TiB)

It makes sense to keep this limit until compression is evaluated.
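Concretely, keeping the limit just means refusing capacities above int.MaxValue records for now (an illustrative check, not the final API; the method and parameter names are placeholders):

```csharp
using System;

// Illustrative guard only: cap the record count at int.MaxValue until
// compression is evaluated. maxVectorBytes and the method name are placeholders.
static long BackingFileBytes(long recordCount, long maxVectorBytes = 4_096)
{
    if (recordCount > int.MaxValue)
        throw new ArgumentOutOfRangeException(nameof(recordCount),
            "Capacities above int.MaxValue records are not supported yet.");
    return recordCount * maxVectorBytes; // int.MaxValue records * 4,096 B ~= 8 TiB
}

Console.WriteLine(BackingFileBytes(int.MaxValue)); // 8,796,093,018,112 bytes
```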