microsoft / FASTER

Fast persistent recoverable log and key-value store + cache, in C# and C++.
https://aka.ms/FASTER
MIT License

FASTER.core.MallocFixedPageSize`1.InternalAllocate: System.IndexOutOfRangeException: Index was outside the bounds of the array #914

Closed by SteveSyfuhs 4 months ago

SteveSyfuhs commented 4 months ago

Hey folks,

I have observed a consistently reproducible exception: FASTER.core.MallocFixedPageSize.InternalAllocate: System.IndexOutOfRangeException: Index was outside the bounds of the array

It occurs when using FasterKV as a larger-than-memory dictionary for lookups in an ETL-like process:

  1. Take file of given format and process it
  2. Load that file into FasterKV<long, DATA>
  3. Execute next stage that queries KV

The process is fairly trivial and doesn't rely on checkpointing or any other log mechanisms (in other words, it doesn't appear related to https://github.com/microsoft/FASTER/issues/135). It's unclear whether the error is mine or the library's.

I'm observing the exception in the load stage, where all it does is call session.Upsert(key, value), after around 1.8 billion entries.

I have created a simple repro. The real run uses an object serializer and isn't storing Memory. The real run takes about 3 hours on my machine before it throws; the repro takes about half as long. The repro also takes up a good chunk of disk space (note the temp directory).

The repro works like this: a background thread generates 4 billion entries of arbitrary size and pushes them into a queue. A set of worker threads then dequeues the entries and adds them to the KV store. The configuration settings were chosen somewhat arbitrarily, based on what worked well with the real stage (1) of the process.

I don't believe it's a threading issue. It appears to repro with a single thread, though that takes far longer to run.

The gist uses the latest NuGet package and also requires System.Threading.Channels. I have also confirmed that it repros against the latest source in git. I have not verified whether it repros against the Garnet fork.

https://gist.github.com/SteveSyfuhs/dea565f48ca108e3fddb4bf6216123fa

Happy to provide more information or try out things.

--

Environment info:

Processor: Intel(R) Core(TM) i9-10900KF CPU @ 3.70GHz, 3696 Mhz, 10 Core(s), 20 Logical Processor(s)
RAM:       128 GB
.NET SDK:
 Version:           8.0.201
 Commit:            4c2d78f037
 Workload version:  8.0.200-manifests.bc6351c6

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.26100
 OS Platform: Windows
 RID:         win-x64
 Base Path:   C:\Program Files\dotnet\sdk\8.0.201\

SteveSyfuhs commented 4 months ago

And for completeness here is the repro output:

[4/13/2024 12:14:34 PM] 1,880,000,000 (1,878,372,637) 00:00:53.8344933
System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at FASTER.core.MallocFixedPageSize`1.InternalAllocate(Int32 blockSize)
   at ConsoleApp5.BigDictionary`2.Add(TKey key, TValue value) in C:\Users\SteveSyfuhs\source\repos\ConsoleApp5\ConsoleApp5\Program.cs:line 151
   at ConsoleApp5.Program.<>c__DisplayClass0_0.<<Main>b__1>d.MoveNext() in C:\Users\SteveSyfuhs\source\repos\ConsoleApp5\ConsoleApp5\Program.cs:line 53

TedHartMS commented 4 months ago

This is hitting an internal limit in MallocFixedPageSize. There are a lot of key collisions to the same HashBucket which requires allocating overflow buckets. Try setting FasterKVSettings.IndexSize higher (default is 1 << 26) or calling FasterKV.GrowIndex.
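
For readers landing here with the same exception, the first suggestion above can be sketched roughly as follows. This is a minimal configuration sketch, not the code from the repro: the directory name, the FasterKV<long, long> type arguments, and the 1L << 30 value are assumptions chosen for illustration, and the right index size depends on the number of keys and the available memory.

```csharp
using System.IO;
using FASTER.core;

// Hypothetical log directory (the repro also writes to a temp directory).
var baseDir = Path.Combine(Path.GetTempPath(), "faster-index-size-sketch");

using var settings = new FasterKVSettings<long, long>(baseDir)
{
    // IndexSize is given in bytes; the default mentioned above is 1L << 26.
    // 1L << 30 is an illustrative value only, not a recommendation.
    IndexSize = 1L << 30,
};

// <long, long> is a simplification; the original report uses FasterKV<long, DATA>.
using var store = new FasterKV<long, long>(settings);
using var session = store.NewSession(new SimpleFunctions<long, long>());

// Same call shape as the load stage described in the report.
session.Upsert(1L, 100L);

// Alternative from the comment above: call FasterKV.GrowIndex on the store
// to enlarge the index at run time instead of sizing it up front.
```

A larger index spreads keys over more hash buckets, so fewer overflow buckets have to be allocated from MallocFixedPageSize.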

SteveSyfuhs commented 4 months ago

This did the trick! Closing.