nickna / Neighborly

An open-source vector database
MIT License

chore: Optimize Vector to binary perf and vice-versa #36

Closed hangy closed 3 weeks ago

hangy commented 3 weeks ago

šŸ“ Description

Refactors the `Vector` class to use `Span<byte>` in the `ToBinary` method and in the `byte[]`-based ctor.
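To illustrate the general idea (this is not the code from this PR), here is a minimal sketch of span-based float serialization using only the base class library. The class name `FloatSpanCodec` and the layout (an `int32` count followed by `float32` values, little-endian) are assumptions made for the example; the actual `Vector` binary format in Neighborly may differ.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not Neighborly's Vector implementation.
public static class FloatSpanCodec
{
    // Assumed layout: [int32 count][count x float32], all little-endian.
    public static byte[] ToBinary(ReadOnlySpan<float> values)
    {
        // Allocate the result once and write into it through a span,
        // instead of building and concatenating intermediate arrays.
        var result = new byte[sizeof(int) + values.Length * sizeof(float)];
        Span<byte> span = result;

        BinaryPrimitives.WriteInt32LittleEndian(span, values.Length);
        Span<byte> payload = span.Slice(sizeof(int));
        for (int i = 0; i < values.Length; i++)
        {
            BinaryPrimitives.WriteSingleLittleEndian(payload.Slice(i * sizeof(float)), values[i]);
        }

        return result;
    }

    public static float[] FromBinary(ReadOnlySpan<byte> data)
    {
        int count = BinaryPrimitives.ReadInt32LittleEndian(data);
        ReadOnlySpan<byte> payload = data.Slice(sizeof(int));

        var values = new float[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = BinaryPrimitives.ReadSingleLittleEndian(payload.Slice(i * sizeof(float)));
        }

        return values;
    }
}
```

In this sketch the only heap allocations are the output arrays themselves, which is the kind of saving a span-based rewrite is usually after.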

🔗 Related Issues

No existing issue. Addresses a TODO comment that used to be here: https://github.com/nickna/Neighborly/blob/1ddf9e10a04f8318a012c7003811c16ea5238400/Neighborly/Vector.cs#L263

💡 Additional Notes

Benchmarks

I ran some benchmarks to check that everything works. Both benchmarks use a common class to generate vectors.

RandomVectorGenerator

```csharp
using Neighborly;

namespace Benchmarks;

internal sealed class RandomVectorGenerator
{
    private static readonly Random _random = Random.Shared;

    // Creates a vector with `size` random floats and a text of the same length.
    public static Vector CreateRandomVector(int size)
    {
        return new(GetRandomFloats(size).ToArray(), new string('X', size));
    }

    private static IEnumerable<float> GetRandomFloats(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return _random.NextSingle();
        }
    }
}
```

VectorBinaryCtor

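The `byte[]`-based ctor didn't work before this change, but the `BinaryReader` one did, so I benchmarked the new ctor against the `BinaryReader` one, which isn't a completely fair comparison. The `FromArray*Old` rows show `NA`, consistent with the old ctor not working. In the results below, the `Span<byte>`-based ctor is roughly 1.4x to 1.8x faster than the reader on the mean and allocates about 8% to 14% less.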
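For context on what is being compared, here is a rough sketch of the two read paths over the same buffer, using only the base class library. The layout (an `int32` count followed by `float32` values) and the names `ReadPaths`, `ReadWithReader`, and `ReadWithSpan` are assumptions for illustration; they are not taken from `Vector.cs`.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical comparison of a stream-based and a span-based read path.
public static class ReadPaths
{
    // Reader path: every value goes through the BinaryReader and its stream.
    // BinaryReader reads multi-byte values as little-endian.
    public static float[] ReadWithReader(BinaryReader reader)
    {
        int count = reader.ReadInt32();
        var values = new float[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = reader.ReadSingle();
        }

        return values;
    }

    // Span path: values are decoded directly from the byte buffer.
    public static float[] ReadWithSpan(ReadOnlySpan<byte> data)
    {
        int count = BinaryPrimitives.ReadInt32LittleEndian(data);
        var values = new float[count];
        for (int i = 0; i < count; i++)
        {
            int offset = sizeof(int) + i * sizeof(float);
            values[i] = BinaryPrimitives.ReadSingleLittleEndian(data.Slice(offset));
        }

        return values;
    }
}
```

Both methods should return the same values for the same bytes. A reused reader has to be rewound (`BaseStream.Position = 0`) before each read, which is what the `[IterationSetup]` method in the benchmark below does.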

VectorBinaryCtor.cs

```csharp
using BenchmarkDotNet.Attributes;
using Neighborly;

namespace Benchmarks;

[MemoryDiagnoser]
public class VectorBinaryCtor
{
    private static readonly byte[] _smallVector = RandomVectorGenerator.CreateRandomVector(10).ToBinary();
    private static readonly byte[] _mediumVector = RandomVectorGenerator.CreateRandomVector(768).ToBinary();
    private static readonly byte[] _largeVector = RandomVectorGenerator.CreateRandomVector(1536).ToBinary();

    private static readonly BinaryReader _smallReader = new(new MemoryStream(_smallVector));
    private static readonly BinaryReader _mediumReader = new(new MemoryStream(_mediumVector));
    private static readonly BinaryReader _largeReader = new(new MemoryStream(_largeVector));

    // Rewind the shared readers before each iteration.
    [IterationSetup]
    public void Setup()
        => _smallReader.BaseStream.Position = _mediumReader.BaseStream.Position = _largeReader.BaseStream.Position = 0;

    [Benchmark]
    public Vector FromArraySmallNew() => new(_smallVector);

    [Benchmark]
    public Vector FromArraySmallOld() => new(_smallVector, old: true);

    [Benchmark]
    public Vector FromArraySmallReader() => new(_smallReader);

    [Benchmark]
    public Vector FromArrayMediumNew() => new(_mediumVector);

    [Benchmark]
    public Vector FromArrayMediumOld() => new(_mediumVector, old: true);

    [Benchmark]
    public Vector FromArrayMediumReader() => new(_mediumReader);

    [Benchmark]
    public Vector FromArrayLargeNew() => new(_largeVector);

    [Benchmark]
    public Vector FromArrayLargeOld() => new(_largeVector, old: true);

    [Benchmark]
    public Vector FromArrayLargeReader() => new(_largeReader);
}
```

| Method | Mean | Error | StdDev | Median | Allocated |
|---|---:|---:|---:|---:|---:|
| FromArraySmallNew | 2.404 us | 0.1623 us | 0.4759 us | 2.310 us | 928 B |
| FromArraySmallOld | NA | NA | NA | NA | NA |
| FromArraySmallReader | 3.319 us | 0.1739 us | 0.5072 us | 3.220 us | 1008 B |
| FromArrayMediumNew | 15.600 us | 1.2486 us | 3.6620 us | 16.945 us | 5472 B |
| FromArrayMediumOld | NA | NA | NA | NA | NA |
| FromArrayMediumReader | 28.389 us | 1.3307 us | 3.9237 us | 29.576 us | 6304 B |
| FromArrayLargeNew | 20.910 us | 1.2437 us | 3.6280 us | 21.181 us | 10080 B |
| FromArrayLargeOld | NA | NA | NA | NA | NA |
| FromArrayLargeReader | 37.588 us | 1.4519 us | 4.1657 us | 38.341 us | 11680 B |

VectorToBinary

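I hope I didn't miss any obvious issue, as it's suspiciously fast: in the results below, the new `ToBinary` is roughly 6.5x to 7x faster than the old one and allocates about 8x to 10x less. 😅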

VectorToBinary.cs

```csharp
using BenchmarkDotNet.Attributes;
using Neighborly;

namespace Benchmarks;

[MemoryDiagnoser]
public class VectorToBinary
{
    private static readonly Vector _smallVector = RandomVectorGenerator.CreateRandomVector(10);
    private static readonly Vector _mediumVector = RandomVectorGenerator.CreateRandomVector(768);
    private static readonly Vector _largeVector = RandomVectorGenerator.CreateRandomVector(1536);

    [Benchmark]
    public byte[] ToBinarySmallNew() => _smallVector.ToBinary();

    [Benchmark]
    public byte[] ToBinarySmallOld() => _smallVector.ToBinaryOld();

    [Benchmark]
    public byte[] ToBinaryMediumNew() => _mediumVector.ToBinary();

    [Benchmark]
    public byte[] ToBinaryMediumOld() => _mediumVector.ToBinaryOld();

    [Benchmark]
    public byte[] ToBinaryLargeNew() => _largeVector.ToBinary();

    [Benchmark]
    public byte[] ToBinaryLargeOld() => _largeVector.ToBinaryOld();
}
```

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| ToBinarySmallNew | 57.20 ns | 0.352 ns | 0.330 ns | 0.0020 | - | 104 B |
| ToBinarySmallOld | 396.78 ns | 1.689 ns | 1.580 ns | 0.0210 | - | 1064 B |
| ToBinaryMediumNew | 1,900.81 ns | 13.447 ns | 12.578 ns | 0.0763 | - | 3896 B |
| ToBinaryMediumOld | 12,504.94 ns | 49.681 ns | 46.472 ns | 0.6409 | - | 32896 B |
| ToBinaryLargeNew | 3,813.04 ns | 31.220 ns | 29.203 ns | 0.1526 | - | 7736 B |
| ToBinaryLargeOld | 24,809.69 ns | 43.600 ns | 38.651 ns | 1.2817 | 0.0305 | 65152 B |