kreeben / resin

Vector space index based search engine that's available as a HTTP service or as an embedded library.

Benchmarks #48

Open SebastianStehle opened 7 years ago

SebastianStehle commented 7 years ago

Hi,

I am following Ayende's review and I am motivated to run some tests with Ants Profiler.

What do I have to do to run some benchmarks?

Btw: I found this free profiler: http://www.getcodetrack.com/

kreeben commented 7 years ago

Awesome!

In the Resin README there is a link to a JSON document I keep on Google Drive. Download it and follow the CLI instructions. The CLI is a bat file (rn.bat) in the root of the repo. It links to a dotnetcore runnable.

There isn't a test suite you can run. Only commands you can execute. You may construct a test that does reading and writing in parallel I suppose.
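A parallel read/write test of the kind suggested above could look roughly like the sketch below. It is written in Python purely for illustration, and `InMemoryIndex` is a hypothetical stand-in for the engine under test, not Resin's API; to measure the real thing, its `write` and `read` methods would have to call into Resin (for example through its HTTP service or CLI).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InMemoryIndex:
    """Hypothetical stand-in for the engine under test (not Resin's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._postings = {}  # term -> set of document ids

    def write(self, doc_id, text):
        # Index every whitespace-separated term of the document.
        with self._lock:
            for term in text.split():
                self._postings.setdefault(term, set()).add(doc_id)

    def read(self, term):
        # Return how many documents currently contain the term.
        with self._lock:
            return len(self._postings.get(term, ()))


def run_benchmark(index, n_ops, workers=4):
    """Submit n_ops writes and n_ops reads, interleaved, to one thread pool."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for i in range(n_ops):
            futures.append(pool.submit(index.write, i, f"doc number {i}"))
            futures.append(pool.submit(index.read, "doc"))
        for future in futures:
            future.result()  # re-raises any exception from a worker
    elapsed = time.perf_counter() - start
    return {"operations": len(futures), "elapsed_s": elapsed}


if __name__ == "__main__":
    print(run_benchmark(InMemoryIndex(), n_ops=10_000))
```

The elapsed time only means something when the stand-in is replaced by real reads and writes; the harness itself just guarantees that both kinds of operation are in flight at the same time.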

I've had issues finding a good profiler for .NET Core. I've used Ants Profiler but not while targeting Core. Let me know if you run into issues.

SebastianStehle commented 7 years ago

CodeTrack supports .NET Core. An alternative would be to publish your .NET Core app for Windows and then profile it with Ants Profiler.

SebastianStehle commented 7 years ago

I found the same things as Ayende when profiling the inserts with Redgate Profiler. The slowest parts are the Analyzer and IO.

I tried to refactor some stuff, but the code is hard to understand because of some of its structures. For example, Words has postings, but they are not used.

I think it might be worth a try to maintain a global map of FieldName => FieldId. If you assume that the number of fields is limited (e.g. 2000), you could transform Documents from (Key => Value) to (FieldId => Value) very early in the process, and then just use fixed-size arrays of structs for a lot of operations.
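A minimal sketch of that idea, in Python for illustration only (Resin itself is C#, and none of these names come from its code base): `FieldRegistry`, `get_or_add` and `to_slots` are hypothetical, and the limit of 2000 is just the example figure from above. Each field name is assigned a small integer id the first time it is seen, and every document is then flattened into a fixed-size array indexed by that id.

```python
from typing import Dict, List, Optional

MAX_FIELDS = 2000  # assumed upper bound on the number of distinct fields


class FieldRegistry:
    """Global map of field name -> field id (hypothetical helper)."""

    def __init__(self, max_fields: int = MAX_FIELDS) -> None:
        self.max_fields = max_fields
        self._ids: Dict[str, int] = {}

    def get_or_add(self, name: str) -> int:
        # Ids are handed out in order of first appearance: 0, 1, 2, ...
        field_id = self._ids.get(name)
        if field_id is None:
            if len(self._ids) >= self.max_fields:
                raise ValueError("field limit exceeded")
            field_id = len(self._ids)
            self._ids[name] = field_id
        return field_id


def to_slots(doc: Dict[str, str], registry: FieldRegistry) -> List[Optional[str]]:
    """Turn a (Key => Value) document into a fixed-size (FieldId => Value) array."""
    slots: List[Optional[str]] = [None] * registry.max_fields
    for name, value in doc.items():
        slots[registry.get_or_add(name)] = value
    return slots


if __name__ == "__main__":
    registry = FieldRegistry(max_fields=4)
    print(to_slots({"title": "Rambo", "year": "1982"}, registry))
    print(to_slots({"year": "1985", "title": "Rambo II"}, registry))
```

After this step the later stages can index by integer instead of hashing strings. In C# the slots would more likely be an array of structs than a list of optional strings, which is where the allocation savings would come from.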

kreeben commented 7 years ago

@SebastianStehle holy shit man, thanks for donating your free time!

Yes, IO (Stream.Seek) and allocations have been the two main culprits for poor performance and high memory usage at write time. What you and Ayende say about how I should deal with and gain from immutability is a no-brainer; I should definitely take a closer look there.

I have refactored the shit out of the code since your benchmark. It's much more readable and leaves fewer questions open regarding what the state of an object is in different scenarios. There is now a clear line between document database concerns and full-text search concerns. System.Linq is pretty much gone. It's a whole new experience. It's fresh! Check it!

kreeben commented 6 years ago

@SebastianStehle hey pal! I've done a complete rewrite, so if you have absolutely nothing better to do, feel free to check it out: https://github.com/kreeben/resin/commit/5f85425a0f61bbfbe2b2676d71d72f37677a0bef