meilisearch / milli

Search engine library for Meilisearch ⚡️
MIT License

Move hard and soft in ExternalDocumentsIds into single map #666

Closed by msvaljek 1 year ago

msvaljek commented 1 year ago

Move hard and soft in ExternalDocumentsIds into single map docids
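
To illustrate the idea behind the title, here is a minimal sketch of collapsing a two-layer mapping into a single `docids` map. It is not milli's actual code: the `TwoLayerIds` type, the use of `BTreeMap`, the `u32` ids, and the `DELETED_ID` tombstone are all assumptions made for the example, and it assumes that entries in `soft` shadow entries in `hard`.

```rust
use std::collections::BTreeMap;

/// Hypothetical tombstone marking a deleted entry in the soft layer.
const DELETED_ID: u32 = u32::MAX;

/// Assumed two-layer layout: `soft` holds recent changes and shadows `hard`.
struct TwoLayerIds {
    hard: BTreeMap<String, u32>,
    soft: BTreeMap<String, u32>,
}

impl TwoLayerIds {
    /// Look up an external id: check `soft` first, then fall back to `hard`.
    /// A tombstone in `soft` hides any value stored in `hard`.
    fn get(&self, external_id: &str) -> Option<u32> {
        match self.soft.get(external_id).or_else(|| self.hard.get(external_id)) {
            Some(&id) if id != DELETED_ID => Some(id),
            _ => None,
        }
    }

    /// Collapse both layers into one map: soft entries win, tombstones are dropped.
    fn into_single_map(self) -> BTreeMap<String, u32> {
        let mut docids = self.hard;
        for (external_id, id) in self.soft {
            if id == DELETED_ID {
                docids.remove(&external_id);
            } else {
                docids.insert(external_id, id);
            }
        }
        docids
    }
}
```

With a single map, a lookup becomes a plain `docids.get(external_id)` with no shadowing or tombstone check, at the cost of applying every change to the one map directly.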

Related issue

Fixes #76

What does this PR do?


msvaljek commented 1 year ago

I'm not sure how to do the benchmark.

I tried https://github.com/meilisearch/milli/tree/main/benchmarks#on-your-machine

It runs the first time, but for further iterations you need to change the code for the cargo benchmark to actually do something. I would like to update the README here so that newcomers don't get confused.

No JSON files or anything similar are generated on my machine; the tests just finish with:

test result: ok. 0 passed; 0 failed; 162 ignored; 0 measured; 0 filtered out; finished in 0.00s

I would like to understand how this works too (and open PRs to document it, since I'm participating in Hacktoberfest).

After running the benchmarks, I don't see any new JSON file locally, and the list.sh script basically just shows what is available at https://milli-benchmarks.fra1.digitaloceanspaces.com

Also, I am not able to trigger actions or anything similar on the repo.

curquiza commented 1 year ago

bors try

bors[bot] commented 1 year ago

try

Build failed:

curquiza commented 1 year ago

Hello @msvaljek If you are still around, can you let us know if you plan to finish the PR? Indeed, the tests still fail and we cannot merge or even review it more before it's fixed 😊

msvaljek commented 1 year ago

> Hello @msvaljek If you are still around, can you let us know if you plan to finish the PR? Indeed, the tests still fail and we cannot merge or even review it more before it's fixed 😊

I'm still working on this. I'll try to carve out some time this weekend at the latest.

msvaljek commented 1 year ago

> It looks pretty good to me. Thank you very much for that.
>
> I am just wondering if you could send a dataset in multiple batches multiple times and see the time difference between the previous version and the new one. I am sure the recently introduced soft-deletion feature will amortize it, but who knows?

I already looked into it, and I don't think I have permission to run this.
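
For a rough local comparison that does not depend on repository permissions, the batched measurement suggested above could be approximated with a small timing harness. This is only a sketch under stated assumptions: `index_batch` is a hypothetical stand-in for the real indexing step, the dataset is a plain list of `(external id, internal id)` pairs, and the harness is not part of milli's benchmarks crate.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an indexing step: insert one batch of external ids.
fn index_batch(docids: &mut BTreeMap<String, u32>, batch: &[(String, u32)]) {
    for (external_id, id) in batch {
        docids.insert(external_id.clone(), *id);
    }
}

/// Split `dataset` into batches of `batch_size` and time each batch separately.
/// `batch_size` must be non-zero, because `chunks` panics on a size of 0.
fn time_batches(
    dataset: &[(String, u32)],
    batch_size: usize,
) -> (BTreeMap<String, u32>, Vec<Duration>) {
    let mut docids = BTreeMap::new();
    let mut timings = Vec::new();
    for batch in dataset.chunks(batch_size) {
        let start = Instant::now();
        index_batch(&mut docids, batch);
        timings.push(start.elapsed());
    }
    (docids, timings)
}
```

Running the same dataset through the previous and the new implementation of the indexing step and comparing the per-batch durations would show whether later batches get slower as the map grows. Wall-clock timings like these are noisy, so repeating the run several times gives a more reliable picture.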