meilisearch / milli

Search engine library for Meilisearch ⚡️

MIT License


Move hard and soft in ExternalDocumentsIds into single map #666

Closed. msvaljek closed this 1 year ago.

msvaljek commented 1 year ago

Move hard and soft in ExternalDocumentsIds into single map docids

Related issue

Fixes #76

What does this PR do?

As described in the issue comments, this PR removes the hard and soft maps from ExternalDocumentsIds and replaces them with a single map called docids.
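
To make the change easier to picture, here is a minimal sketch of the two layouts. It is not milli's actual code: the types LayeredIds and SingleMapIds, the DocumentId alias, the use of BTreeMap, and the tombstone convention (None in the soft layer meaning "deleted") are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical internal document id type (an assumption for this sketch).
type DocumentId = u32;

/// Two-layer layout: `soft` holds recent changes and shadows `hard`.
/// A `None` value in `soft` is a tombstone for a deleted external id.
#[derive(Default)]
struct LayeredIds {
    hard: BTreeMap<String, DocumentId>,
    soft: BTreeMap<String, Option<DocumentId>>,
}

impl LayeredIds {
    /// Look in `soft` first; fall back to `hard` only if `soft` has no entry.
    fn get(&self, external_id: &str) -> Option<DocumentId> {
        match self.soft.get(external_id) {
            Some(entry) => *entry,
            None => self.hard.get(external_id).copied(),
        }
    }

    /// Apply every `soft` entry on top of `hard` to obtain a single map.
    fn flatten(&self) -> SingleMapIds {
        let mut docids = self.hard.clone();
        for (external_id, entry) in &self.soft {
            match entry {
                Some(id) => {
                    docids.insert(external_id.clone(), *id);
                }
                None => {
                    docids.remove(external_id);
                }
            }
        }
        SingleMapIds { docids }
    }
}

/// Single-map layout: one lookup, no shadowing and no tombstones.
#[derive(Default)]
struct SingleMapIds {
    docids: BTreeMap<String, DocumentId>,
}

impl SingleMapIds {
    fn insert(&mut self, external_id: &str, id: DocumentId) {
        self.docids.insert(external_id.to_owned(), id);
    }

    fn delete(&mut self, external_id: &str) {
        self.docids.remove(external_id);
    }

    fn get(&self, external_id: &str) -> Option<DocumentId> {
        self.docids.get(external_id).copied()
    }
}

fn main() {
    let mut layered = LayeredIds::default();
    layered.hard.insert("a".to_owned(), 0);
    layered.hard.insert("b".to_owned(), 1);
    layered.soft.insert("b".to_owned(), None); // "b" was deleted
    layered.soft.insert("c".to_owned(), Some(2)); // "c" was added

    // Both layouts must answer every lookup identically.
    let mut single = layered.flatten();
    for key in ["a", "b", "c", "missing"] {
        assert_eq!(layered.get(key), single.get(key));
    }

    // Updates on the single map are direct inserts and removals.
    single.insert("d", 3);
    single.delete("a");
    assert_eq!(single.get("d"), Some(3));
    assert_eq!(single.get("a"), None);
}
```

In this sketch the single map trades the cheap "append to soft" write path for simpler reads and deletes, which is why a timing comparison between the two versions is a natural follow-up.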

PR checklist

Please check if your PR fulfills the following requirements:

[x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
[x] Have you read the contributing guidelines?
[x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

msvaljek commented 1 year ago

I'm not sure how to do the benchmark.

I tried https://github.com/meilisearch/milli/tree/main/benchmarks#on-your-machine

It runs the first time, but for further iterations you need to change the code for the cargo benchmark to actually do something. I would like to update the README here so that newcomers don't get confused.

No JSON files or anything similar are generated on my machine; the tests just finish with:

test result: ok. 0 passed; 0 failed; 162 ignored; 0 measured; 0 filtered out; finished in 0.00s

I would like to understand how this works too (and open PRs to document it, since I'm participating in Hacktoberfest).

After running the benchmarks, I don't see any new JSON file locally, and the list.sh script basically just shows what is available at: https://milli-benchmarks.fra1.digitaloceanspaces.com

Also, I don't have permission to trigger actions or anything similar on the repo.

curquiza commented 1 year ago

bors try

bors[bot] commented 1 year ago

try

Build failed:

Tests on ubuntu-20.04

curquiza commented 1 year ago

Hello @msvaljek, if you are still around, can you let us know whether you plan to finish the PR? The tests still fail, and we cannot merge or review it further until they are fixed 😊

msvaljek commented 1 year ago

> Hello @msvaljek, if you are still around, can you let us know whether you plan to finish the PR? The tests still fail, and we cannot merge or review it further until they are fixed 😊

I'm still working on this. I'll try to carve out some time this weekend at the latest.

msvaljek commented 1 year ago

> It looks pretty good to me. Thank you very much for that.
>
> I am just wondering if you could send a dataset in multiple batches multiple times and see the time difference between the previous version and the new one. I am sure the recently introduced soft-deletion feature will amortize it, but who knows?

I already looked into this, and I don't think I have permission to run it.
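
For a rough local comparison that needs no repository permissions, the batch experiment described above could be approximated along these lines. This is only a sketch: index_batch is a hypothetical stand-in for the real indexing call, the BTreeMap stands in for the document id map, and make_batches generates synthetic data. The same harness would have to be run once against each version to compare the timings.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the real indexing call (an assumption).
fn index_batch(docids: &mut BTreeMap<String, u32>, batch: &[(String, u32)]) {
    for (external_id, id) in batch {
        docids.insert(external_id.clone(), *id);
    }
}

/// Build `count` synthetic batches of `size` documents each.
fn make_batches(count: usize, size: usize) -> Vec<Vec<(String, u32)>> {
    (0..count)
        .map(|b| {
            (0..size)
                .map(|i| {
                    let id = (b * size + i) as u32;
                    (format!("doc-{id}"), id)
                })
                .collect()
        })
        .collect()
}

/// Send every batch `rounds` times and record one duration per send.
fn time_batches(batches: &[Vec<(String, u32)>], rounds: usize) -> Vec<Duration> {
    let mut docids = BTreeMap::new();
    let mut timings = Vec::with_capacity(batches.len() * rounds);
    for _ in 0..rounds {
        for batch in batches {
            let start = Instant::now();
            index_batch(&mut docids, batch);
            timings.push(start.elapsed());
        }
    }
    timings
}

fn main() {
    let batches = make_batches(10, 1_000);
    let timings = time_batches(&batches, 3);
    let total: Duration = timings.iter().sum();
    println!("{} sends, total time {:?}", timings.len(), total);
}
```

Timing each send separately, rather than only the total, makes it possible to see whether later batches become slower as the map grows.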