jina-ai / executors

internal-only
Apache License 2.0
31 stars 12 forks source link

test: add integration test for sync with delta #166

Closed cristianmtr closed 3 years ago

cristianmtr commented 3 years ago

Add an integration test to show off sync with deltas in PSQLFaissSearcher

cristianmtr commented 3 years ago

@tadejsv I can't reply to your comments.

The reason these tests are more complex

  1. They also act like showcases for how to full usage of these features.
  2. test_psql_import needs to be the same as test_dump_reload, in order to compare them in the benchmark

You are testing the case 'use_delta': True in this test. Testing normal sync without it should be done as a separate test

The use_delta only has meaning if you have already imported from a snapshot, since it will only give you the data from that last timestamp.

tadejsv commented 3 years ago

Ok, so if I understand things correctly, the issue is that this unit test is trying to be 3 things at once: a unit test, a showcase, and a benchmark. Why not separate this out?

The use_delta only has meaning if you have already imported from a snapshot, since it will only give you the data from that last timestamp.

My comment referred to the call to /serach endpoint after a normal sync, which is not really needed in that unit test. There can/should be another unit test testing normal sync.

P.S. For these comments you can reply under the original comments

cristianmtr commented 3 years ago

Ok, so if I understand things correctly, the issue is that this unit test is trying to be 3 things at once: a unit test, a showcase, and a benchmark. Why not separate this out?

  • Showcase -> readme

It won't be tested and thus will end up broken.

  • benchmark -> perhaps the benchmarks repo? Not sure in what way you are using this benchmark

That would create too much coupling. We want it as both an integration test here, but, as you can see, it can also be called to run in benchmark mode with a larger nr of docs and larger emebddings.

My comment referred to the call to /serach endpoint after a normal sync, which is not really needed in that unit test. There can/should be another unit test testing normal sync.

There is another unit test, in the PSQLFaissIndexer folder's tests. Yes, the assert on the search could be removed. It's more like a gradual test, making sure that every step is ok before proceeding. So it can catch error early instead of waiting until the last assert.

tadejsv commented 3 years ago

That would create too much coupling. We want it as both an integration test here, but, as you can see, it can also be called to run in benchmark mode with a larger nr of docs and larger emebddings.

Well, fair point on coupling. But in general - a piece of code should do one thing and do that well. This tries to do too many things.

There is another unit test, in the PSQLFaissIndexer folder's tests. Yes, the assert on the search could be removed. It's more like a gradual test, making sure that every step is ok before proceeding. So it can catch error early instead of waiting until the last assert.

Not just assert on the search, the whole search. There's no need to catch an error early - there should be other tests for that, so when something breaks, you just look at which tests fail and deduce the root cause from there. A test should focus on testing one thing - this makes them readable and easy to understand

cristianmtr commented 3 years ago

Not just assert on the search, the whole search. There's no need to catch an error early - there should be other tests for that, so when something breaks, you just look at which tests fail and deduce the root cause from there. A test should focus on testing one thing - this makes them readable and easy to understand

Ok, I'll remove that assert.

Well, fair point on coupling. But in general - a piece of code should do one thing and do that well. This tries to do too many things.

Could be refactored later. Right now it would mean also changing the other benchmark tests (which would be a bit too much for this PR)