opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosytem
Apache License 2.0
66 stars 67 forks source link

Add integration test for reindex #386

Open ylwu-amzn opened 1 year ago

ylwu-amzn commented 1 year ago

We found some bug when reindex a non-KNN index to KNN index with neural-search ingest pipeline. We fixed this issue https://github.com/opensearch-project/ml-commons/pull/1418.

It will be best if neural-search can have some integration test to cover reindex. So we can catch such issue earlier.

heemin32 commented 1 year ago

The bug was in ml-commons. Ideally, test should be in ml-common imo. We don't expect every ml-common dependency plugin to add reindex integ test to catch a bug in ml-common?

ylwu-amzn commented 1 year ago

It will be challenging for ml-commons , as run reindex needs Knn and neural-search plugins

The bug happens when running reindex with neural search ingest pipeline.

heemin32 commented 1 year ago

Understand the difficulty to have reindex test in ml-commons but in a long term, we should consider implementing some very basic ingest pipeline inside ml-commons for integ test purpose.

navneet1v commented 1 year ago

@heemin32 I would beg to differ little bit here. As the ingest processor is defined in neural search plugin, ideally it is the job of the neural search plugin to validate all the cases in which that processor can be run. Now running the processor on re-index is another way to calling that processor. Hence the integration tests should be in neural search for re-index, because ML Commons cannot cover all the cases in which how the processor can be invoked.

Having said all the things above, one thing we definitely need from ML Commons Plugin or MLCommons client side, in terms of security how the apis can be used. This will ensure that its not the clients who are letting MLCommons know that something is broken.

heemin32 commented 1 year ago

Having said all the things above, one thing we definitely need from ML Commons Plugin or MLCommons client side, in terms of security how the apis can be used. This will ensure that its not the clients who are letting MLCommons know that something is broken.

My point align with this comment. MLCommons should be able to find if something is broken without relying on integ test in other plugins. Neural search should have re-index integ test to increase test coverage but MLCommons also should have a test case to cover the bug regarding this PR https://github.com/opensearch-project/ml-commons/pull/1418.

navneet1v commented 1 year ago

@heemin32

implementing some very basic ingest pipeline inside ml-commons for integ test purpose.

My comment was mainly around this.

heemin32 commented 1 year ago

@heemin32

implementing some very basic ingest pipeline inside ml-commons for integ test purpose.

My comment was mainly around this.

Just one of option to catch the bug inside ml-common itself.