jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack
https://jina.ai/serve
Apache License 2.0

Change parameters after indexing. (e.g. for Hyperparameter-search) #1515

Closed janandreschweiger closed 3 years ago

janandreschweiger commented 3 years ago

Hey Jina team,

my team has indexed over 100,000 text documents using the FaissIndexer. Now we would like to test different hyperparameters:

  • distance: "l2" vs "inner_product"
  • normalize: True vs False

Unfortunately, if we now make any changes to these parameters in our yml file, nothing changes. This is probably because the indexer is saved and loaded again. That makes testing different hyperparameters very inconvenient, as we would have to index all documents again for every setup, and indexing all documents takes several hours.

We tried to override the FaissIndexer, but Jina doesn't recognize custom executors that were added after index time:

could not determine a constructor for the tag '!CustomFaissIndexer'

Is there a way to alter these parameters for querying?

JoanFM commented 3 years ago

Hey @janandreschweiger, thank you again for your valuable feedback.

This is possible by giving a FaissIndexer (or any other indexer) a NumpyIndexer as its ref_indexer: the NumpyIndexer holds the indexed data, and the indexer wrapping it can be configured with a different set of parameters.

This approach currently has some problems, which will be addressed in #1438.

By the way, we are working on a feature to provide hyperparameter optimization, so please stay tuned!

janandreschweiger commented 3 years ago

Thanks for your reply @JoanFM! That's cool, especially if one wants to change the parameters in production. Hyperparameter tuning is also important for many applications.

JoanFM commented 3 years ago

Hey @janandreschweiger,

Now that I think about it, this should already be available to you, since you are not using FaissIndexer from a Docker image.

What you need to do is use a NumpyIndexer as the indexer at index time.

Then, at query time, you can use a FaissIndexer whose ref_indexer is a NumpyIndexer configured with the same parameters as the one used at index time. The FaissIndexer loads its data from that NumpyIndexer.
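To make the idea concrete, here is a minimal, library-free sketch of the pattern described above. It is not Jina's API: VectorStore and QueryIndexer are hypothetical names that stand in for the index-time NumpyIndexer and the query-time indexer, and the brute-force search only illustrates how distance and normalize can change at query time while the stored vectors are reused.

```python
import math


class VectorStore:
    """Hypothetical stand-in for the index-time store: keeps raw vectors only."""

    def __init__(self):
        self.keys = []
        self.vectors = []

    def add(self, key, vector):
        self.keys.append(key)
        self.vectors.append(list(vector))


def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class QueryIndexer:
    """Hypothetical query-time indexer built from an existing store.

    Its parameters are chosen at query time; the store is never modified.
    """

    def __init__(self, ref_store, distance="l2", normalize=False):
        if distance not in ("l2", "inner_product"):
            raise ValueError(f"unsupported distance: {distance}")
        self.distance = distance
        self.normalize = normalize
        self.keys = list(ref_store.keys)
        # Work on a copy so that every configuration starts from the same raw data.
        self.vectors = [
            _normalize(v) if normalize else list(v) for v in ref_store.vectors
        ]

    def _score(self, q, v):
        # Lower is better for both settings (the inner product is negated).
        if self.distance == "l2":
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))
        return -sum(a * b for a, b in zip(q, v))

    def query(self, vector, top_k=1):
        q = _normalize(vector) if self.normalize else list(vector)
        ranked = sorted(
            zip(self.keys, self.vectors), key=lambda kv: self._score(q, kv[1])
        )
        return [key for key, _ in ranked[:top_k]]


# Index once (the expensive step in a real setup).
store = VectorStore()
store.add("a", [1.0, 0.0])
store.add("b", [3.0, 3.0])
store.add("c", [0.0, 0.5])

# Try every parameter combination against the same stored data.
query = [1.0, 0.2]
results = {}
for distance in ("l2", "inner_product"):
    for normalize in (False, True):
        indexer = QueryIndexer(store, distance=distance, normalize=normalize)
        results[(distance, normalize)] = indexer.query(query, top_k=1)
```

In this toy data, the unnormalized inner product favours the long vector "b", while the other three configurations return "a". Differences like this are what a parameter comparison would surface, without indexing the documents again.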

You can see this feature being used in the faiss search example (it does not work at the moment because it uses containers).

It may help you work around the issue in #1438.

janandreschweiger commented 3 years ago

Thanks @JoanFM I'll try it out!

JoanFM commented 3 years ago

Hey @janandreschweiger,

There is an open PR in the examples repository that showcases what you are trying to do.

https://github.com/jina-ai/examples/pull/318

It lets you index with one indexer and then reuse that index data to query with different indexer types and parameters.

I hope you find it useful.