Summary
This PR enables the import of encoders into Elasticsearch via eland.
Eland DataFrame
Eland lets us analyze indexed documents through a pandas-compatible API. It appears that eland translates pandas API calls into Elasticsearch API calls.
Because eland fetches document data from Elasticsearch, fields excluded from `_source` are not retrieved. In this project, the field `product_vector` is excluded (`mappings._source.excludes: "product_vector"`), so its values appear as NaN, as shown below.
```python
>>> import eland as ed
>>> df = ed.DataFrame("http://localhost:9200", es_index_pattern="products_jp")
>>> df
                         product_brand  ...  product_vector
B07R91TVJB                              ...             NaN
B00FW60P84                              ...             NaN
B071KNF11Z                      SHD-PB  ...             NaN
B07HR6BC88                              ...             NaN
B089Q21H6Z                       Meize  ...             NaN
...                                ...  ...             ...
B078GHPC9T  ティーケーカンパニー (TK.Company)  ...             NaN
B07KYY329D         TRAVELIST(トラベリスト)  ...             NaN
B07T15VXM2                              ...             NaN
B07YD6R7M3                      KIZUNA  ...             NaN
B083JF93J5                     Tbmodel  ...             NaN

[100 rows x 10 columns]
```
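To make the cause of the NaN column concrete, here is a minimal sketch of an index body with the `_source.excludes` setting described above. Only `product_vector` and the `excludes` setting come from this PR; the `dims` value, the `product_brand` type, and the helper `excluded_fields` are assumptions for illustration.

```python
# Sketch of an index body whose mapping drops the vector from _source.
# "dims" and the "product_brand" type are assumed values, not taken from this PR.
index_body = {
    "mappings": {
        "_source": {"excludes": ["product_vector"]},
        "properties": {
            "product_brand": {"type": "keyword"},
            "product_vector": {"type": "dense_vector", "dims": 768},
        },
    }
}


def excluded_fields(body: dict) -> list:
    """Return the field names excluded from _source (accepts a string or a list)."""
    excludes = body.get("mappings", {}).get("_source", {}).get("excludes", [])
    return [excludes] if isinstance(excludes, str) else list(excludes)


# Columns listed here are the ones eland will show as NaN.
print(excluded_fields(index_body))
```

A field excluded from `_source` is still indexed, so it stays searchable even though its stored value is not returned with the document.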
Machine Learning with Eland
Another notable feature of eland is that it provides APIs for importing and executing machine learning models. Combined with recently added Elasticsearch features, such as running encoders at query time (available as of 8.7), this enables more flexible vector search within Elasticsearch.
AuthorizationException: current license is non-compliant for [ml]
I got this error when I tried to import a model. It turns out that importing PyTorch models is a Platinum-licensed feature.
```shell
$ poetry run inv es.import-model
Traceback (most recent call last):
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/bin/inv", line 8, in <module>
    sys.exit(program.run())
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/invoke/program.py", line 384, in run
    self.execute()
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/invoke/program.py", line 569, in execute
    executor.execute(*self.tasks)
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/invoke/executor.py", line 129, in execute
    result = call.task(*args, **call.kwargs)
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/invoke/tasks.py", line 127, in __call__
    result = self.body(*args, **kwargs)
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/tasks/es_tasks.py", line 50, in import_model
    ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/eland/ml/pytorch/_pytorch_model.py", line 122, in import_model
    self.put_config(path=config_path, config=config)
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/eland/ml/pytorch/_pytorch_model.py", line 78, in put_config
    self._client.ml.put_trained_model(model_id=self.model_id, **config_map)
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/utils.py", line 414, in wrapped
    return api(*args, **kwargs)
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/ml.py", line 3301, in put_trained_model
    return self.perform_request(  # type: ignore[return-value]
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py", line 389, in perform_request
    return self._client.perform_request(
  File "/Users/kentaro-takiguchi/projects/amazon-product-search/.venv/lib/python3.10/site-packages/elasticsearch/_sync/client/_base.py", line 320, in perform_request
    raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.AuthorizationException: AuthorizationException(403, 'security_exception', 'current license is non-compliant for [ml]')
```
To get past this error locally, I started a trial license by calling the following API.
```shell
$ curl -X POST "http://localhost:9200/_license/start_trial?acknowledge=true"
```
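The same step can be sketched with the Python client. The helper name `start_trial_if_needed` and the set of license types treated as sufficient are assumptions; `license.get` and `license.post_start_trial` are the client methods behind `GET /_license` and `POST /_license/start_trial`.

```python
def start_trial_if_needed(es) -> str:
    """Start a trial license unless one that covers ML is already active.

    `es` is expected to be an elasticsearch.Elasticsearch client.
    Returns the license type reported after the call.
    """
    # Assumption: these license types are enough for importing PyTorch models.
    sufficient = {"trial", "platinum", "enterprise"}
    current = es.license.get()["license"]["type"]
    if current in sufficient:
        return current
    # Equivalent to POST /_license/start_trial?acknowledge=true
    es.license.post_start_trial(acknowledge=True)
    return es.license.get()["license"]["type"]


# Usage (needs a running cluster):
# from elasticsearch import Elasticsearch
# start_trial_if_needed(Elasticsearch("http://localhost:9200"))
```

Checking the current license first keeps the task idempotent, since a trial can only be started once per cluster.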
Beyond the trial, a Platinum license costs $125 per month ([Official Elasticsearch Pricing: Elastic Cloud, Managed Elasticsearch | Elastic](https://www.elastic.co/pricing/)).
Importing Encoders
I have added a task, `es.import-model`, for importing models via eland.
This is equivalent to the command below.
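As a rough Python illustration of what such an import involves, the sketch below follows the `import_model` call visible in the traceback from `tasks/es_tasks.py`. The function name `import_encoder`, its parameters, and the `work_dir` default are hypothetical, and the eland calls assume an 8.x release in which `TransformerModel.save` returns the model path, config, and vocabulary path.

```python
from pathlib import Path


def import_encoder(es_url: str, hub_model_id: str, work_dir: str = "models") -> str:
    """Download a Hugging Face encoder, trace it, and upload it to Elasticsearch.

    Returns the model ID under which the encoder is stored in Elasticsearch.
    """
    # Imported lazily: eland's PyTorch support pulls in heavy optional dependencies.
    from elasticsearch import Elasticsearch
    from eland.ml.pytorch import PyTorchModel
    from eland.ml.pytorch.transformers import TransformerModel

    # Trace the model as a text embedding encoder and save it locally.
    tm = TransformerModel(model_id=hub_model_id, task_type="text_embedding")
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    model_path, config, vocab_path = tm.save(work_dir)

    # Upload the traced model, its config, and its vocabulary.
    es = Elasticsearch(es_url)
    es_model_id = tm.elasticsearch_model_id()
    ptm = PyTorchModel(es, es_model_id)
    ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
    return es_model_id
```

Importing only stores the model. It still has to be deployed, for example with the client's `ml.start_trained_model_deployment`, before it can run inference.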
Encoding Queries at Query Time
As of Elasticsearch 8.7, encoders can be executed at query time.
Index docs:
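One possible way to encode documents at index time is an ingest pipeline with an inference processor. This is a sketch under assumptions, not necessarily how this PR indexes documents: the helper `build_embedding_pipeline`, the `product_title` source field, and the temporary `ml_output` field are all hypothetical.

```python
def build_embedding_pipeline(
    model_id: str,
    text_field: str = "product_title",
    vector_field: str = "product_vector",
) -> dict:
    """Build an ingest pipeline body that embeds `text_field` into `vector_field`."""
    return {
        "description": "Encode a text field with an imported model at index time",
        "processors": [
            # Imported text embedding models read their input from "text_field".
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": "ml_output",
                    "field_map": {text_field: "text_field"},
                }
            },
            # Copy the embedding into the dense_vector field, then drop the temporary output.
            {"set": {"field": vector_field, "copy_from": "ml_output.predicted_value"}},
            {"remove": {"field": "ml_output"}},
        ],
    }


# Usage (needs a running cluster and a deployed model):
# es.ingest.put_pipeline(id="encode-products", **build_embedding_pipeline(model_id))
# es.index(index="products_jp", document=doc, pipeline="encode-products")
```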
Retrieve docs using the imported model:
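A minimal sketch of query-time encoding uses the `query_vector_builder` option of kNN search, which lets Elasticsearch embed the query text with a deployed model. The helper `build_knn_clause` and the default values for `k` and `num_candidates` are assumptions; `product_vector` and `products_jp` come from this PR.

```python
def build_knn_clause(
    model_id: str,
    query: str,
    field: str = "product_vector",
    k: int = 10,
    num_candidates: int = 100,
) -> dict:
    """Build a kNN clause that asks Elasticsearch to encode `query` with `model_id`."""
    if k > num_candidates:
        raise ValueError("num_candidates must be greater than or equal to k")
    return {
        "field": field,
        "k": k,
        "num_candidates": num_candidates,
        # Elasticsearch runs the deployed model on the query text, so no client-side encoder is needed.
        "query_vector_builder": {
            "text_embedding": {"model_id": model_id, "model_text": query}
        },
    }


# Usage (needs a running cluster and a deployed model):
# es.search(index="products_jp", knn=build_knn_clause(model_id, "running shoes"))
```

Building the clause in a plain function keeps it easy to unit test without a cluster.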