Closed tfnribeiro closed 1 week ago
Currently, mysql_to_elastic.py adds documents to the index one at a time, which is slow, especially when we have to index a large number of documents.
I have made a helper script that uses the bulk function from https://elasticsearch-py.readthedocs.io/en/7.x/helpers.html#elasticsearch.helpers.bulk, which speeds this process up considerably. It also allows selecting articles with a SQLAlchemy query (say we only want the Danish articles indexed for testing).
The script also allows deleting the index (in case we want to re-index).
I think this would also be useful to have in the repo, to speed up indexing for new developers.
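
For illustration, a minimal sketch of the bulk approach described above, not the actual helper script. The names `generate_actions` and `bulk_index`, the `articles` index name, and the article fields (`id`, `title`, `language`) are hypothetical. The rows are assumed to be plain dicts already selected by a SQLAlchemy query (for example, filtered to Danish articles).

```python
from typing import Any, Dict, Iterable, Iterator


def generate_actions(
    articles: Iterable[Dict[str, Any]], index_name: str
) -> Iterator[Dict[str, Any]]:
    """Yield one action dict per article, in the format helpers.bulk accepts."""
    for article in articles:
        yield {
            "_op_type": "index",
            "_index": index_name,
            # Assumption: each row carries its primary key under "id".
            "_id": article["id"],
            "_source": {k: v for k, v in article.items() if k != "id"},
        }


def bulk_index(es_client, articles, index_name, delete_first=False):
    """Send all articles in bulk requests instead of one request per document."""
    # Imported here so generate_actions works without the client installed.
    from elasticsearch.helpers import bulk

    if delete_first:
        # Drop the index before re-indexing; do not fail if it is missing.
        es_client.indices.delete(index=index_name, ignore_unavailable=True)
    # bulk() returns a tuple: (number of successful actions, list of errors).
    return bulk(es_client, generate_actions(articles, index_name))


# Hypothetical rows, standing in for the result of a filtered query.
danish_rows = [
    {"id": 1, "title": "Nyheder", "language": "da"},
    {"id": 2, "title": "Vejret", "language": "da"},
]
actions = list(generate_actions(danish_rows, "articles"))
```

Because `generate_actions` is a generator, `bulk` can consume a large query result in chunks without building the full list of actions in memory first.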
sounds good!
This has been implemented as part of ES8.