zeeguu / api

API for tracking a learner's progress when reading materials in a foreign language and recommending further personalized exercises and readings.
https://zeeguu.org
MIT License

Update mysql_to_elastic.py to use Bulk #234

Closed (tfnribeiro closed this 1 week ago)

tfnribeiro commented 1 month ago

Currently, mysql_to_elastic.py adds documents to the index one at a time, which is quite slow, especially when we have to index a large number of documents.

I have made a helper script that uses the bulk function from https://elasticsearch-py.readthedocs.io/en/7.x/helpers.html#elasticsearch.helpers.bulk, which speeds this process up considerably and also allows selecting articles with a SQLAlchemy query (say, when we only care about indexing the Danish articles for testing).

The script also allows deleting the index (in case we want to re-index).

I think this could also be useful to have in the repo, just to speed up indexing for new developers.
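For illustration, here is a minimal sketch of what such a helper could look like. It is not the actual script: the index name, the article fields, and the function names are all hypothetical, and plain dicts stand in for the rows a SQLAlchemy query would return. The only library calls it relies on are `elasticsearch.helpers.bulk` and `indices.delete`.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

INDEX_NAME = "zeeguu_articles"  # hypothetical index name, not taken from the repo


def article_to_action(article: Dict[str, Any], index: str = INDEX_NAME) -> Dict[str, Any]:
    """Turn one article (a plain dict with assumed fields) into a bulk action."""
    return {
        "_op_type": "index",
        "_index": index,
        "_id": article["id"],
        "_source": {
            "title": article["title"],
            "language": article["language"],
            "content": article["content"],
        },
    }


def generate_actions(
    articles: Iterable[Dict[str, Any]],
    language: Optional[str] = None,
    index: str = INDEX_NAME,
) -> Iterator[Dict[str, Any]]:
    """Yield bulk actions lazily, optionally keeping only one language.

    In the real script the filtering would more likely happen in the
    SQLAlchemy query itself; it is done here to keep the sketch self-contained.
    """
    for article in articles:
        if language is not None and article["language"] != language:
            continue
        yield article_to_action(article, index)


def reset_index(es_client: Any, index: str = INDEX_NAME) -> None:
    """Delete the index before re-indexing; a missing index is not an error."""
    es_client.indices.delete(index=index, ignore_unavailable=True)


def bulk_index(
    es_client: Any,
    articles: Iterable[Dict[str, Any]],
    language: Optional[str] = None,
    index: str = INDEX_NAME,
):
    """Send all actions through the bulk helper instead of one request per document."""
    # Imported here so the helpers above work without the client library installed.
    from elasticsearch.helpers import bulk

    # bulk() returns a tuple: (number of successful actions, list of errors).
    return bulk(es_client, generate_actions(articles, language, index))
```

With an `Elasticsearch` client in hand, calling `reset_index(es)` followed by `bulk_index(es, articles, language="da")` would rebuild the index with only the Danish articles. Using a generator means the full set of documents never has to sit in memory at once.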

mircealungu commented 1 month ago

sounds good!

tfnribeiro commented 1 week ago

This has been implemented as part of ES8.