vi3k6i5 / flashtext

Extract Keywords from sentence or Replace keywords in sentences.
MIT License
5.58k stars 598 forks

Over memory error #105

Closed freedom-wy closed 4 years ago

freedom-wy commented 4 years ago

I have a keyword list that occupies 376 MB of memory. When I call add_keywords_from_list(dataList), memory usage rises to 4.697 GB. How can I improve this?

thakur-nandan commented 4 years ago

Hi @freedom-wy,

May I ask roughly how many keywords are in the dataList that occupies about 376 MB? In my experience, I have never dealt with such a large keyword list. As for the cause: FlashText implements a trie dictionary structure (https://en.wikipedia.org/wiki/Trie) and stores all keywords in a dictionary character by character, so a list of this size will definitely consume a lot of memory (it can run into gigabytes).
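To illustrate why a character-by-character trie grows so much, here is a minimal sketch of a nested-dict trie. It is a simplified stand-in, not flashtext's actual code; the names `END`, `build_trie`, and `count_nodes` are made up for this example.

```python
END = "_end_"  # hypothetical terminal marker, chosen for this sketch


def build_trie(keywords):
    """Store each keyword character by character in nested dicts."""
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            # One dict per distinct prefix: this is where the memory goes.
            node = node.setdefault(ch, {})
        node[END] = word  # mark the end of a complete keyword
    return root


def count_nodes(node):
    """Count the dict objects that make up the trie, including the root."""
    return 1 + sum(
        count_nodes(child) for key, child in node.items() if key != END
    )


if __name__ == "__main__":
    trie = build_trie(["cat", "car", "dog"])
    print(count_nodes(trie), "dict nodes for 3 short keywords")
```

Every distinct prefix gets its own dict object, and in CPython each dict carries a fixed overhead (see `sys.getsizeof({})`) on top of its entries. That per-node overhead is probably a large part of why the loaded structure is several times bigger than the raw keyword list.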

A simple approach you could try is to split your keywords into smaller lists and run flashtext extraction with one smaller list at a time. This would use less memory; the limitation is that you would have to run flashtext multiple times for your use case, once per chunk. Let me know if this sounds feasible.
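The chunking idea could be sketched roughly as follows. This is only a sketch under assumptions: the helper names `chunked` and `extract_in_chunks` and the default chunk size are made up for illustration, and only `KeywordProcessor`, `add_keywords_from_list`, and `extract_keywords` come from flashtext.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def extract_in_chunks(text, keywords, chunk_size=100_000):
    """Run flashtext once per keyword chunk and collect all matches."""
    # Imported here so that `chunked` stays usable without flashtext installed.
    from flashtext import KeywordProcessor

    found = []
    for chunk in chunked(keywords, chunk_size):
        processor = KeywordProcessor()  # fresh, small trie for each chunk
        processor.add_keywords_from_list(chunk)
        found.extend(processor.extract_keywords(text))
        # `processor` is rebound on the next iteration, so the old trie
        # can be garbage collected and only one chunk is held at a time.
    return found
```

One caveat: results may differ from a single large processor. flashtext prefers the longest match, so if "New York" and "New York City" end up in different chunks, both can be reported for the same span. Matches are also returned grouped by chunk rather than in text order.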

Maybe @vi3k6i5 has another, more intuitive approach that might help here?

Kind Regards, Nandan Thakur

freedom-wy commented 4 years ago

Can I use flashtext to load keywords from a database? I have an sqlite3 database with 1.2 million records.

thakur-nandan commented 4 years ago

@freedom-wy Not by loading the database directly, but you can retrieve 100k keywords at a time from the database, load them into flashtext, and run flashtext for your use case.

Repeat this until you have used all the keywords in your database. At the moment, this is the best solution I can think of. I've never handled so many keywords at once (1 million+).
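A rough sketch of that loop, using the standard `sqlite3` module and `cursor.fetchmany`. The table name `keywords`, the column name `keyword`, and the helper names are assumptions for illustration; adapt them to the real schema.

```python
import sqlite3


def iter_keyword_batches(conn, batch_size=100_000):
    """Yield lists of keywords, at most `batch_size` per list."""
    # Assumed schema: a table `keywords` with a text column `keyword`.
    cursor = conn.execute("SELECT keyword FROM keywords")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield [row[0] for row in rows]


def extract_from_db(text, conn, batch_size=100_000):
    """Run flashtext over `text` once per batch of keywords from the database."""
    # Imported here so the batching helper works without flashtext installed.
    from flashtext import KeywordProcessor

    found = set()
    for batch in iter_keyword_batches(conn, batch_size):
        processor = KeywordProcessor()  # only one batch is in memory at a time
        processor.add_keywords_from_list(batch)
        found.update(processor.extract_keywords(text))
    return found
```

Usage would look something like `extract_from_db(text, sqlite3.connect("keywords.db"))`, where the file name is again hypothetical. Collecting into a set drops duplicate matches; use a list instead if counts matter.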

Kind Regards, Nandan Thakur

freedom-wy commented 4 years ago

Thank you.