Closed freedom-wy closed 4 years ago
Hi @freedom-wy,
May I know roughly how many entries are in the dataList that occupies about 376 MB of memory? In my experience, I have never dealt with such a big keyword list. To explain your issue: Flashtext implements a trie dictionary structure (https://en.wikipedia.org/wiki/Trie), and storing all of these keywords in nested dictionaries, character by character, is definitely memory-intensive (it can easily reach gigabytes).
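To illustrate why a character-by-character trie grows so quickly, here is a minimal sketch of a dictionary-based trie. It is a simplified stand-in and not Flashtext's actual code; the `END` marker and both helper names are hypothetical. Every character that is not part of a shared prefix gets its own `dict`, and each `dict` carries a fixed overhead in CPython, so a large keyword list with few shared prefixes can take many times the size of the raw strings.

```python
END = "_end_"  # hypothetical marker key meaning "a keyword ends at this node"


def add_keyword(trie, keyword):
    """Insert one keyword, creating one nested dict per new character."""
    node = trie
    for ch in keyword:
        node = node.setdefault(ch, {})
    node[END] = keyword


def count_nodes(trie):
    """Count every dict in the trie (root included), iteratively."""
    count = 0
    stack = [trie]
    while stack:
        node = stack.pop()
        count += 1
        # Only child nodes are dicts; the END entry holds a plain string.
        stack.extend(v for v in node.values() if isinstance(v, dict))
    return count


trie = {}
for word in ["cat", "car", "dog"]:
    add_keyword(trie, word)

# "cat" and "car" share the prefix "ca"; "dog" shares nothing.
print(count_nodes(trie))
```

Running `count_nodes` on a sample of your own keywords should give a rough idea of how many dictionaries the full list would need.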
A simple approach you could try is to split your keywords into smaller lists and run flashtext on one smaller list at a time. This would use less memory; the limitation is that you would have to run flashtext several times for your use case, once per chunk. Let me know if this sounds feasible.
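The chunking idea could look roughly like the sketch below. `chunked` and `extract_in_chunks` are hypothetical helper names, and the chunk size of 100,000 is only a starting point to tune. One caveat: each pass only knows the keywords in its own chunk, so when keywords overlap in the text the merged result may differ from a single pass over the full list.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def extract_in_chunks(text, keywords, chunk_size=100_000):
    """Run flashtext once per chunk so only one small trie exists at a time."""
    # Imported here so `chunked` stays usable without flashtext installed.
    from flashtext import KeywordProcessor

    found = []
    for chunk in chunked(keywords, chunk_size):
        processor = KeywordProcessor()  # fresh trie for this chunk only
        processor.add_keywords_from_list(chunk)
        found.extend(processor.extract_keywords(text))
        # Rebinding `processor` on the next iteration releases the old trie.
    return found
```

Usage would be something like `matches = extract_in_chunks(some_text, dataList)`. If the text is large, each chunk means another full scan of it, so this trades time for memory.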
Maybe @vi3k6i5 has another approach that might help here?
Kind Regards, Nandan Thakur
Can I use flashtext to load keywords from a database? I have a sqlite3 database with 1.2 million records.
@freedom-wy Not by loading the database directly, but you can retrieve 100k keywords at a time from the database, load them into flashtext, and use flashtext for your use case.
Repeat this until you have used all the keywords in your database. At the moment, this is the best solution I can think of. I've never handled so many keywords at once (1 million+).
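A rough sketch of that loop is below. The table name `keywords` and column name `keyword` are assumptions, since the real schema was not shared, and both function names are hypothetical. `fetchmany` pulls one batch at a time, so the full 1.2 million rows never need to sit in a Python list.

```python
import sqlite3


def iter_keyword_batches(conn, batch_size=100_000):
    """Yield lists of keywords, at most `batch_size` per list."""
    # Assumed schema: a table `keywords` with a text column `keyword`.
    cursor = conn.execute("SELECT keyword FROM keywords ORDER BY rowid")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield [row[0] for row in rows]


def extract_from_database(db_path, text, batch_size=100_000):
    """Build a small flashtext trie per batch and collect all matches."""
    # Imported here so the batching helper works without flashtext installed.
    from flashtext import KeywordProcessor

    conn = sqlite3.connect(db_path)
    try:
        found = []
        for batch in iter_keyword_batches(conn, batch_size):
            processor = KeywordProcessor()  # one trie per batch
            processor.add_keywords_from_list(batch)
            found.extend(processor.extract_keywords(text))
        return found
    finally:
        conn.close()
```

As with any batched run, matches are found per batch, so overlapping keywords that land in different batches may both be reported.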
Kind Regards, Nandan Thakur
Thank you.
I have a keyword list that occupies 376 MB of memory. When I call add_keywords_from_list(dataList), memory usage rises to 4.697 GB. How can I reduce it?