openai / deeptype

Code for the paper "DeepType: Multilingual Entity Linking by Neural Type System Evolution"
https://arxiv.org/abs/1802.01021

load_wikidata_ids tries to build a marisa_trie.RecordTrie. Python runs out of memory. #61

Closed saminahbab closed 2 years ago

saminahbab commented 3 years ago

Hello, I am running get_wikiname_to_wikidata.py, and it has reached the stage where it tries to build a RecordTrie. I believe my computer (16 GB of RAM with 17 GB of swap space) runs out of memory at this point. Given that the wikidata_ids.txt file is around 800 MB, how big do you expect the trie to become? Also, what is the reason for using a trie object? Maybe I am misunderstanding something, but it seems to me that MarisaAsDict performs the same function here that a dictionary could, but uses substantially more memory. I am just looking for ways around this, as wikidata_ids spans nearly 100 million entries.
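
For a rough sense of scale on the dictionary side of this comparison, here is a minimal sketch (not code from the repository) that measures the per-entry memory cost of a plain Python dict on a small sample and extrapolates to about 100 million entries. The synthetic `Q<number>` keys, the integer values, the sample size, and both function names are assumptions made for illustration. The real contents of wikidata_ids.txt may differ, so the result is only a ballpark figure.

```python
import tracemalloc


def estimate_dict_bytes_per_entry(sample_size=100_000):
    """Measure the average bytes per entry of a str -> int dict on a sample.

    The keys are synthetic "Q<number>" strings (an assumption about what
    the identifiers look like), and the values are their integer indices.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    mapping = {"Q%d" % i: i for i in range(sample_size)}
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # The dict is still alive here, so (after - before) covers the table,
    # the key strings, and the integer objects.
    return (after - before) / len(mapping)


def extrapolate_gib(bytes_per_entry, n_entries):
    """Scale a per-entry cost up to n_entries and convert to GiB."""
    return bytes_per_entry * n_entries / 2**30


per_entry = estimate_dict_bytes_per_entry()
print("approx. bytes per dict entry:", round(per_entry))
print("approx. GiB for 100 million entries:",
      round(extrapolate_gib(per_entry, 100_000_000), 1))
```

The same measurement could in principle be wrapped around the trie construction on a truncated copy of the input to compare the two, though peak memory while the trie is being built may be higher than the size of the finished structure.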