thomfischer closed this issue 4 years ago
There seems to be a known memory leak in spaCy 2.1.8 (https://github.com/explosion/spaCy/issues/3618), which was only fixed in v2.1.9.
I will try to update blackstone manually to spaCy 2.2 to get multi-core pipeline support and the memory leak fixes (see #50).
To reduce the memory footprint significantly, we also need to know which pipeline components are actually needed, so that we can disable the rest (e.g. the NER component when no entity recognition is wanted, see #51).
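
A minimal sketch of that idea, assuming the required components are known up front. The helper `components_to_disable` is hypothetical, and the component names are spaCy's usual defaults, used here only for illustration; the real names would come from `nlp.pipe_names` of the loaded model.

```python
from typing import Iterable, List, Set


def components_to_disable(available: Iterable[str], needed: Set[str]) -> List[str]:
    """Return every available component that is not needed, in pipeline order."""
    return [name for name in available if name not in needed]


# Illustrative component names (assumption, not read from the blackstone model).
available = ["tagger", "parser", "ner"]
disable = components_to_disable(available, needed={"tagger", "parser"})
print(disable)  # ['ner']

# Assumed usage with spaCy, not executed here:
#   nlp = spacy.load(model_name, disable=disable)
```

The resulting list can be passed to the `disable` argument of `spacy.load`, so that unneeded components are never loaded in the first place.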
Updated requirements to spaCy 2.1.9 (56e3e36), which fixes the memory leak present in spaCy 2.1.8.
When trying to call CorpusAnalysis.init_pipeline() on the entire corpus,
self.corpus = textacy.Corpus(self.nlp, data=texts)
allocates 12 GB of memory within 1 minute. This needs to be fixed.
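
One possible way to bound peak memory, offered as a sketch and not as a confirmed fix for this issue, is to feed the texts to the pipeline in fixed-size batches instead of building one corpus from everything at once. The helper `batched` and the batch size are assumptions for illustration only.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Assumed usage, not executed here: process each batch, keep only the
# aggregated results, and let the parsed documents be garbage collected.
#   for batch in batched(texts, 500):
#       corpus = textacy.Corpus(self.nlp, data=batch)
#       ...collect the statistics needed from `corpus`...
```

Whether this helps depends on whether the analysis really needs all parsed documents in memory at the same time; if it does, disabling unused pipeline components is the more relevant lever.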