Closed thomfischer closed 4 years ago
With c5ddf80, all CELEX numbers are stored during the calculations and can now be accessed via `_get_celex_numbers`. Changing all functions to return a dict instead of a simple list is also a possibility, but it would further increase the memory footprint of the analysis part.
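A minimal sketch of how stored CELEX numbers could be matched to per-doc results by position. It assumes the CELEX numbers and the results come back in the same document order, which this thread does not confirm. The helper `pair_results_with_celex` and the sample values are hypothetical, and the plain lists stand in for whatever `_get_celex_numbers` and the analysis functions actually return.

```python
def pair_results_with_celex(celex_numbers, results):
    """Pair each per-doc result with the CELEX number at the same index.

    Hypothetical helper: assumes both sequences share the same document order.
    """
    if len(celex_numbers) != len(results):
        raise ValueError("celex_numbers and results must have the same length")
    return list(zip(celex_numbers, results))


# Hypothetical stand-ins for the stored CELEX numbers and a per-doc result.
celex_numbers = ["32016R0679", "31995L0046"]
tokens_per_doc = [["data", "protection"], ["personal", "data"]]

paired = pair_results_with_celex(celex_numbers, tokens_per_doc)
for celex, tokens in paired:
    print(celex, tokens)
```

Keeping the CELEX numbers in one separate list avoids storing a key per result, at the cost of relying on matching order.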
When performing a per-doc analysis on a corpus, such as `get_tokens_per_doc`, the result is a list of lists of tokens, with no indicator of which document each inner list belongs to. This should probably be changed to a list of dicts, each containing a `celex` and a `result` key, to ensure each result is linked to its original doc.
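The proposed return shape could look roughly like the sketch below. The function name `get_tokens_per_doc_linked`, the `(celex, text)` input pairs, and the whitespace tokenization are all assumptions made for illustration. They are not the project's actual API or tokenizer.

```python
def get_tokens_per_doc_linked(docs):
    """Return one dict per document, linking its CELEX number to its tokens.

    Hypothetical sketch: `docs` is an iterable of (celex, text) pairs and
    str.split() stands in for the real tokenizer.
    """
    return [
        {"celex": celex, "result": text.split()}
        for celex, text in docs
    ]


# Hypothetical sample corpus.
corpus = [
    ("32016R0679", "general data protection regulation"),
    ("31995L0046", "data protection directive"),
]

linked = get_tokens_per_doc_linked(corpus)
for entry in linked:
    print(entry["celex"], entry["result"])
```

Each entry carries its own document identifier, so results stay attributable even if the list is filtered or reordered later.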