CorpusAnalysis: per_doc analysis not assigned to document IDs

phHartl / eu-judgement-analyse

Quantitative analysis of judgments of the European Court of Justice

MIT License


CorpusAnalysis: per_doc analysis not assigned to document IDs #45

Closed: thomfischer closed this issue 4 years ago

thomfischer commented 4 years ago

When running a per-document analysis on a corpus, such as get_tokens_per_doc, the result is a list of token lists with no indication of which document each inner list belongs to. This should probably be changed to a list of dicts, each containing a celex key and a result key, so that every result stays linked to its source document.
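A minimal sketch of the proposed list-of-dicts shape follows. The function name tokens_per_doc_with_ids, the (celex, text) input format and the whitespace tokenizer are all hypothetical stand-ins, not the project's actual API; the IDs are placeholders rather than real CELEX numbers.

```python
from typing import Iterable, TypedDict


class DocResult(TypedDict):
    """One per-document result, keyed by the document's CELEX number."""
    celex: str
    result: list[str]


def tokens_per_doc_with_ids(docs: Iterable[tuple[str, str]]) -> list[DocResult]:
    """Hypothetical variant of get_tokens_per_doc that keeps document IDs.

    Assumes `docs` yields (celex, text) pairs; plain whitespace splitting
    stands in for the project's real tokenizer.
    """
    return [{"celex": celex, "result": text.split()} for celex, text in docs]


# Placeholder IDs, not real judgments.
docs = [
    ("CELEX-0001", "The Court dismisses the action"),
    ("CELEX-0002", "Costs reserved"),
]
for entry in tokens_per_doc_with_ids(docs):
    print(entry["celex"], len(entry["result"]))  # each result carries its ID
```

The cost of this shape is one small dict per document on top of the token lists themselves.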

phHartl commented 4 years ago

As of c5ddf80, all CELEX numbers are stored during the calculations and can now be accessed via _get_celex_numbers. Changing all functions to return a dict instead of a plain list would also be possible, but it would further increase the memory footprint of the analysis.
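With the CELEX numbers stored separately, results can be matched to documents on demand instead of changing every return type. The sketch below assumes that the stored CELEX numbers and a per-document result list share the same document order; that ordering, and the helper name pair_with_celex, are assumptions for illustration. It does not call _get_celex_numbers, whose exact return type is not stated here.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def pair_with_celex(celex_numbers: Sequence[str], results: Sequence[T]) -> Iterator[tuple[str, T]]:
    """Hypothetical helper: lazily pair per-document results with CELEX numbers.

    Assumes both sequences are in the same document order. zip() yields the
    pairs on demand, so no second full copy of the results is built.
    """
    # Checked eagerly (this is not a generator), so a mismatch fails at call time.
    if len(celex_numbers) != len(results):
        raise ValueError("CELEX numbers and results differ in length")
    return zip(celex_numbers, results)


celex_numbers = ["CELEX-0001", "CELEX-0002"]  # placeholder IDs
tokens_per_doc = [["The", "Court"], ["Costs", "reserved"]]

# Building a dict is optional; it stores references to the existing lists, not copies.
lookup = dict(pair_with_celex(celex_numbers, tokens_per_doc))
print(lookup["CELEX-0002"])
```

This keeps the memory cost close to that of the plain list, at the price of relying on both sequences staying in sync.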