phHartl / eu-judgement-analyse

Quantitative analysis of judgments of the European Court of Justice
MIT License

CorpusAnalysis: per_doc analysis not assigned to document IDs #45

Closed by thomfischer 3 years ago

thomfischer commented 3 years ago

When performing a per-document analysis on a corpus, such as get_tokens_per_doc, the result is a list of lists of tokens with no indication of which document each inner list belongs to. This should probably be changed to a list of dicts, each with a celex and a result key, so that every result stays linked to its original document.
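To illustrate the difference, here is a minimal sketch of the current and the proposed result shapes. It is not the project's code: the `corpus` dict, the placeholder CELEX strings, the whitespace tokenizer, and both function names are assumptions made up for this example.

```python
from typing import Dict, List

# Hypothetical stand-in corpus: CELEX number -> document text.
# The CELEX strings are placeholders, not real identifiers.
corpus: Dict[str, str] = {
    "CELEX-0001": "The Court rules",
    "CELEX-0002": "Judgment of the Court",
}


def tokens_per_doc_unlabelled(docs: Dict[str, str]) -> List[List[str]]:
    """Current shape: one token list per document, with no document ID."""
    return [text.split() for text in docs.values()]


def tokens_per_doc_labelled(docs: Dict[str, str]) -> List[dict]:
    """Proposed shape: one dict per document with 'celex' and 'result' keys."""
    return [
        {"celex": celex, "result": text.split()}
        for celex, text in docs.items()
    ]


if __name__ == "__main__":
    for entry in tokens_per_doc_labelled(corpus):
        print(entry["celex"], entry["result"])
```

With the labelled shape, a caller no longer has to rely on list position to know which document a result came from.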

phHartl commented 3 years ago

As of c5ddf80, all CELEX numbers are stored during the calculations and can be accessed via _get_celex_numbers. Changing all functions to return dicts instead of plain lists would also be possible, but it would further increase the memory footprint of the analysis part.
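A rough sketch of how a caller might combine the stored CELEX numbers with a per-document result follows. The `CorpusSketch` class is a hypothetical stand-in, not the real CorpusAnalysis class, and the signature and return type assumed here for `_get_celex_numbers` are guesses; `pair_with_celex` is a helper invented for this example.

```python
from typing import Iterator, List, Sequence, Tuple


class CorpusSketch:
    """Hypothetical stand-in that keeps CELEX numbers next to the texts."""

    def __init__(self, docs: Sequence[Tuple[str, str]]) -> None:
        # Both lists share the same order, so index i refers to one document.
        self._celex_numbers = [celex for celex, _ in docs]
        self._texts = [text for _, text in docs]

    def _get_celex_numbers(self) -> List[str]:
        # Assumed behaviour: return the IDs in processing order.
        return list(self._celex_numbers)

    def get_tokens_per_doc(self) -> List[List[str]]:
        # Whitespace split stands in for the real tokenizer.
        return [text.split() for text in self._texts]


def pair_with_celex(
    celex_numbers: Sequence[str], results: Sequence[list]
) -> Iterator[Tuple[str, list]]:
    """Lazily yield (celex, result) pairs; fail loudly on a length mismatch."""
    if len(celex_numbers) != len(results):
        raise ValueError("CELEX numbers and results differ in length")
    return zip(celex_numbers, results)


if __name__ == "__main__":
    analysis = CorpusSketch([
        ("CELEX-0001", "The Court rules"),        # placeholder IDs
        ("CELEX-0002", "Judgment of the Court"),
    ])
    tokens = analysis.get_tokens_per_doc()
    for celex, result in pair_with_celex(analysis._get_celex_numbers(), tokens):
        print(celex, result)
```

Because `zip` is lazy, the pairing builds no second container, so the results keep the plain-list shape and the only extra storage is the list of CELEX numbers itself. The trade-off is that the link to each document is positional and depends on both lists staying in the same order.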