phHartl opened this issue 4 years ago (status: Open)
Could be done by keyword extraction.
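The thread does not say which keyword-extraction method is meant, so the following is only a minimal sketch that assumes TF-IDF scoring: a term ranks high in one document if it is frequent there but rare across the rest of the collection. The names `tokenize`, `extract_keywords` and `STOP_WORDS`, the tiny stop-word list and the sample sentences are all hypothetical and are not taken from the project.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a proper one.
STOP_WORDS = {"the", "a", "an", "and", "from", "of", "to", "in"}


def tokenize(text: str) -> list[str]:
    # Lowercase, keep Unicode letter runs only (no digits/underscores), drop stop words.
    return [t for t in re.findall(r"[^\W\d_]+", text.lower()) if t not in STOP_WORDS]


def extract_keywords(docs: list[str], doc_index: int, top_k: int = 5) -> list[tuple[str, float]]:
    """Rank the terms of docs[doc_index] by TF-IDF against the whole collection."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


docs = [
    "The court dismissed the appeal.",
    "The court granted the appeal and awarded damages.",
    "The tenant claimed damages from the landlord.",
]
print(extract_keywords(docs, 2, top_k=3))
```

In the example, "damages" scores lower than "claimed", "landlord" and "tenant" because it also appears in another document. For larger collections, scikit-learn's `TfidfVectorizer` implements the same idea (with a smoothed IDF by default).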
Added keyword extraction with 8befa59. N-grams are already available for both corpora and single documents. Topic modeling might be the way to go for corpora, but it is computationally expensive, so we should pre-compute topic models for specific corpora.
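As a rough illustration of n-gram counting at both levels, here is a minimal sketch, assuming simple letter-only tokenization: a document's n-grams come from a sliding window over its tokens, and corpus counts are the sum of per-document counts, so no n-gram spans two documents. The function names are hypothetical and do not describe how the project actually computes its n-grams.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep Unicode letter runs only.
    return re.findall(r"[^\W\d_]+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # Sliding window of length n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_ngrams(text: str, n: int) -> Counter:
    # n-gram frequencies for a single document.
    return Counter(ngrams(tokenize(text), n))


def corpus_ngrams(texts: list[str], n: int) -> Counter:
    # Sum the per-document counts; n-grams never cross document boundaries.
    total = Counter()
    for text in texts:
        total.update(document_ngrams(text, n))
    return total


corpus = ["The court dismissed the appeal.", "The court granted the appeal."]
print(corpus_ngrams(corpus, 2).most_common(2))
```

Summing per-document counters keeps the single-document and corpus views consistent, and per-document results can be cached and merged later instead of re-tokenizing the whole corpus.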
Named entity recognition, which has already been implemented, is also part of the content extraction process.
I'm currently considering implementing either a summarization algorithm for single documents (using transformer models) or topic modeling for larger corpora.
What is each judgement about?