Description
Compile a list of corpora from different domains for evaluating the implemented algorithms against other popular topic modelling techniques. Ideally, each selected corpus should have a date-like field, so that temporally aware topic modelling techniques can be evaluated as well. For convenience, then implement a simple downloader that loads the corpora and transforms them into a standardised format.
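
As a starting point for discussion, here is a minimal sketch of what the standardised format and the transform step could look like. Everything in it is an assumption rather than an agreed design: the `Document` record, its field names, the `standardise_csv` helper and the default date format are all hypothetical. The actual download step is left out so the sketch runs offline; it only shows how a CSV-like corpus with an optional date column might be mapped onto one common record type.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Document:
    """Hypothetical standardised record shared by every corpus."""
    corpus: str
    doc_id: str
    text: str
    date: Optional[date] = None  # None when the corpus has no usable date


def parse_date(value: Optional[str], fmt: str) -> Optional[date]:
    """Parse a date string; return None for missing or malformed values."""
    try:
        return datetime.strptime(value.strip(), fmt).date()
    except (ValueError, AttributeError):
        return None


def standardise_csv(
    raw: Iterable[str],
    corpus: str,
    text_field: str,
    date_field: Optional[str] = None,
    date_format: str = "%Y-%m-%d",
) -> Iterator[Document]:
    """Map the rows of a CSV corpus onto Document records.

    Rows with empty text are skipped; rows with an unparseable date are
    kept with date=None so they remain usable for non-temporal models.
    """
    reader = csv.DictReader(raw)
    for i, row in enumerate(reader):
        text = (row.get(text_field) or "").strip()
        if not text:
            continue
        when = parse_date(row.get(date_field), date_format) if date_field else None
        yield Document(corpus=corpus, doc_id=f"{corpus}-{i}", text=text, date=when)


# Usage with a tiny in-memory sample (made-up rows, not a real corpus).
sample = io.StringIO(
    "headline,published\n"
    "Rates rise again,2020-03-01\n"
    "No date here,\n"
)
docs = list(
    standardise_csv(sample, corpus="news", text_field="headline", date_field="published")
)
for doc in docs:
    print(doc)
```

Keeping `date` optional lets corpora without a date-like field share the same format, while temporally aware models can simply filter on `doc.date is not None`.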