Hey, thanks for raising this issue.
Currently dtm_compare() is indeed not exported; it is only called internally by the tCorpus compare_corpus method (tc$compare_corpus). I'm doing some redesigning to turn this into a regular function. I hadn't considered that people might want to use dtm_compare directly, but it makes sense, so I'll export it in the next update.
Note that tc$preprocess doesn't actually return a DTM. Our reason for developing corpustools is to stick as much as possible to a tokenlist format that remembers the positions of tokens and can hold NLP output (POS tags, dependency relations, etc.). The preprocess method adds a column to the tokenlist with the specified name (in this case 'feature'). You can see this by running tc$tokens, which accesses the tokenlist data.table.
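To illustrate the point about preprocess modifying the tokenlist rather than returning a DTM, here is a minimal sketch. The example texts are made up, and the argument names (column, new_column, lowercase) are assumptions based on how I understand the method; they may differ between corpustools versions, so check ?tCorpus for the version you have installed.

```r
library(corpustools)

# Toy texts, invented purely for illustration
texts <- c("The quick brown fox jumps over the fence.",
           "A lazy dog sleeps in the garden all day.")

# Build a tCorpus from a character vector
tc <- create_tcorpus(texts)

# preprocess works by reference: it adds a new column to the tokenlist
# instead of returning a DTM (argument names are assumed, see note above)
tc$preprocess(column = "token", new_column = "feature", lowercase = TRUE)

# The tokenlist is a data.table; the new 'feature' column sits next to
# the original token column and the position information
print(head(tc$tokens))
```

The 'feature' column created here is the one you would then pass on to the comparison methods.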
You can then use this column in a corpus comparison. For this, please view the documentation for ?compare_corpus or ?compare_subset (compares a subset of the corpus to the rest of the corpus).
I really need to write a vignette. The best way to see how corpustools works is currently to run ?tCorpus for the documentation hub page. For reference, you can create a dtm with tc$dtm or tc$dfm (for a quanteda document feature matrix).
Hey, thanks for writing the package -- it looks very promising! However, I think you might have forgotten to export the dtm_compare function, as I get an error when calling it.
Error messages:
Steps to reproduce: