Closed stefan-mueller closed 7 years ago
hi stefan,
funny, i'll meet ken later this week and guess we'll be talking a lot about compatibility between packages ;-)
that said, there's currently no clearly defined path to combining our packages. but since you're about to be using the `treetag()` function, you can use the option `format="obj"` to directly tag text in a character vector. as long as you can get that out of a given object, you can tag it. if you need the result in a data.frame format, call `taggedText()` on the result (the develop branch also supports subsetting and replacements using `[` and `[[`).
the tm.plugin.koRpus package extends `koRpus`' possibilities to analyze a full corpus instead of single texts.
Great, thanks a lot for your reply. I heard about the get-together, and hope you come up with ideas of how to combine the strengths of each package.
I will follow your guidelines above and try to come up with a MWE that tags text either from a tm or quanteda corpus. We might add this to your vignette afterwards – I'm probably not the only one who faces this problem. What I need is a unique identifier for each tagged document (in my case: sentence) in the resulting data frame because I need to count the occurrences of certain POS tags for each document. I got this working with spacyr already, and I will try to come up with a solution using koRpus, and get back to you afterwards.
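A rough sketch of what such an MWE might look like, following the `format="obj"` and `taggedText()` suggestion above. Several things here are assumptions rather than anything confirmed in this thread: the TreeTagger install path is hypothetical, the `koRpus.lang.en` language package is assumed to be installed, and the helper names `tag_with_treetagger()`, `tag_documents()` and `count_tags()` are made up for illustration. The idea is to tag each document (here: sentence) separately, write its name into a `doc_id` column, and then cross-tabulate documents against POS tags.

```r
# Tag one text with koRpus/TreeTagger and return the tokens as a data.frame.
# Assumption: TreeTagger lives under the hypothetical path "~/treetagger".
tag_with_treetagger <- function(text) {
  library(koRpus)
  library(koRpus.lang.en)  # assumed to be installed; provides lang = "en"
  tagged <- treetag(
    text,
    format = "obj",        # tag a character vector instead of a file
    treetagger = "manual",
    lang = "en",
    TT.options = list(path = "~/treetagger", preset = "en")
  )
  taggedText(tagged)
}

# Apply a tagger to every element of a named character vector and stack
# the results, storing the element name as the document identifier.
tag_documents <- function(texts, tagger = tag_with_treetagger) {
  per_doc <- lapply(seq_along(texts), function(i) {
    tokens <- tagger(texts[[i]])
    tokens$doc_id <- rep(names(texts)[i], nrow(tokens))
    tokens
  })
  do.call(rbind, per_doc)
}

# Count POS tags per document: rows are documents, columns are tags.
count_tags <- function(tagged) {
  table(tagged$doc_id, tagged$tag)
}
```

Calling `count_tags(tag_documents(texts))` on a named character vector should then give one row per document. With a quanteda corpus, that vector could presumably come from `as.character(corp)` (or `texts(corp)` in older releases). The `tagger` argument is only there so the looping logic can be tried out without a TreeTagger installation.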
try `summary()` on a tagged object.
looks like this is resolved for now.
First of all, thanks for developing this package!
I am currently working on POS tagging of text corpora in several languages. For German and English I use a combination of spacyr and quanteda.
For additional languages, I would like to use koRpus and the TreeTagger. Is there a way to perform POS tagging directly on the text field in a corpus object? Or do you have a script that extracts the text field of a corpus for each document and applies POS tagging in a loop?
Thanks a lot for your help. Stefan