Closed thomasrenkert closed 4 years ago
hi thomas,
try ?query, e.g. assuming your tagged object is called your_text:
# filter by word class
query(your_text, var="wclass", query="name")
# or by POS tag
query(your_text, var="tag", query="NP")
Hi, thanks for your quick reply!
I get the error
Invalid var for class kRp.tagged: tag
which version of koRpus are you using? there were bugs in query() fixed in 0.12-1.
I've tested it with the latest CRAN version and also with the development version from github. The error persists.
that's odd, i can't reproduce the issue. could you please
- give some environmental data on your setup (e.g., operating system, versions of R & koRpus)
- post the relevant code blocks you are running (i guess it is not related to the particular text you are tagging)
It works now, but only with the development versions from github and only when installing sylly separately.
library(devtools)
install_github("unDocUMeantIt/sylly", ref="develop")
install_github("unDocUMeantIt/koRpus", ref="develop")
library(koRpus)
install.koRpus.lang(lang=c("en", "de"))
library(koRpus.lang.de)
tagged_corpus <- treetag(
  "corpus.txt",
  treetagger="/opt/treetagger/cmd/tree-tagger-german",
  lang="de"
)
names_corpus <- query(tagged_corpus, var="wclass", query="name")
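To turn such a result into a plain list of proper nouns, the filtering step can be sketched in base R. This is only an illustration on a hand-made data frame, not koRpus output: the token values are invented, and the column names (token, tag, lemma, wclass) are an assumption about how the tagged token table is laid out.

```r
# toy stand-in for the token table of a tagged text; the values are made up
# and the column names (token, tag, lemma, wclass) are assumed, not taken
# from an actual koRpus object
tokens <- data.frame(
  token  = c("Angela", "Merkel", "besucht", "Berlin", "und", "Berlin"),
  tag    = c("NE", "NE", "VVFIN", "NE", "KON", "NE"),
  lemma  = c("Angela", "Merkel", "besuchen", "Berlin", "und", "Berlin"),
  wclass = c("name", "name", "verb", "name", "conjunction", "name"),
  stringsAsFactors = FALSE
)

# keep only the rows classified as proper nouns
names_only <- tokens[tokens$wclass == "name", ]

# unique proper nouns and how often each occurs, most frequent first
name_counts <- sort(table(names_only$token), decreasing = TRUE)
print(name_counts)
```

The same subsetting would apply to any data frame that carries a word class column, so if the query() result is (or can be converted to) a data frame, the last two steps should carry over unchanged.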
yes, the development version is the forthcoming 0.13 release which has drastic changes under the hood compared to 0.12, which in turn already was a huge step from 0.11-5 (CRAN). the object classes are totally redesigned and the package depends on minor changes done to sylly, that's why you must use its develop branch as well. usage didn't change so much, it's just the internals.
0.12 was like an interim release, that's why i didn't push it to CRAN and am waiting for 0.13 to be ready instead. if you encounter any issues, let me know. i think it is rather stable and safe to use already.
btw, i'd recommend trying the presets, e.g.
set.kRp.env(
  TT.cmd="manual",
  TT.options=list(
    path="/opt/treetagger",
    preset="de"
  ),
  lang="de"
)
tagged_corpus <- treetag("corpus.txt")
I know how to extract proper nouns from a corpus in quanteda with spacyr. But for another corpus I need to use treetagger. I was able to lemmatize the corpus with koRpus and treetagger, but I don't know how to further analyze word forms and parts of speech. For instance, I would like to get a list of all proper nouns within the corpus. How can I do that in koRpus?