unDocUMeantIt / koRpus

An R Package for Text Analysis
GNU General Public License v3.0

Error: english-lexicon.txt not found #26

Closed bzuck-temple closed 4 years ago

bzuck-temple commented 4 years ago

Hello, I am having trouble using TreeTagger via koRpus. After looking for a while, I cannot find the .txt file listed in the error below, nor have I found a workaround. Thanks for your help.

I am using the following versions: R 3.6.1, koRpus 0.13-3, TreeTagger 3.2.3 (Mac OS X).

tagged.text <- treetag("unabomber_manifesto.txt", treetagger="manual", lang="en",
    TT.options=list(path="~/Downloads/mytreetagger/bin", preset="en"))

Error: None of the following files were found, please check your TreeTagger installation!
    /Users/tuc50262/Downloads/mytreetagger/bin/lib/english-lexicon.txt
    /Users/tuc50262/Downloads/mytreetagger/bin/lib/english-lexicon

unDocUMeantIt commented 4 years ago

hm, damn. i'll have to look into this.

in the meantime, the most straightforward workaround is probably to create an empty file by that name. that shouldn't hurt, and it should make the error go away.
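The workaround above can be sketched in R. This is only a minimal sketch and not part of koRpus: `create_empty_lexicon` is a hypothetical helper, and it assumes the file belongs in the `lib` subdirectory of whatever directory is passed as `path` in `TT.options`, since that is where the error message shows koRpus looking.

```r
# Hypothetical helper (not part of koRpus): create an empty placeholder for a
# lexicon file that the error message reports as missing.
create_empty_lexicon <- function(tt_path, name = "english-lexicon.txt") {
  lib_dir <- file.path(path.expand(tt_path), "lib")
  if (!dir.exists(lib_dir)) {
    stop("lib directory not found: ", lib_dir)
  }
  lexicon <- file.path(lib_dir, name)
  # Only create the file if it is absent, so a real lexicon is never truncated.
  if (!file.exists(lexicon)) {
    file.create(lexicon)
  }
  invisible(lexicon)
}

# Example call; the path is an assumption, use the one from your TT.options:
# create_empty_lexicon("~/Downloads/mytreetagger")
```

The helper stops instead of creating `lib` itself, because a missing `lib` directory probably points to a setup problem that an empty file would not fix.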

bzuck-temple commented 4 years ago

Yup, this seemed to work. Thanks for your help.

I generally had issues with the Mac download. I had to move the lib and cmd folders from the tagger-scripts folder into the folder containing the tagger package. You cannot simply leave them in the folder they were downloaded in.

I originally ran into the two errors below. My files were structured as Documents/treetagger/tree-tagger-MacOSX-3.2.3/tagger-scripts/lib

There is a cmd folder that comes with the Mac download, but it only contains lookup.perl. Copying the cmd folder from the tagger-scripts folder fixed the error about the missing tokenizer scripts below, but then, as we discussed above, english-lexicon.txt was still missing.

Error: Specified directory cannot be found: ~/Documents/treetagger/tree-tagger-MacOSX-3.2.3/lib

and

Error: None of the following files were found, please check your TreeTagger installation!
    /Users/bzuck/Documents/treetagger/tree-tagger-MacOSX-3.2.3/cmd/utf8-tokenize.perl
    /Users/bzuck/Documents/treetagger/tree-tagger-MacOSX-3.2.3/cmd/tokenize.perl

unDocUMeantIt commented 4 years ago

from what you describe, your TreeTagger setup is incorrect. the directory structure is supposed to look like this:

TreeTagger/  # the root directory can have any name
    bin/
    cmd/
    doc/
    lib/

this is also what TreeTagger's own scripts expect. if you use the install-tagger.sh script for installation, you should end up with a functioning setup. if not, you need to unpack the TreeTagger archive first, then the tagger-scripts (into the TreeTagger folder), and finally place the parameter files in the lib subdirectory.
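A quick way to compare an installation against that layout from R is sketched below. `check_treetagger_layout` is a hypothetical helper, not a koRpus function; it only tests whether the four subdirectories listed above exist and says nothing about their contents.

```r
# Hypothetical helper (not part of koRpus): report which of the expected
# TreeTagger subdirectories exist directly below the given root directory.
check_treetagger_layout <- function(tt_root,
                                    expected = c("bin", "cmd", "doc", "lib")) {
  tt_root <- path.expand(tt_root)
  present <- dir.exists(file.path(tt_root, expected))
  names(present) <- expected
  present
}

# Example call; the root path is an assumption, adjust it to your setup:
# check_treetagger_layout("~/Downloads/mytreetagger")
#
# If all entries are TRUE, the same root (not its bin subdirectory) would be
# the value to try as path in the call from the original report:
# treetag("unabomber_manifesto.txt", treetagger="manual", lang="en",
#     TT.options=list(path="~/Downloads/mytreetagger", preset="en"))
```

Any `FALSE` entry in the result names a subdirectory that is missing from the root, which matches the kind of "directory cannot be found" errors reported earlier in the thread.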