Closed CorentinWicht closed 4 years ago
that's odd, there is no call to path.expand() in the koRpus package. can you try with tokenize() instead of treetag()?
Many thanks for your reply, the following command indeed works:
TESTToken = tokenize("GNGResp.txt", lang="fr", detect=c(parag=TRUE, hline=TRUE))
The problem is that I am looking for a tool to extract only the nouns from an open-ended question in psychology research. Are there any alternatives to TreeTagger?
Best,
Corentin
I think the path.expand error occurred because I provided the path in the form C:/TreeTagger while it seems to be looking for a relative path like ~/TreeTagger.
The problem is that my working directory is on the D:\ drive and TreeTagger is installed on C:
i have no idea where path.expand() comes into play here, because like i said i can't find that call anywhere in the package's source code. i don't think any ~ is expanded here if you don't set it.
if you are using RStudio you could try running the commands in a bare R session to check whether the GUI does something unexpected (i am using RKWard).
another approach would be to replace the string "C:/TreeTagger" with file.path("C:", "TreeTagger") and see if that has any effect. does dir.exists("C:/TreeTagger") return TRUE?
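A small sketch of the two checks suggested above, using only base R. It is an illustration, not part of the package: file.path() always joins with "/", so on any platform it produces exactly the same string as the hand-written path, which means swapping one for the other is not expected to change what treetag() receives.

```r
# Two spellings of the same location
p_literal <- "C:/TreeTagger"
p_built   <- file.path("C:", "TreeTagger")

# file.path() uses "/" as separator, so both are the identical string
same_string <- identical(p_literal, p_built)
print(same_string)

# dir.exists() is TRUE only on a machine where that directory is really present
print(dir.exists(p_literal))
```

If dir.exists() returns TRUE and the error persists, the path string itself is unlikely to be the problem.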
Many thanks for your prompt reply.
I may have found a lead: when knitting the R Markdown script I am using, I got a more detailed explanation of the error:
I read about the same issue in another package, which was fixed here. The fix was to use the function normalize_path in place of normalizePath. This seems to be a Windows-specific error...
Based on your reply, I tried with file.path("C:", "TreeTagger") but the same error appeared, while dir.exists("C:/TreeTagger") indeed returned TRUE.
I also tried changing the path of the R_USER global environment variable to C:\ and then setting path="~/TreeTagger" in the function treetag, but again the same error popped up.
If the solution I provided you doesn't work, I will give it a try with RKWard, thanks.
good catch. if normalizePath() is the culprit, that's good to know.
i'm always hesitant to add new dependencies to a package, but looking more closely at xfun::normalize_path(), its main difference from normalizePath() seems to be the default setting for winslash: it's "/" instead of "\\". this could also be set in all instances of normalizePath() in koRpus' functions, as the option is only used on windows machines.
to make sure we're on the right track, does normalizePath("C:/TreeTagger") give you the same error, and can it be avoided by normalizePath("C:/TreeTagger", winslash="/")? or is it even possible to work around this by trying "C:\\TreeTagger"?
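For reference, a minimal sketch of what the winslash argument of base R's normalizePath() does. It uses tempdir() instead of C:/TreeTagger so that it runs on any machine; the variable names are illustrative only.

```r
# Use a directory that is guaranteed to exist on any platform
existing_dir <- tempdir()

# Default: on Windows the result uses "\\" as separator
default_form <- normalizePath(existing_dir, mustWork = TRUE)

# winslash only has an effect on Windows; there it yields "/" separators
slash_form <- normalizePath(existing_dir, winslash = "/", mustWork = TRUE)

print(default_form)
print(slash_form)
```

On Linux and macOS both calls give the same result, which fits the observation that the error only shows up on Windows.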
Well, I am now unsure that it is the only culprit, since
normalizePath("C:/TreeTagger")
doesn't return any error; it returns [1] "C:\\TreeTagger"
All these variations return the same error as earlier:
path = normalizePath("C:/TreeTagger", winslash="/")
path = normalizePath("C:/TreeTagger", winslash="\\")
path="C:\\TreeTagger"
path="C://TreeTagger"
I tried another trick by defining the settings using the set.kRp.env function as follows (this works!):
set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-french.bat", lang="fr", preset="fr", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")
But then, when I run TEST = treetag("GNGResp.txt",treetagger = "kRp.env"), I get a much larger error:
Finally, when running the same command with debug=TRUE as suggested in the error above, this is what R returns:
Do you have any idea what I should do here?
Strangely, TreeTagger works wonders when I run it using TreeTagger's graphical interface for Windows.
setting TT.cmd to a batch file is rather untested. what the language prefixes are trying to do is recreate the contents of those batch files and run them in the background. treetag() expects a character vector in return that can be turned into a table with three columns, which is the usual format returned by TreeTagger. as long as we get this vector, we're good. you can directly import previously tagged texts with koRpus::readTagged(), btw.
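To make the three-column format concrete, here is a rough base R sketch that turns such a character vector into a table and keeps only the nouns. This is not koRpus's internal parsing. The three sample lines are made up for illustration, and the tag names (NOM for nouns) are an assumption about the French tagset that should be checked against your own TreeTagger output.

```r
# Made-up example of tab-separated TreeTagger-style lines: token, tag, lemma
raw_lines <- c(
  "Le\tDET:ART\tle",
  "chat\tNOM\tchat",
  "dort\tVER:pres\tdormir"
)

# Turn the character vector into a three-column table
tagged <- read.table(
  text = raw_lines,
  sep = "\t",
  quote = "",            # keep quote characters in tokens as-is
  comment.char = "",     # do not treat "#" as a comment
  col.names = c("token", "tag", "lemma"),
  stringsAsFactors = FALSE
)

# Keep only the rows tagged as nouns (assumed tag: "NOM")
nouns <- tagged$token[tagged$tag == "NOM"]
print(tagged)
print(nouns)
```

The same filtering idea applies to a text imported with readTagged(), although the way to access the tagged tokens of a koRpus object is different and not shown here.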
just to rule out bugs in the french language package, can you try treetagging an english text?
also, does setting it like this fail as well?
set.kRp.env(
  TT.cmd="manual",
  TT.options=list(
    path="C:/TreeTagger",
    preset="fr"
  ),
  lang="fr"
)
Ok, I clearly misunderstood the usage of the TT.cmd argument.
When I ran Treetagger with the graphical interface, it indeed returned me a table with three columns in a .txt file.
Thanks for the tip regarding the way to import tagged texts using readTagged(). Unfortunately, my code is part of a function in an R Markdown file, hence I cannot process the text separately outside the R environment.
To rule out the possibility of a bug related to the French package, I used the same example as provided in the TreeTagger installation text file (i.e. running it on their INSTALL.txt file) and this again returned the same error as earlier:
library("koRpus.lang.en"); TEST = treetag("INSTALL.txt", treetagger = "manual", lang = "en", TT.options = list(path="C:/TreeTagger", preset="en"))
I am now wondering whether this is maybe more of a Windows & R-related error than anything related to your package. Is this the first time someone has notified you of this error?
By the way, in the help for treetag() you mention:
# second way, use one of the batch scripts that come with TreeTagger:
tagged.results <- treetag(
  file.path(path.package("koRpus"), "tests", "testthat", "sample_text.txt"),
  treetagger="~/bin/treetagger/cmd/tree-tagger-english",
  lang="en"
)
What is the bin/treetagger/cmd/tree-tagger-english file? There is nothing like this file in the Windows version of TreeTagger.
What would be the Windows version of that file? Maybe this might work.
I have now tried running a simple script in RKWard, which returned another error related to R v.4.0.2:
You will find here the script and the text file attached to reproduce the error.
I also read this thread #7 and realized that the issues there are close to mine, even if the error returned was not the same.
regarding the RKWard errors: it seems the windows version of RKWard was bundled with R 3.6.2 while you had locally installed 4.0.2. the installed packages are not downward compatible with older R versions, therefore you get the error (it's an R thing, not so much related to RKWard; installed R packages are always compiled for a particular R version). you could either try to change your rkward.ini file to point to your R installation, or try the nightly build, which hopefully uses a more up-to-date R version.
regarding the path error: this is definitely a windows thing. it is also a bit mysterious for me as we've just finished some studies using koRpus and the students were all running windows. no-one had this issue. so it could be that either a bug was introduced shortly before i released 0.13-1 on CRAN (which was just last week), or there's something different about your setup that's causing this.
cmd/tree-tagger-english is the unix version of the batch files. if you install TreeTagger on macOS or linux, the script files have different names and syntax. i haven't tested this in years, but you could try this with a batch file.
the main difficulty here is to find out where in the function code the error is triggered. when i have a little more time i can set up a virtual machine and try to replicate the problem. until then, it would be of great help if you could try to debug the function, i.e., run treetag() in a debugging mode and try to find out which line of code leads to failure.
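One way to find out where an error is triggered, sketched with base R only: record the call stack at the moment the error is raised. The two functions below are toy stand-ins for a nested call chain, not koRpus code. In an interactive session, traceback() and options(error = recover) give similar information without any extra code.

```r
# Toy functions standing in for a nested call chain (NOT koRpus functions)
inner_step <- function(path) {
  if (is.null(path)) stop("no path given")
  path
}
outer_step <- function(path = NULL) inner_step(path)

call_stack <- NULL
result <- tryCatch(
  withCallingHandlers(
    outer_step(),
    # This handler runs while the failing frames are still on the stack
    error = function(e) call_stack <<- sys.calls()
  ),
  error = function(e) e
)

# First line of every call on the stack, outermost first
stack_summary <- vapply(
  as.list(call_stack),
  function(cl) deparse(cl)[1],
  character(1)
)
print(conditionMessage(result))
print(stack_summary)
```

Reading the printed calls from top to bottom shows which function called which, and the last user-level entry is where to look first.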
Thanks for the tips, I could indeed change the version of R that RKWard was pointing to and the code now runs, but the same error pops up:
Mhh this is really strange, since I also tested my code on two different Windows 10 computers (one locally from home and one that runs on the University network), while both return the same error...
I have never tried debugging in R. How can I open the source code of the treetag() function to place a breakpoint?
I downloaded the source code from GitHub, opened the R file 02_method_treetag.R and placed a breakpoint there, but it's not working (I am used to debugging in MATLAB, hence I am a bit lost here).
The traceback() function again indicates that the error is occurring at the level of normalizePath():
Hi, I dug further into debugging in R, and by running:
traceback()
options(error = recover)
TEST = treetag("GNGResp.txt", treetagger = "manual", lang = "fr", TT.options = list(path="C:/TreeTagger", preset="fr"))
Selection: 6 (i.e. to enter the normalizePath function)
I got the following:
Hence, path is actually empty when provided to normalizePath; this is where the problem begins.
Similarly, entering Selection: 4 (i.e. check_toggle_tf8) shows empty path and dir variables.
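A hedged reproduction of this finding in base R. The assumption here is that the "empty" path was NULL; the thread does not show the exact value. In the R versions I am aware of, normalizePath() passes its argument through path.expand(), which would explain why the error message names path.expand() although koRpus never calls it directly. The guard function normalize_if_set() is hypothetical and is not the actual koRpus patch.

```r
# Assumption: the empty path seen in the debugger was NULL
empty_path <- NULL

# Capture the error instead of stopping the script
err <- tryCatch(normalizePath(empty_path), error = function(e) e)
print(conditionMessage(err))
print(conditionCall(err))

# Hypothetical guard (NOT the real koRpus fix): skip normalization
# when there is nothing to normalize
normalize_if_set <- function(path, ...) {
  if (is.null(path) || length(path) == 0L) {
    return(path)
  }
  normalizePath(path, ...)
}

guarded_empty <- normalize_if_set(NULL)
guarded_real  <- normalize_if_set(tempdir(), winslash = "/")
print(guarded_empty)
print(guarded_real)
```

The exact wording of the error message depends on the language R is running in, so it may differ from the one quoted in this thread.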
thanks for your investigations! i'll look into this, hope i get to it next week.
You're welcome, let me know if you find something.
Many thanks and best regards
one other thing: given that many windows users have successfully used the package in the past, and 0.13-1 is just a few weeks old, there is a possibility that the issue was introduced just recently. to test this, you could simply try out older versions of the package, e.g.
devtools::install_github("unDocUMeantIt/koRpus", ref="0.11-5") # latest CRAN release before 0.13
devtools::install_github("unDocUMeantIt/koRpus", ref="0.12-1") # latest intermediate release before 0.13, wasn't on CRAN
don't run both of these in one session, as they will overwrite each other. you should also restart R after the installation was successful, to be sure you've loaded the right version when testing treetag().
there has been one alteration of windows specific code lately, which could be related to this. therefore, you could also try the latest develop snapshot before this commit:
devtools::install_github("unDocUMeantIt/koRpus", ref="96d24f486a4443b8b14a392b620ed43b4d4a507b")
Hi koRpus,
I was also receiving the same "Error in path.expand(path) : argument 'path' incorrect" error and couldn't get any of the usual fixes to work.
Once I installed koRpus version 0.12-1 via devtools::install_github("unDocUMeantIt/koRpus", ref="0.12-1") as you suggest above, I was able to use treetag() without receiving the error. Thus, it seems like there is something unintentionally odd happening with 0.13-1.
The package is great, and I really appreciate your contribution to helping NLP methods become more accessible to R users.
Cheers, Chris
hi chris,
> Once I installed koRpus version 0.12-1 via devtools::install_github("unDocUMeantIt/koRpus", ref="0.12-1") as you suggest above, I was able to use treetag() without receiving the error. Thus, it seems like there is something unintentionally odd happening with 0.13-1.
thank you, that is indeed very helpful to know! as you can see from the ChangeLog there have been quite a lot of fixes and fundamental changes between 0.12 and 0.13, but i suspect that this was introduced only with the last few commits, as we were also using the develop branch for studies with windows.
could you perhaps also try to install the 96d24f486a4443b8b14a392b620ed43b4d4a507b version mentioned above? if that also still works for you, that would reduce the number of commits to review dramatically, as this is almost the final state of 0.13-1 with mostly cosmetic fixes missing, except for one windows related patch.
> The package is great, and I really appreciate your contribution to helping NLP methods become more accessible to R users.
you're welcome ;)
Dear koRpus,
unexpectedly the 96d24f486a4443b8b14a392b620ed43b4d4a507b version works wonders, many thanks!
The version 0.12-1 indeed doesn't return the "Error in path.expand(path) : argument 'path' incorrect" message but generated the following error:
> unexpectedly the 96d24f486a4443b8b14a392b620ed43b4d4a507b version works wonders, many thanks!
aha, so we have a winner. the patch which seems to cause the problems in 0.13-1 was added to work around issues related to perl regular expressions which were reported to be malformed on windows. i'll try to come up with a more bullet proof solution for that and do a bugfix release once we're sure all problems are gone.
in the meantime, windows users who came here: try the 96d24f486a4443b8b14a392b620ed43b4d4a507b version as a workaround ;)
i just added a patch that hopefully resolves the issue. i debugged treetag() in a virtual machine running windows and as it turned out, the problem was a file name check by an internal function which fails if there is actually no file name to check at all.
could you please try to install the latest develop version and see if it solves the issue for you?
devtools::install_github("unDocUMeantIt/koRpus", ref="develop")
Dear koRpus,
well done, I hereby confirm that you fixed the error, the develop version works wonders.
Many thanks for your help!
Best,
Corentin
thank you for reporting and helping me to debug this. the fixed release koRpus 0.13-3 is already accepted on CRAN and should be available soon. in case you don't want to wait, you can install it instantly from github:
devtools::install_github("unDocUMeantIt/koRpus", ref="0.13-3")
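After installing, it can help to confirm which version is actually in the library before testing treetag() again. A minimal sketch with base R, using the version numbers mentioned in this thread (0.13-1 showed the error, 0.13-3 is the fixed release); the variable names are illustrative:

```r
# Version numbers taken from this thread
affected_version <- package_version("0.13-1")
fixed_version    <- package_version("0.13-3")

if (requireNamespace("koRpus", quietly = TRUE)) {
  installed_version <- utils::packageVersion("koRpus")
  if (installed_version >= fixed_version) {
    message("koRpus ", installed_version, " includes the fix")
  } else if (installed_version == affected_version) {
    message("koRpus ", installed_version, " is the release that showed the error")
  } else {
    message("koRpus ", installed_version, " is older than the fixed release")
  }
} else {
  message("koRpus is not installed in this library")
}
```

Restart R before running this, so that a previously loaded version of the package does not stay in memory.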
Dear koRpus team,
I have been trying to run the treetag function in R for over 2 days and cannot get past the following error:
Error in path.expand(path) : argument 'path' incorrect
I used the following code:
library("koRpus");library("koRpus.lang.fr"); TEST = treetag("GNGResp.txt", treetagger = "manual", lang = "fr", TT.options = list(path="C:/TreeTagger", preset="fr"))
I installed Treetagger strictly following their INSTALL.txt explanations and TreeTagger is working when I call it from windows CMD.
Also this installation was done as recommended in the root directory of C:/ drive:
Best wishes,
Corentin