unDocUMeantIt / koRpus

An R Package for Text Analysis
GNU General Public License v3.0

Error in path.expand(path) : argument 'path' incorrect #25

Closed CorentinWicht closed 4 years ago

CorentinWicht commented 4 years ago

Dear koRpus team,

I have been trying to run the treetag function in R for over 2 days and cannot go past the following error: Error in path.expand(path) : argument 'path' incorrect

I used the following code:

library("koRpus")
library("koRpus.lang.fr")
TEST = treetag("GNGResp.txt", treetagger = "manual", lang = "fr", TT.options = list(path="C:/TreeTagger", preset="fr"))

I installed Treetagger strictly following their INSTALL.txt explanations and TreeTagger is working when I call it from windows CMD.

Also this installation was done as recommended in the root directory of C:/ drive: image

Best wishes,

Corentin

- Microsoft Windows 10 Family v. 10.0.18363 
-  R version 4.0.2 
- koRpus version 0.13.1

unDocUMeantIt commented 4 years ago

that's odd, there is no call to path.expand() in the koRpus package. can you try with tokenize() instead of treetag()?

CorentinWicht commented 4 years ago

Many thanks for your reply, the following command indeed works: TESTToken = tokenize("GNGResp.txt", lang="fr", detect=c(parag=TRUE, hline=TRUE))

The problem is that I am looking for a tool to extract only the nouns from answers to an open-ended question in psychology research. Are there any alternatives to TreeTagger?

Best,

Corentin

CorentinWicht commented 4 years ago

I think the path.expand error occurred because I provided the path in the form C:/TreeTagger, while it seems to be looking for a relative path like ~/TreeTagger

The problem is that my working directory is in the D:\ drive and Treetagger is installed in C:

unDocUMeantIt commented 4 years ago

i have no idea where path.expand() comes into play here, because like i said i can't find that call anywhere in the package's source code. i don't think any ~ is expanded here if you don't set it.

if you are using RStudio you could try running the commands in a bare R session to check whether the GUI does something unexpected (i am using RKWard).

another approach would be to replace the string "C:/TreeTagger" with file.path("C:", "TreeTagger") and see if that has any effect. does dir.exists("C:/TreeTagger") return TRUE?
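A minimal sketch of the two checks suggested above, using base R only. `check_tt_dir()` is a hypothetical helper (not part of koRpus), and `C:/TreeTagger` is simply the install location mentioned in this thread.

```r
# hypothetical helper: report whether R can see a candidate TreeTagger directory
check_tt_dir <- function(path) {
  list(
    path = path,
    exists = dir.exists(path)
  )
}

# file.path() joins its parts with "/", so this builds the same string
# as the literal "C:/TreeTagger"
tt_dir <- file.path("C:", "TreeTagger")
identical(tt_dir, "C:/TreeTagger")

# FALSE on machines without that folder
check_tt_dir(tt_dir)$exists
```

Because both spellings produce the identical string, swapping one for the other is not expected to change how the path is handled later on.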

CorentinWicht commented 4 years ago

Many thanks for your prompt reply.

I may have found a lead: when knitting the Rmarkdown script I am using, I got a more detailed explanation of the error: image

I read about the same issue in another package, which was fixed here. The function normalize_path should be used to replace normalizePath. This seems to be a Windows-specific error...

Based on your reply, I tried with file.path("C:", "TreeTagger") but the same error appeared while dir.exists("C:/TreeTagger") indeed returned TRUE.

I also tried changing the path of the R_USER global environment variable to C:\ and then setting path="~/TreeTagger" in the function treetag, but again the same error popped up.

If the solution I provided you doesn't work, I will give it a try with RKWard, thanks.

unDocUMeantIt commented 4 years ago

good catch. if normalizePath() is the culprit, that's good to know.

i'm always hesitant to add new dependencies to a package, but looking more closely at xfun::normalize_path(), its main difference from normalizePath() seems to be the default setting for winslash: it's "/" instead of "\\". this could also be set in all instances of normalizePath() in koRpus' functions, as the option is only used on windows machines.

to make sure we're on the right track, does normalizePath("C:/TreeTagger") give you the same error, and can it be avoided by normalizePath("C:/TreeTagger", winslash="/")? or is it even possible to work around this by trying "C:\\TreeTagger"?
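For reference, a small sketch of what the `winslash` argument of `normalizePath()` changes. It uses `tempdir()` as a stand-in for the TreeTagger folder so that it runs on any machine; that substitution is an assumption made for illustration.

```r
# an existing directory, so the example runs anywhere
p <- tempdir()

# winslash only has an effect on Windows; on other systems it is ignored
p_fwd  <- normalizePath(p, winslash = "/")
p_back <- normalizePath(p, winslash = "\\")

# the two results should differ only in the separator character
identical(gsub("\\\\", "/", p_back), p_fwd)
```

If both variants fail in the same way, the separator is unlikely to be the cause.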

CorentinWicht commented 4 years ago

Well I am now unsure it is the only culprit since,

  1. normalizePath("C:/TreeTagger") doesn't return any error, it does return "[1] C:\\TreeTagger"

  2. all these variations return the same error as earlier:

    path = normalizePath("C:/TreeTagger", winslash="/")
    path = normalizePath("C:/TreeTagger", winslash="\\")
    path="C:\\TreeTagger"
    path="C://TreeTagger"

I tried another trick by defining the settings using the set.kRp.env function as follows (this works!): set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-french.bat", lang="fr", preset="fr", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")

But then, when I run TEST = treetag("GNGResp.txt",treetagger = "kRp.env"), I get a much larger error: image

Finally, when running the same command with debug=TRUE as suggested in the error above, this is what R returns: image

Do you have any idea what I should do here?

Strangely, Treetagger works wonders when I run it using Treetagger's graphical interface for Windows

unDocUMeantIt commented 4 years ago

setting TT.cmd to a batch file is rather untested. what the language presets try to do is recreate the contents of those batch files and run them in the background. treetag() expects a character vector in return that can be turned into a table with three columns, which is the usual format returned by TreeTagger. as long as we get this vector, we're good. you can directly import previously tagged texts with koRpus::readTagged(), btw.
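To illustrate the three-column format, here is a rough base R sketch that turns tab-separated TreeTagger-style lines into a table and keeps only the nouns. The sample lines and the column names `token`, `tag` and `lemma` are made up for this example, and `NOM` is assumed to be the noun tag of the French parameter file. This is not how treetag() or readTagged() work internally.

```r
# invented sample output: one token per line, fields separated by tabs
tt_lines <- c(
  "Le\tDET:ART\tle",
  "chat\tNOM\tchat",
  "dort\tVER:pres\tdormir"
)

# read the lines into a three-column data frame, keeping everything as text
tagged <- read.delim(
  text = tt_lines,
  header = FALSE,
  sep = "\t",
  quote = "",
  col.names = c("token", "tag", "lemma"),
  colClasses = "character"
)

# keep only the tokens tagged as nouns
nouns <- tagged$token[tagged$tag == "NOM"]
nouns
```

The same filtering idea applies to a file that was tagged outside of R, as long as it has this layout.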

just to rule out bugs in the french language package, can you try treetagging an english text?

also, does setting the environment like this fail as well?

set.kRp.env(
  TT.cmd="manual",
  TT.options=list(
    path="C:/TreeTagger",
    preset="fr"),
  lang="fr"
)

CorentinWicht commented 4 years ago

Ok, I clearly misunderstood the usage of the TT.cmd argument.

When I ran Treetagger with the graphical interface, it indeed returned me a table with three columns in a .txt file.

Thanks for the tip regarding the way to import tagged texts using readTagged(). Unfortunately, my code is part of a function in an RMarkdown file, hence I cannot process it separately outside the R environment.

To rule out the possibility of a bug related to the French package, I used the same example as provided in the TreeTagger installation text file (i.e. running it on their INSTALL.txt file) and this again returned the same error as earlier:

 library("koRpus.lang.en"); TEST = treetag("INSTALL.txt", treetagger = "manual", lang = "en", TT.options = list(path="C:/TreeTagger", preset="en"))

I am now wondering whether this is more of a Windows and R related error than anything related to your package. Is this the first time someone has notified you about this error?

By the way in the help regarding treetag() you mentioned:

# second way, use one of the batch scripts that come with TreeTagger:
tagged.results <- treetag(
  file.path(path.package("koRpus"), "tests", "testthat", "sample_text.txt"),
  treetagger="~/bin/treetagger/cmd/tree-tagger-english",
  lang="en"
)

What is the bin/treetagger/cmd/tree-tagger-english file? There is nothing like this file in the Windows version of Treetagger. What would be the windows version of that file? Maybe this might work.

CorentinWicht commented 4 years ago

I have now tried running a simple script in RKward which returned another error related to R v.4.0.2: image

You will find here the script and the text file attached to reproduce the error.

I also read thread #7 and realized that the issues there are close to mine, even if the error returned was not the same.

unDocUMeantIt commented 4 years ago

regarding the RKWard errors: it seems the windows version of RKWard was bundled with R 3.6.2 while you had locally installed 4.0.2. the installed packages are not backward compatible with older R versions, therefore you get the error (it's an R thing, not so much related to RKWard; installed R packages are always compiled for a particular R version). you could either try to change your rkward.ini file to point to your R installation, or try the nightly build which hopefully uses a more up-to-date R version.

regarding the path error: this is definitely a windows thing. it is also a bit mysterious to me as we've just finished some studies using koRpus and the students were all running windows. no-one had this issue. so it could be that either a bug was introduced shortly before i released 0.13-1 on CRAN (which was just last week), or there's something different about your setup that's causing this.

cmd/tree-tagger-english is the unix version of the batch files. if you install TreeTagger on macOS or linux, the script files have different names and syntax. i haven't tested this in years, but you could try this with a batch file.

the main difficulty here is to find out where in the function code the error is triggered. when i have a little more time i can set up a virtual machine and try to replicate the problem. until then, it would be of great help if you could try to debug the function, i.e., run treetag() in a debugging mode and try to find out which line of code leads to failure.
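One possible way to locate the failing call without stepping through the code interactively is to record the call stack at the moment the error is raised. The sketch below uses two made-up functions, `tag_text()` and `check_path()`, as stand-ins for the real call chain; they are not koRpus functions. In an interactive session, `traceback()` or `options(error = recover)` give similar information.

```r
# hypothetical stand-ins for a function that passes a missing path downwards
check_path <- function(path) normalizePath(path)
tag_text <- function(file, path = NULL) check_path(path)

call_stack <- NULL
msg <- tryCatch(
  withCallingHandlers(
    tag_text("GNGResp.txt"),
    # the calling handler runs while the failing frames are still on the stack
    error = function(e) call_stack <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)

# names of the functions that were active when the error was raised
active <- vapply(
  as.list(call_stack),
  function(cl) deparse(cl[[1]])[1],
  character(1)
)
active
```

Reading `active` from the outermost to the innermost call shows which function handed the bad value on.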

CorentinWicht commented 4 years ago

Thanks for the tips, I could indeed change the version of R that RKWard was pointing to and the code now runs, but the same error pops up: image

Mhh this is really strange, since I also tested my code on two different Windows 10 computers (one locally from home and one that runs on the University network), while both return the same error...

I have never tried debugging in R, how can I open the source code of the treetag() function to place a debugger? I downloaded the source code from github and opened the R code 02_method_treetag.R and placed a debugger there but it's not working (I am used to debugging in MATLAB, hence I am a bit lost here).

The traceback() function again indicates that the error is occurring at the level of normalizePath(): image

CorentinWicht commented 4 years ago

Hi, I could dig further into debugging in R and by running:

traceback()
options(error = recover)
TEST = treetag("GNGResp.txt", treetagger = "manual", lang = "fr", TT.options = list(path="C:/TreeTagger", preset="fr"))
Selection: 6 (i.e. to enter the normalizePath function)

I got the following: image

Hence, path is actually empty when provided to normalizePath, this is where the problem begins.
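This can be reproduced in isolation, assuming that "empty" here means `NULL` (the screenshot is not available, so that is a guess). `normalizePath()` passes its argument through `path.expand()`, which only accepts character vectors. `safe_normalize()` below is a hypothetical guard for illustration, not the fix that was applied in koRpus.

```r
# assumption: NULL stands in for the missing path seen in the debugger
empty_path <- NULL

# capture the error instead of stopping
err <- tryCatch(normalizePath(empty_path), error = function(e) e)
inherits(err, "error")

# the call that raised the error
deparse(conditionCall(err))

# hypothetical guard: skip normalization when there is nothing to normalize
safe_normalize <- function(path, ...) {
  if (is.null(path) || length(path) == 0L) {
    return(character(0))
  }
  normalizePath(path, ...)
}
safe_normalize(empty_path)
```

In an English locale the message reads "invalid 'path' argument"; the wording reported in this thread looks like a localized form of the same message.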

CorentinWicht commented 4 years ago

Similarly, entering Selection: 4 (i.e. check_toggle_tf8) shows that both the path and dir variables are empty.

image

unDocUMeantIt commented 4 years ago

thanks for your investigations! i'll look into this, hope i get to it next week.

CorentinWicht commented 4 years ago

You're welcome, let me know if you find something.

Many thanks and best regards

unDocUMeantIt commented 4 years ago

one other thing: given that many windows users have successfully used the package in the past, and 0.13-1 is just a few weeks old, there is a possibility that the issue was introduced just recently. to test this, you could simply try out older versions of the package, e.g.

devtools::install_github("unDocUMeantIt/koRpus", ref="0.11-5") # latest CRAN release before 0.13
devtools::install_github("unDocUMeantIt/koRpus", ref="0.12-1") # latest intermediate release before 0.13, wasn't on CRAN

don't run both of these in one session, as they will overwrite each other. you should also restart R after the installation was successful, to be sure you've loaded the right version when testing treetag().

there has been one alteration of windows specific code lately, which could be related to this. therefore, you could also try the latest develop snapshot before this commit:

devtools::install_github("unDocUMeantIt/koRpus", ref="96d24f486a4443b8b14a392b620ed43b4d4a507b")

runyoncr commented 4 years ago

Hi koRpus,

I was also receiving the same "Error in path.expand(path) : argument 'path' incorrect" error and couldn't get any of the usual fixes to work.

Once I installed koRpus version 0.12-1 via devtools::install_github("unDocUMeantIt/koRpus", ref="0.12-1") as you suggest above, I was able to use treetag() without receiving the error. Thus, it seems like there is something unintentionally odd happening with 0.13-1.

The package is great, and I really appreciate your contribution to helping NLP methods become more accessible to R users.

Cheers, Chris

unDocUMeantIt commented 4 years ago

hi chris,

thank you, that is indeed very helpful to know! as you can see from the ChangeLog there have been quite a lot of fixes and fundamental changes between 0.12 and 0.13, but i suspect that this was introduced only with the last few commits, as we were also using the develop branch for studies with windows.

could you perhaps also try to install the 96d24f486a4443b8b14a392b620ed43b4d4a507b version mentioned above? if that also still works for you, that would reduce the number of commits to review dramatically, as this is almost the final state of 0.13-1 with mostly cosmetic fixes missing, except for one windows related patch.

The package is great, and I really appreciate your contribution to helping NLP methods become more accessible to R users.

you're welcome ;)

CorentinWicht commented 4 years ago

Dear koRpus,

unexpectedly the 96d24f486a4443b8b14a392b620ed43b4d4a507b version works wonders, many thanks !

The version 0.12-1 indeed doesn't return the "Error in path.expand(path) : argument 'path' incorrect" message but generated the following error:

image

unDocUMeantIt commented 4 years ago

aha, so we have a winner. the patch which seems to cause the problems in 0.13-1 was added to work around issues related to perl regular expressions which were reported to be malformed on windows. i'll try to come up with a more bullet proof solution for that and do a bugfix release once we're sure all problems are gone.

in the meantime, windows users who came here: try the 96d24f486a4443b8b14a392b620ed43b4d4a507b version as a workaround ;)

unDocUMeantIt commented 4 years ago

i just added a patch that hopefully resolves the issue. i debugged treetag() in a virtual machine running windows and as it turned out, the problem was a file name check by an internal function which fails if actually there is no file name to check at all.

could you please try to install the latest develop version and see if it solves the issue for you?

devtools::install_github("unDocUMeantIt/koRpus", ref="develop")

CorentinWicht commented 4 years ago

Dear koRpus,

well done, I hereby confirm that you fixed the error, the develop version works wonders.

Many thanks for your help!

Best,

Corentin

unDocUMeantIt commented 4 years ago

thank you for reporting and helping me to debug this. the fixed release koRpus 0.13-3 is already accepted on CRAN and should be available soon. in case you don't want to wait, you can install it instantly from github:

devtools::install_github("unDocUMeantIt/koRpus", ref="0.13-3")
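After restarting R, a quick way to confirm that the installed version is at least the fixed release might look like this. `has_path_fix()` is a hypothetical helper, and the only assumption taken from this thread is that 0.13-3 is the first release containing the fix.

```r
# hypothetical helper: TRUE if a koRpus version string is 0.13-3 or later
has_path_fix <- function(version) {
  package_version(version) >= package_version("0.13-3")
}

has_path_fix("0.13-1")
has_path_fix("0.13-3")

# with koRpus installed, check the local version like this:
# has_path_fix(as.character(utils::packageVersion("koRpus")))
```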