Closed MTKnife closed 4 years ago
Running the script again, I've confirmed this is different from issue https://github.com/shangjingbo1226/AutoPhrase/issues/64: in my case, I didn't get an error about an incorrect English POS parameter file. However, I did get this error, 5 times in a row:
cmd/tree-tagger-english: 20: cmd/tree-tagger-english: ./bin/tree-tagger: Permission denied
One obvious potential issue is that this one particular file, unlike all the others, is owned by root. I used chown and chgrp to change its ownership, but that didn't fix the problem. Also, I'm not sure why this one file has a different owner in the first place: maybe because it's created by the run script, which is, by default, run as root?
Next I tried setting permissions to 775 (755 should work just as well), thinking maybe the problem was the file wasn't executable. That got me past that error (in fact, after I restored ownership to root later, it still worked fine). However, then I got this:
===Saving Model and Results===
cp: cannot stat 'tmp/segmentation.model': No such file or directory
===Generating Output===
java.io.FileNotFoundException: tmp/final_quality_multi-words.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at Tokenizer.tokenizeText(Tokenizer.java:713)
at Tokenizer.main(Tokenizer.java:864)
java.io.FileNotFoundException: tmp/final_quality_unigrams.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at Tokenizer.tokenizeText(Tokenizer.java:713)
at Tokenizer.main(Tokenizer.java:864)
java.io.FileNotFoundException: tmp/final_quality_salient.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at Tokenizer.tokenizeText(Tokenizer.java:713)
at Tokenizer.main(Tokenizer.java:864)
This appears to be the closed but not exactly resolved issue https://github.com/shangjingbo1226/AutoPhrase/issues/34. But then I noticed this in the output in the section preceding the one I quoted above:
=======
Loading data...
# of total tokens = 299743372
max word token id = 1310326
# of documents = 3696848
# of distinct POS tags = 57
Mining frequent phrases...
selected MAGIC = 1310327
[Warning] failed to open data/BAD_POS_TAGS.txt under parameters = r
./auto_phrase.sh: line 108: 384 Segmentation fault ./bin/segphrase_train --pos_tag --thread $THREAD --pos_prune data/BAD_POS_TAGS.txt --label_method $LABEL_METHOD --label $LABEL_FILE --max_positives $MAX_POSITIVES --min_sup $MIN_SUP
real 3m58.528s
user 3m51.850s
sys 0m3.760s
Evidently, the later error was caused by the absence of a file that was supposed to be output by bin/segphrase_train. Weirdly, the warning and subsequent error disappeared next time I ran the script...a transient error of some sort? Or somehow caused by changing the ownership of tools/treetagger/bin/tree-tagger away from root? The latter seems unlikely--I think it was just a coincidence.
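One way to surface this kind of failure earlier is to check that the input file is readable before the training step runs. The following is only a minimal sketch: require_readable is a hypothetical helper that is not part of AutoPhrase, and the thread does not confirm that the missing-file warning is what triggered the segmentation fault. The demo runs against a throwaway directory rather than the real data directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of AutoPhrase): fail early if a required
# input file cannot be read, instead of letting a later step crash.
require_readable() {
    if [ ! -r "$1" ]; then
        echo "required input not readable: $1" >&2
        return 1
    fi
}

# Demo in a temporary directory so the sketch can run anywhere.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/data"

# First check: the file does not exist yet, so the guard should fail.
if require_readable "$demo_dir/data/BAD_POS_TAGS.txt" 2>/dev/null; then
    first_check=ok
else
    first_check=missing
fi

# Create an empty stand-in file and check again.
: > "$demo_dir/data/BAD_POS_TAGS.txt"
if require_readable "$demo_dir/data/BAD_POS_TAGS.txt"; then
    second_check=ok
else
    second_check=missing
fi

echo "$first_check $second_check"
```

In the real script, a call such as require_readable data/BAD_POS_TAGS.txt || exit 1 placed before the segphrase_train invocation would stop the run with a clear message, assuming the working directory is the AutoPhrase root.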
In that last run everything seemed to run just fine, but, despite a prominent green message alerting me to the fact that it was dumping the results, nothing was saved in results. After some poking around, I discovered they were instead saved in the value of the MODELS variable, which is, by default, models/DBLP. I suppose that's fine, but then why have a results directory in the first place? The obvious answer is that results is mapped from the host machine's file structure, making the results available outside the Docker container...which they aren't, because they're being saved in a directory internal to the container.
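A possible stopgap, until the script itself writes to the mounted directory, is to copy the model directory's contents into results at the end of a run. This is a sketch under assumptions: export_results is a hypothetical helper, the directory names mirror the ones mentioned above, and example.txt is a placeholder rather than a real AutoPhrase output file.

```shell
#!/bin/sh
# Hypothetical helper (not part of AutoPhrase): copy everything from the
# model directory into the directory that is mounted from the host.
export_results() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "no model directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest"
    # "src/." copies the directory's contents, including dotfiles.
    cp -R "$src"/. "$dest"/
}

# Demo in a temporary directory with a placeholder output file.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/models/DBLP"
echo "demo" > "$demo_dir/models/DBLP/example.txt"

export_results "$demo_dir/models/DBLP" "$demo_dir/results"
ls "$demo_dir/results"
```

Inside the container, the equivalent call would be something like export_results models/DBLP results, assuming results is the directory mounted by the docker run command.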
In conclusion, we have the following issues:
1. The permissions on tools/treetagger/bin/tree-tagger need to be changed.
2. The results need to be saved in the results directory (or copied there), or that directory shouldn't be mounted by the docker run command.
3. [...] data/BAD_POS_TAGS.txt.
I'm going to have a look and see if I can put in a pull request. Number 3 is trivial to fix; I'm not sure how to do number 1, but it can't be that hard. Of course, someone would have to be reading to approve a pull request...no one seems to have been monitoring the project since September.
Found it... compile.sh, which gets called by the Dockerfile in a Docker deployment (or manually in a plain Linux deployment), copies one of several OS-specific tree-tagger files to tools/treetagger/bin/tree-tagger, which explains why the file in question is owned by root. Why that causes a problem in Docker but not a normal Linux deployment, I don't know, but it's easy enough to fix in the Dockerfile by adding a chmod right after the call to compile.sh.
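The chmod step described above can be sketched as a small shell guard. This is an illustration only, not the change that went into the fork: ensure_executable is a hypothetical helper name, and the demo operates on a throwaway file instead of the real tree-tagger binary.

```shell
#!/bin/sh
# Hypothetical helper (not part of AutoPhrase): make sure a file exists
# and carries the execute bit, setting mode 755 if it does not.
ensure_executable() {
    target="$1"
    if [ ! -f "$target" ]; then
        echo "missing: $target" >&2
        return 1
    fi
    if [ ! -x "$target" ]; then
        chmod 755 "$target"
    fi
}

# Demo on a throwaway file that starts out non-executable.
demo_dir=$(mktemp -d)
touch "$demo_dir/tree-tagger"
chmod 644 "$demo_dir/tree-tagger"

ensure_executable "$demo_dir/tree-tagger"
ls -l "$demo_dir/tree-tagger"
```

In a Dockerfile, the same effect would presumably come from a RUN chmod 755 tools/treetagger/bin/tree-tagger line placed after the step that calls compile.sh, assuming the build's working directory is the AutoPhrase root.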
I've just made changes that address all these issues, and also provide tips for running Docker in Windows. I'll push the branch to my fork (https://github.com/MTKnife/AutoPhrase), then create a PR after I've tested it.
OK, after much frustrating futzing around, I've got a version that works. I issued a PR, but I'm not sure how long it'll take someone to approve it. In the meantime, you can get the new version at https://github.com/MTKnife/AutoPhrase.
I'm running into the same issue while outside docker.
We will investigate the problem caused by POS tagging. As a quick workaround, you can set ENABLE_POS_TAGGING to 0 in the script.
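For anyone who prefers to script that workaround, here is a minimal sketch. It assumes the variable is assigned on its own line in the form ENABLE_POS_TAGGING=1, which the thread does not show, and disable_pos_tagging is a hypothetical helper. The demo edits a two-line stand-in file, not the real auto_phrase.sh.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the ENABLE_POS_TAGGING assignment to 0.
# Assumes the script contains a line starting with "ENABLE_POS_TAGGING=".
disable_pos_tagging() {
    script="$1"
    tmp="$script.tmp"
    sed 's/^ENABLE_POS_TAGGING=.*/ENABLE_POS_TAGGING=0/' "$script" > "$tmp" || return 1
    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Demo on a stand-in file; THREAD=10 is only there to show that other
# lines are left untouched.
demo_dir=$(mktemp -d)
printf 'ENABLE_POS_TAGGING=1\nTHREAD=10\n' > "$demo_dir/auto_phrase.sh"

disable_pos_tagging "$demo_dir/auto_phrase.sh"
cat "$demo_dir/auto_phrase.sh"
```

Editing the line by hand works just as well; the helper is only useful when the change has to be applied during an automated build.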
Any update on this issue?
It was probably because the download links were outdated.
Running AutoPhrase in a Docker container, I'm having an issue similar to the one described in closed issue https://github.com/shangjingbo1226/AutoPhrase/issues/46.
I've checked tools/treetagger/download_parameter_files.sh in the Docker download zip file, and confirmed that it's now got the right address for downloading the English POS tagger, but I'm still getting the issue. Specifically, the tail end of my output looks like this:
Aside from the POS problem, terminal output seems normal (I don't recall seeing the "ERROR: not a parameter file: ./lib/english-utf8.par!" mentioned by the earlier reporter, but all the POS warnings exceed my scroll buffer, so I can't double-check). However, after the completion of execution, the results folder is empty.
A few notes on specifics:
I'm running the latest version of Docker (2.1.0.4) in Windows 10. The provided make.sh script did not work correctly under Windows in a MinGW64 shell (it chokes on the wget for some reason, but anyway the filename is wrong--master.zip in the URL is correct, but the file is actually downloaded as Autophrase-master.zip, though using that name in the URL produces a 404), but fixing that was trivial. However, I wasn't able to build the Docker container with the included Dockerfile. After several hours of trying to modify it to install OpenJDK (I need to use this in production, so Oracle Java isn't an option, and anyway, that wouldn't install, either), I finally gave up and moved from the debian:jessie parent container to the openjdk:8 container provided by the OpenJDK people, which uses Debian Stretch. My Dockerfile thus skips the Java install, but uses everything else in the original, including the g++ installation. I don't know if OpenJDK is causing the problem--but it looks like other people are using it as well--or perhaps the transition from Jessie to Stretch could be causing a problem as well, but I'm at a loss.
Incidentally, when working with MinGW, the run command given in the README won't supply the proper paths for directory mounts. You need something like this instead, or you get a ";C" appended to the end of the working directory path: