Open johnvblazic opened 7 years ago
I've checked the pipegraph logs, the pipegraph code, the EasySRL code, and the .conf files for the neuralccg project, but I can't find any reference to the file path that it is failing on, other than /data/ccgbank_1_1 in the .conf file.
The files should be set up such that the famous Pierre Vinken example (first sentence of the dev set) can be found via this path: neuralccg/data/ccgbank_1_1/data/AUTO/00/wsj_0001.auto
Does this match something you've tried?
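If it helps to rule out a layout problem, here is a minimal sketch that checks for the path above. It assumes the script is pointed at the neuralccg checkout; the function name `check_layout` is made up for illustration, and the subdirectory names are simply the ones mentioned in this thread, not something read from the neuralccg code.

```python
from pathlib import Path

# Subdirectory names as mentioned in this thread (assumed to sit next to each other).
SUBDIRS = ("AUTO", "HTML", "LEX", "PARG", "RAW")


def check_layout(project_root):
    """Report which expected CCGBank paths exist under a neuralccg checkout.

    project_root is the directory that contains data/ccgbank_1_1 (hypothetical
    parameter name). Returns a dict mapping a short label to True/False.
    """
    data_dir = Path(project_root) / "data" / "ccgbank_1_1" / "data"
    report = {name: (data_dir / name).is_dir() for name in SUBDIRS}
    # The Pierre Vinken file referenced in this thread.
    report["wsj_0001.auto"] = (data_dir / "AUTO" / "00" / "wsj_0001.auto").is_file()
    return report


# Example usage (path is a placeholder):
# for label, found in check_layout("/path/to/neuralccg").items():
#     print("ok     " if found else "MISSING", label)
```

Anything reported as missing would point to a directory nesting issue rather than a content mismatch.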
Yeah, that was where I started, and I've been trying permutations since. The demo works just fine; I'm currently trying to get the training module running with the following command:
./run.sh experiments/train.conf train 8080
Is there any way I can find the file path it is failing on?
It looks like the failure is happening here: https://github.com/kentonl/EasySRL/blob/maven/src/edu/uw/easysrl/corpora/CCGBankDependencies.java#L386
There is likely a mismatch between the content of the online version and the disc version of CCGBank. You can debug and/or apply temporary fixes by cloning the maven branch of https://github.com/kentonl/EasySRL. After running mvn install with local edits, neuralccg should use the updated code.
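Before editing the Java, it may be quicker to scan the corpus for the offending line. A `String index out of range: -1` from `substring` commonly arises when an `indexOf` lookup returns -1, meaning a delimiter the parser expected is missing from some line. The sketch below walks a directory and reports lines that match a caller-supplied rule. The helper names, the file suffix, and the example rule (`prefix`, `delimiter`) are all placeholders: they are not taken from EasySRL and should be replaced with whatever the code at the linked line actually looks for.

```python
from pathlib import Path


def find_suspect_lines(root, suffix, is_suspect):
    """Return (path, line_number, text) for every line where is_suspect(text) is True.

    root, suffix, and is_suspect are hypothetical parameters for this sketch.
    """
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line_no, raw in enumerate(handle, start=1):
                text = raw.rstrip("\n")
                if is_suspect(text):
                    hits.append((str(path), line_no, text))
    return hits


def header_missing_delimiter(text, prefix="<s", delimiter='"'):
    """Placeholder rule: a line starts with `prefix` but lacks `delimiter`.

    Both defaults are guesses for illustration only.
    """
    return text.startswith(prefix) and delimiter not in text


# Example usage (path and suffix are placeholders):
# for hit in find_suspect_lines("data/ccgbank_1_1", ".parg", header_missing_delimiter):
#     print(hit)
```

Passing the rule in as a function keeps the scan reusable once the real condition is known from the Java source.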
Hi,
Our university has the disc copies of the CCG bank and I don't have access to the online versions of the data. I pulled the data from the call signature that appears in the link, and the data that I've gathered appears to be in the same format as the sample provided in the link. I can't tell from the code or the readme what the directory structure of "ccgbank_1_1" is. So far, I've tried putting the "data" directory that I found in the ccgbank download in that directory, and I have also tried putting the AUTO/HTML/LEX/PARG/RAW directories directly in the ccgbank_1_1 directory.
Any guidance you could provide would be extremely helpful.
I'm consistently getting the following error:
12:54:10 | ERROR | c.g.k.p.core.Stage | Job failed.
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1967) ~[na:1.8.0_121]
    at edu.uw.easysrl.corpora.CCGBankDependencies.getDependencyParseCCGBank(CCGBankDependencies.java:386) ~[EasySRL-d69cb6e7d99595372df8dda65b7e975b21f18c37.jar:na]
    at edu.uw.easysrl.corpora.CCGBankDependencies.getDependencyParses(CCGBankDependencies.java:364) ~[EasySRL-d69cb6e7d99595372df8dda65b7e975b21f18c37.jar:na]
    at edu.uw.easysrl.corpora.CCGBankDependencies.loadCorpus(CCGBankDependencies.java:349) ~[EasySRL-d69cb6e7d99595372df8dda65b7e975b21f18c37.jar:na]
    at edu.uw.neuralccg.task.CCGBankReaderTask.parseStream(CCGBankReaderTask.java:19) ~[classes/:na]
    at edu.uw.neuralccg.task.CCGBankReaderTask.run(CCGBankReaderTask.java:34) ~[classes/:na]
    at com.github.kentonl.pipegraph.core.Stage.run(Stage.java:195) ~[pipegraph-bb781b4c3496e98c337a030d98b81f31490ab0f4.jar:na]
    at com.github.kentonl.pipegraph.runner.AsynchronousPipegraphRunner.run(AsynchronousPipegraphRunner.java:43) [pipegraph-bb781b4c3496e98c337a030d98b81f31490ab0f4.jar:na]
    at com.github.kentonl.pipegraph.runner.AsynchronousPipegraphRunner.lambda$null$1(AsynchronousPipegraphRunner.java:61) [pipegraph-bb781b4c3496e98c337a030d98b81f31490ab0f4.jar:na]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]