Closed joecheriross closed 8 years ago
Hi Joe,
Not sure exactly what mode of the system you're running. If you're running coref only, then the command-line argument you want is -corefDocSuffix "gold_conll"
The reason it defaults to auto_conll is that this is more standard for coref evaluation, but it'll work fine with gold as well.
Greg
Thanks Greg.
I think the option 'corefDocSuffix' is not there. I tried 'docSuffix' instead, but that did not help. So I obtained the auto_conll files corresponding to the gold_conll files and tried again. Training now runs, but the system is not picking up the test file from the test path. It raises an iterator exception because it cannot find any test file. What filename format is expected for a test conll file? I tried filenames ending with both 'auto_conll' and 'gold_conll'.
Sorry to bother you.
Hi Joe,
-corefDocSuffix was added in a newer commit; before that, -docSuffix doubled as the coref suffix, so that was the right thing to try. The suffix should be the same for train and test, so I'm not sure what the problem is. Can you send me the exact command you're running?
Greg
Hi Greg,
~/java-9-oracle/bin/java -Xmx8g -jar berkeley-entity-1.0.jar ++config/base.conf -execDir scratch -mode COREF_TRAIN_PREDICT -testPath /tmp/test/ -trainPath ./train/ -modelPath "models/joint-onto.ser.gz" -wikipediaPath "models/wiki-db-onto.ser.gz" -useGoldMentions -pruningStrategy build:models/cached/corefpruner-onto.ser.gz:-5:5 -nerPruningStrategy build:models/cached/nerpruner-onto.ser.gz:-9:5 -outputPath /tmp/test_output/
After adding auto_conll files to the trainPath along with the gold_conll files, training runs. But it is not picking up the test files.
"Loading -1 docs from /tmp/test_output/ ending with
ERROR: java.util.NoSuchElementException: next on empty iterator:
scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
scala.collection.IndexedSeqLike$Elements.next(IndexedSeq "
Thanks, Joe
Hi Greg,
I was able to solve this. Thanks for your directions. I thought the suffix need not be given for train-and-predict, but it has to be. A small suggestion: it would be good to raise an exception when no suffix is provided as a command-line argument. That does not happen in COREF_TRAIN_PREDICT mode.
Also, I am trying to do some pruning of mention pair formation in the testing phase. Can you please point me to the file and function I will have to edit for this?
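The kind of check suggested here could look roughly like the following. This is only a sketch under assumptions, not berkeley-entity code: `SuffixCheck` and `docPaths` are hypothetical names, and the real loader may select files differently.

```scala
import java.io.File

object SuffixCheck {
  // Hypothetical helper (not part of berkeley-entity): returns the paths of
  // the files in `dir` whose names end with `suffix`, failing fast instead of
  // returning an empty list that only breaks later in the pipeline.
  def docPaths(dir: String, suffix: String): Seq[String] = {
    require(suffix.nonEmpty,
      s"No document suffix given for $dir (e.g. auto_conll or gold_conll)")
    // listFiles() returns null when `dir` is not a readable directory.
    val files = Option(new File(dir).listFiles()).getOrElse(Array.empty[File])
    val matches = files
      .filter(f => f.isFile && f.getName.endsWith(suffix))
      .map(_.getPath)
      .sorted
      .toSeq
    require(matches.nonEmpty, s"No files ending with '$suffix' found in $dir")
    matches
  }
}
```

With a check like this, an empty suffix or a directory with no matching files would be reported up front as an IllegalArgumentException rather than surfacing as "next on empty iterator".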
Thanks, Joe
Hi Joe,
Glad you were able to solve it!
CorefPruner controls pruning over mention pairs. In the runTrain method in CorefSystem.scala, we call:
CorefPruner.buildPruner(Driver.pruningStrategy).pruneAll(trainDocGraphs);
and there is an analogous call for test time in prepareTestDocuments. I would suggest subclassing CorefPruner appropriately and then building it from the passed-in string argument. Right now we run the same pruning at train and test time, but you could change this by adding a boolean flag to pruneAll indicating which phase it is.
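A minimal sketch of that idea, using stand-in types rather than the real classes: `DocGraph`, `MentionPairPruner`, `DistancePruner`, and the "distance:train:test" strategy string are all hypothetical and only illustrate the shape of a phase-aware pruner; the actual CorefPruner signatures may differ.

```scala
// Stand-in for a document graph. pruned(i)(j) == true means that mention j
// is ruled out as an antecedent of mention i.
case class DocGraph(name: String, numMentions: Int) {
  val pruned: Array[Array[Boolean]] =
    Array.fill(numMentions, numMentions)(false)
}

// Stand-in base class: pruneAll takes a flag saying which phase it is in.
abstract class MentionPairPruner {
  def prune(doc: DocGraph, isTest: Boolean): Unit
  def pruneAll(docs: Seq[DocGraph], isTest: Boolean): Unit =
    docs.foreach(prune(_, isTest))
}

// Hypothetical subclass: rules out antecedents more than `window` mentions
// back, with a separate window for test time.
class DistancePruner(trainWindow: Int, testWindow: Int)
    extends MentionPairPruner {
  def prune(doc: DocGraph, isTest: Boolean): Unit = {
    val window = if (isTest) testWindow else trainWindow
    for (i <- 0 until doc.numMentions; j <- 0 until i if i - j > window)
      doc.pruned(i)(j) = true
  }
}

object MentionPairPruner {
  // Builds a pruner from a strategy string such as "distance:100:50"
  // (an invented format, used here only for illustration).
  def build(strategy: String): MentionPairPruner =
    strategy.split(":") match {
      case Array("distance", train, test) =>
        new DistancePruner(train.toInt, test.toInt)
      case _ =>
        throw new IllegalArgumentException(
          s"Unrecognized pruning strategy: $strategy")
    }
}
```

The training code would then call pruneAll(trainDocGraphs, isTest = false), and the test-time preparation would pass isTest = true.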
Greg
Thank you Greg. I will try that.
Thanks, Joe
Hi,
I am trying to train on the OntoNotes training data (*gold_conll) that I have. When I point the train path at this data, the system asks for auto_conll files as well. Can training be done with only gold_conll files? Correct me if I have misunderstood something.
Thanks, Joe
"Loading -1 docs from /home/joe/music_ontology/MusOntoLearning/ground_truth/ontonotes/train/ ending with auto_conll"