nlplab / nersuite

http://nersuite.nlplab.org/

nersuite learn creates broken model? #6

Closed by spyysalo 12 years ago

spyysalo commented 12 years ago

I haven't had time to track this down in detail, but with the data now available at

http://weaver.nlplab.org/~smp/data/AnEM.devel.withmm.train.gz

if I run

nersuite learn -C 0.03125 -m model < AnEM.devel.withmm.train

and then

nersuite tag -m model < AnEM.test

(for any test data, it seems) I get a segmentation fault.

Will try to look into this in more detail.

spyysalo commented 12 years ago

Tried testing with a slightly different regularization parameter value (-C 0.03) and everything works fine, so this likely isn't a problem with the data.

priancho commented 12 years ago

I downloaded the file (http://weaver.nlplab.org/~smp/data/AnEM.devel.withmm.train.gz) and trained a model with the same command (nersuite learn -C 0.03125 -m model < AnEM.devel.withmm.train). Because I didn't have AnEM.test, I ran nersuite in tagging mode on the training data itself, after removing the gold annotation in the first column, and got the result file without a segmentation fault.

Could you run some sanity checks (e.g. that offsets do not decrease and that every line has the same number of columns)? If the data looks OK, please upload the test file you used, and I will check it with the nersuite installed in my account.
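The two checks suggested above can be sketched in Python. This is only an illustration, not part of nersuite: the function name `check_conll` and the `offset_col` parameter are hypothetical, and the assumption that the begin offset sits in a fixed tab-separated column should be adjusted to the actual file layout (e.g. shifted by one if the file carries a leading gold-label column).

```python
def check_conll(lines, offset_col=0):
    """Return (line_number, message) pairs for basic format problems:
    a column count that differs from the first data line, or a begin
    offset that decreases within a sentence.

    offset_col is the 0-based index of the begin-offset column; this
    is an assumption about the layout, not the nersuite spec.
    """
    problems = []
    ncols = None   # column count of the first data line
    prev = -1      # last begin offset seen in the current sentence
    for lineno, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        if not line:          # blank line: sentence boundary, reset offset
            prev = -1
            continue
        cols = line.split("\t")
        if ncols is None:
            ncols = len(cols)
        elif len(cols) != ncols:
            problems.append((lineno, "%d columns, expected %d" % (len(cols), ncols)))
        try:
            off = int(cols[offset_col])
        except (ValueError, IndexError):
            problems.append((lineno, "missing or non-numeric begin offset"))
            continue
        if off < prev:
            problems.append((lineno, "begin offset decreases"))
        prev = off
    return problems
```

Something like `check_conll(open("AnEM.test"))` would then return a list of violations, empty if the file passes both checks.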

spyysalo commented 12 years ago

@priancho: thanks for the rapid response, and sorry this appears to be difficult to replicate. I've tested further, extracting the CoNLL-formatted data for the first two sentences in my test data, now available at http://weaver.nlplab.org/~smp/data/minimal.test and http://weaver.nlplab.org/~smp/data/minimal2.test. I also retrained the model to make sure.

NERsuite still crashes for me for both of the minimal test datasets:

$ nersuite tag -m model < minimal.test 
Segmentation fault
$ nersuite tag -m model < minimal2.test 
Segmentation fault

So I'd guess that this issue is not specific to any particular test data set.

I've also temporarily placed the trained model at http://weaver.nlplab.org/~smp/data/model; I hope this helps track down the issue. My architecture is as follows:

$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Core(TM)2 Duo CPU     U7700  @ 1.33GHz
stepping    : 13
cpu MHz     : 800.000
cache size  : 2048 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
fdiv_bug    : no
hlt_bug     : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 10
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm ida
bogomips    : 2660.07
clflush size    : 64

[2nd identical processor omitted.]

Finally, I'd like to note that this appears to be an isolated issue: I've trained and tested any number of models on this data, and have only had problems with this specific combination of feature set and C parameter. [EDIT: scratch the previous; I'm no longer able to train a model that doesn't crash on this specific data. No idea why this seemed to work previously.]

Please let me know if there's any further info I can provide to help get to the bottom of this!

spyysalo commented 12 years ago

Apologies, I should have run the above tests with

cut -f 2- DATA | nersuite tag -m model

instead of

nersuite tag -m model < DATA

I'm re-running the tests now.

priancho commented 12 years ago

Ah, sorry that nersuite doesn't do any format checking at runtime ;-)

spyysalo commented 12 years ago

I can no longer reproduce this.

I've re-re-checked my log and scripts, and I'm confident that I ran NERsuite as described in the original issue when first seeing the problem (this was part of a battery of scripted experiments, and this was the only one to fail), but running the same scripts with the same data no longer reproduces this (i.e. it works now), and I've unfortunately overwritten the model that exhibited the problem.

(Is there a possibility that different models would be learned from the same data under different circumstances? My machine was heavily loaded yesterday (I was running MetaMap in parallel) and has no load today. Other than that, I'm largely out of guesses.)

Anyway, please feel free to close this as invalid/WORKSFORME. I'll make a new issue if something like this ever resurfaces.