zseder / hunvec

Sequential Tagging in NLP using neural networks

evaluation of short sentences #96

Closed pajkossy closed 8 years ago

pajkossy commented 8 years ago

Currently, very short sentences are dropped while preparing datasets. Even if training is not possible with them, the test data should still contain them so that test results are reliable.
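A minimal sketch of the idea, assuming the length filter can be applied per split. The function names and the threshold parameters here are hypothetical and are not taken from hunvec; sentences are assumed to be plain lists of tokens.

```python
def filter_sentences(sentences, min_len):
    """Keep only sentences with at least min_len tokens."""
    return [s for s in sentences if len(s) >= min_len]


def prepare_splits(train, test, min_train_len=3, min_test_len=1):
    # Hypothetical helper: training may need a minimum sentence length,
    # while the test split keeps every non-empty sentence so that the
    # reported scores cover the whole test set.
    return {
        "train": filter_sentences(train, min_train_len),
        "test": filter_sentences(test, min_test_len),
    }


train = [["a"], ["a", "b"], ["a", "b", "c"]]
test = [[], ["x"], ["x", "y", "z"]]
splits = prepare_splits(train, test)
```

With these illustrative thresholds, only the three-token sentence survives in the training split, while the test split drops just the empty sentence.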

pajkossy commented 8 years ago

I changed line 138 of word_tagger_dataset.py (from "if len(word) < 3: continue" to "if len(word) < 1: continue"). When trying to train on the resulting dataset, I got the error below. (Training is possible with the < 2 constraint.)

Traceback (most recent call last):
  File "hunvec/seqtag/trainer.py", line 123, in <module>
    main()
  File "hunvec/seqtag/trainer.py", line 119, in main
    wt.train()
  File "/home/pajkossy/Proj/hunvec/hunvec/seqtag/sequence_tagger.py", line 242, in train
    self.algorithm.train(dataset=self.dataset['train'])
  File "/home/pajkossy/pylearn2/pylearn2/training_algorithms/sgd.py", line 455, in train
    self.sgd_update(*batch)
  File "/home/pajkossy/hunvec_env/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/home/pajkossy/hunvec_env/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
  File "/home/pajkossy/hunvec_env/local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 672, in rval
    r = p(n, [x[0] for x in i], o)
  File "/home/pajkossy/hunvec_env/local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 661, in <lambda>
    self, node)
  File "scan_perform.pyx", line 207, in theano.scan_module.scan_perform.perform (/home/pajkossy/.theano/compiledir_Linux-3.16--amd64-x86_64-with-debian-8.2--2.7.9-64/scan_perform/mod.cpp:2172)
NotImplementedError: We didn't implemented yet the case where scan do 0 iteration
Apply node that caused the error: forall_inplace,gpu,scan_fn}(Elemwise{Composite{minimum(maximum(((i0 + i1) - i1), (i2 - i1)), i3)}}.0, GpuSubtensor{int64:int64:int64}.0, GpuIncSubtensor{InplaceSet;:int64:}.0, GpuDimShuffle{1,0}.0)
Inputs types: [TensorType(int64, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(), (0, 17), (2, 17), (17, 17)]
Inputs strides: [(), (17, 1), (17, 1), (1, 17)]
Inputs values: [array(0), <CudaNdarray object at 0x7f9a661fb270>, 'not shown', 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
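The (0, 17) input shape in the traceback suggests that the scan was asked to iterate over an empty sequence. One plausible reading, which is an assumption and is not confirmed in this thread, is that the recurrence runs over the words after the first one, so a one-word sentence leaves zero steps. The sketch below illustrates that situation in plain NumPy with a Viterbi-style best-score recurrence. It is not hunvec or Theano code, and the function name and array layout are hypothetical.

```python
import numpy as np


def best_path_score(emissions, transitions):
    """Viterbi-style best score for one sentence.

    emissions:   (n_words, n_tags) per-word tag scores
    transitions: (n_tags, n_tags) score of moving from tag i to tag j
    """
    if emissions.shape[0] == 0:
        raise ValueError("empty sentence")
    score = emissions[0]
    # Iterate over words 1..n-1. For a one-word sentence this slice is
    # empty, so the loop body runs zero times and the first-word scores
    # are used as they are.
    for step in emissions[1:]:
        # For each next tag, take the best previous tag, then add the
        # emission score of the current word.
        score = np.max(score[:, None] + transitions, axis=0) + step
    return float(score.max())


emissions = np.array([[1.0, 0.0], [0.0, 2.0]])
transitions = np.zeros((2, 2))
two_word_score = best_path_score(emissions, transitions)
one_word_score = best_path_score(emissions[:1], transitions)
```

A plain Python loop handles the zero-step case without special treatment, whereas the Theano version in the traceback raises NotImplementedError for it. Under this assumption, that would also be consistent with training working once sentences shorter than two words are dropped.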

zseder commented 8 years ago

I think https://github.com/Theano/Theano/issues/3276 will solve the issue, so only an update of Theano should be needed, but I am still testing...

zseder commented 8 years ago

solved in #102