Open GabrielLin opened 6 years ago
Hi! I added instructions on how to do that in the readme file. Let me know if you encounter any problems.
When I run
python tagger.py tag -p ud1 -r raw.txt -m model_ud1 -emb Embeddings/glove.txt -opth tagged_file.txt
It shows the following error:
Numbers of sentences: 1. Longest sentence is 267.
Traceback (most recent call last):
  File "tagger.py", line 408, in
    raw_x[k] = toolbox.pad_zeros(raw_x[k], max_step)
  File "/data1/myname/nlp/tagger/toolbox.py", line 826, in pad_zeros
    return [np.pad(item, (0, max_len - len(item)), 'constant', constant_values=0) for item in l]
  File "/opt/anaconda2/envs/tf1p3py27/lib/python2.7/site-packages/numpy/lib/arraypad.py", line 1295, in pad
    pad_width = _validate_lengths(narray, pad_width)
  File "/opt/anaconda2/envs/tf1p3py27/lib/python2.7/site-packages/numpy/lib/arraypad.py", line 1086, in _validate_lengths
    raise ValueError(fmt % (number_elements,))
ValueError: (0, -2) cannot contain negative values.
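For anyone reading the traceback: the final ValueError means `max_len - len(item)` evaluated to -2, so one sequence was two items longer than the `max_step` it was being padded to. The sketch below reproduces that condition in isolation. `pad_zeros` copies the line shown in the traceback; `pad_zeros_safe` is a hypothetical variant added only for illustration, and is not necessarily the fix that was applied in the repository.

```python
import numpy as np


def pad_zeros(seqs, max_len):
    """Right-pad every sequence with zeros up to max_len (as in the traceback)."""
    return [np.pad(item, (0, max_len - len(item)), 'constant', constant_values=0)
            for item in seqs]


def pad_zeros_safe(seqs, max_len=None):
    """Hypothetical variant: never ask np.pad for a negative pad width."""
    longest = max((len(item) for item in seqs), default=0)
    # If the requested length is missing or too short, fall back to the longest sequence.
    if max_len is None or max_len < longest:
        max_len = longest
    return [np.pad(item, (0, max_len - len(item)), 'constant', constant_values=0)
            for item in seqs]


seq = np.arange(1, 6)  # a sequence of length 5

try:
    pad_zeros([seq], 3)  # 3 - 5 = -2, so np.pad rejects the pad width
except ValueError as err:
    print("ValueError:", err)

# The safe variant pads to the longest sequence instead of failing.
print([len(p) for p in pad_zeros_safe([seq, np.arange(2)], 3)])
```

If the error appears, it may be worth comparing the reported longest sentence length with the length of the sequences actually passed to the padding step.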
It works fine on my machine. Please check your raw.txt file. Is it one raw sentence per line? Does it only have one sentence?
I found something, though I am not sure about it. The problem may be related to English words separated by spaces. For example, this sentence:
伦敦当地时间10月18日18:00(北京时间19日01:00),AlphaGo Zero再次登上世界顶级科学杂志——《自然》。
causes that error.
But when there are no spaces between the English words, as in:
伦敦当地时间10月18日18:00(北京时间19日01:00),AlphaGo再次登上世界顶级科学杂志——《自然》。
it works fine.
Ok. I'll try to fix this.
I tested your sentence and it seemed to work fine, but I made some small changes anyway. Please try again and see if it works now.
Your response speed is amazing. On my side, the error is still there. Could you please try running the file directly? Thanks.
Ok. I fixed some minor stuff. Could you try again? Thanks!
Thanks. It does not show any error messages now, but the result could be better. With my model, it splits AlphaGo into 'Alpha' and 'Go', then joins 'Go' with 'Zero' as 'GoZero'. Do you see the same behaviour?
_NUM 伦敦_PROPN 当地_NOUN 时间_NOUN 10_NUM 月_NOUN 18_NUM 日_NOUN 18:00(_NUM 北京_PROPN 时间_NOUN 19_NUM 日_NOUN 01:00),_NUM Alpha_X GoZero_X 再次_ADV 登上_VERB 世界_NOUN 顶级_ADJ 科学_NOUN 杂志_NOUN ——_NUM 《_PUNCT 自然_NOUN 》_PUNCT 。_PUNCT
Yes, because the tagger is not clever enough to make use of the space information. I may fix that later.
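One possible way to make use of the spaces, sketched here as an assumption rather than as how the tagger works or will be fixed: split each raw line on whitespace first and segment every chunk separately, so that no token can span an original space. The function `segment_respecting_spaces` and the `segment` callable are hypothetical names; `segment` stands in for whatever segmenter is used.

```python
def segment_respecting_spaces(line, segment):
    """Segment each whitespace-delimited chunk on its own.

    `segment` is a hypothetical callable that maps a string to a list of
    tokens. Because chunks are segmented independently, a token such as
    'GoZero' that crosses an original space cannot be produced.
    """
    tokens = []
    for chunk in line.split():  # split() with no argument splits on any whitespace
        tokens.extend(segment(chunk))
    return tokens


# Toy stand-in segmenter that returns each chunk as a single token.
def toy_segment(chunk):
    return [chunk]


print(segment_respecting_spaces("AlphaGo Zero", toy_segment))
```

The trade-off is that the model no longer sees context across the space, which may affect tagging quality for the neighbouring chunks.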
Could you please show me how to tag a sentence with the trained model? Thanks.