mead-ml / mead-baseline

Deep-Learning Model Exploration and Development for NLP
Apache License 2.0

Error running NER tagger example without embeddings #7

Closed: polm closed this issue 6 years ago

polm commented 6 years ago

The README for tag_char_rnn.py indicates it can be run without word vectors by omitting the --embed option. However, when I try to run it that way I get this error:

python tag_char_rnn.py --rnntype blstm --optim sgd --wsz 30 --eta 0.01 \
    --lower 1 \
    --epochs 50 --batchsz 10 --hsz 200 \
    --train ../data/oct27.train \
    --valid ../data/oct27.dev \
    --test ../data/oct27.test \
    --cfiltsz 1 2 3 4 5 7
Lower-case word tokens
Reading CONLL sequence file corpus
Max sentence length 38
Max word length 45
Traceback (most recent call last):
  File "tag_char_rnn.py", line 100, in <module>
    ts, _ = reader.load(args.train, word2index, args.batchsz, shuffle=True)
  File "/mnt/pool/fiddle/baseline/python/baseline/reader.py", line 219, in load
    words_vocab["<UNK>"] = 1
TypeError: 'NoneType' object does not support item assignment

It looks like the reader still expects a word vocabulary from the embeddings (word2index) even when no embeddings are given. Maybe the note about running it without them should be removed?
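The traceback shows reader.py executing words_vocab["<UNK>"] = 1 while words_vocab is None, because no embedding file supplied a vocabulary. Below is a minimal sketch of one possible guard, assuming the fallback should be a vocabulary built from the training tokens. The helper build_word_index and the reserved ids (0 for padding, 1 for <UNK>) are hypothetical illustrations, not baseline's actual reader API.

```python
from collections import Counter


def build_word_index(sentences, words_vocab=None, lower=True):
    """Hypothetical helper: return a word -> id mapping with an <UNK> entry.

    If no embedding vocabulary is supplied (words_vocab is None), build one
    from the training tokens instead of assigning into None.
    """
    if words_vocab is None:
        counts = Counter(w.lower() if lower else w for sent in sentences for w in sent)
        # Assumption: id 0 is reserved for padding and 1 for <UNK>,
        # so vocabulary words start at 2, most frequent first.
        words_vocab = {word: i for i, (word, _) in enumerate(counts.most_common(), start=2)}
    else:
        # Copy so the caller's embedding vocabulary is not mutated.
        words_vocab = dict(words_vocab)
    words_vocab["<UNK>"] = 1  # the same assignment that fails in the traceback
    return words_vocab


# Reproduce the failure from the traceback: item assignment into None.
try:
    missing_vocab = None
    missing_vocab["<UNK>"] = 1
except TypeError as err:
    print(err)  # 'NoneType' object does not support item assignment

train = [["Hello", "world"], ["hello", "there"]]
vocab = build_word_index(train)
print(vocab)  # {'hello': 2, 'world': 3, 'there': 4, '<UNK>': 1}
print(vocab.get("goodbye", vocab["<UNK>"]))  # 1
```

Building the fallback vocabulary from training data only lets the run proceed; as noted later in the thread, it will not match a tagger that uses pre-trained word vectors.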

dpressel commented 6 years ago

You really shouldn't run a tagger without pre-trained word vectors. You simply will not be able to match the performance obtainable with pre-trained embeddings. Can you point out where I said this, and I will fix it?

polm commented 6 years ago

The comment in the docs is here:

If you want to use only the convolutional filter word vectors (and no word embeddings), just remove the -embed line above.
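The "convolutional filter word vectors" mentioned in the docs are word representations built from characters alone. A simplified, hypothetical pure-Python sketch of the general idea: slide filters of several widths over a word's character embeddings and keep each filter's maximum response (max-over-time pooling). The widths mirror --cfiltsz 1 2 3 4 5 7 from the command above; the random weights, FILTERS_PER_WIDTH, the helper names, and the absence of a bias or nonlinearity are all assumptions, not baseline's implementation.

```python
import random


def conv_max_pool(seq, kernel):
    """Slide one kernel (width x dim) over seq and return its max response."""
    width = len(kernel)
    dim = len(kernel[0])
    # Pad short words with zero vectors so every filter fits at least once.
    if len(seq) < width:
        seq = seq + [[0.0] * dim] * (width - len(seq))
    best = float("-inf")
    for start in range(len(seq) - width + 1):
        response = sum(
            k * x
            for krow, xrow in zip(kernel, seq[start:start + width])
            for k, x in zip(krow, xrow)
        )
        best = max(best, response)
    return best


def char_cnn_word_vector(word, char_emb, filters):
    """Concatenate the max-pooled response of every filter of every width."""
    seq = [char_emb[c] for c in word]
    return [conv_max_pool(seq, kernel) for width in sorted(filters) for kernel in filters[width]]


random.seed(0)
DIM = 4
FILTER_WIDTHS = [1, 2, 3, 4, 5, 7]  # mirrors --cfiltsz 1 2 3 4 5 7
FILTERS_PER_WIDTH = 2  # hypothetical; real models use many more filters


def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]


char_emb = {c: rand_vec(DIM) for c in "abcdefghijklmnopqrstuvwxyz"}
filters = {
    w: [[rand_vec(DIM) for _ in range(w)] for _ in range(FILTERS_PER_WIDTH)]
    for w in FILTER_WIDTHS
}

vec = char_cnn_word_vector("tagger", char_emb, filters)
print(len(vec))  # 12 (6 widths x 2 filters each)
```

Because the word vector's size depends only on the number of filters, words of any length map to the same dimensionality, which is what lets these vectors stand in for (or be concatenated with) pre-trained embeddings.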

I understand that not using vectors has no chance of matching the performance of running with them; I just figured I'd give it a shot to check that everything else was OK while I was downloading the GloVe vectors.

Besides that, this is a lovely project. Thanks for making it!

dpressel commented 6 years ago

Thank you!