Closed: dpappas closed this issue 6 years ago.
It looks like you're using the full-vocabulary word vectors instead of the reduced-size word vectors. Please change this line to:
vocab_size = 2196018
Interesting. I thought it was my mistake, so I tried changing vocab_size to 2196018, and it turned out to be the correct behavior. I would suggest putting this information somewhere in the README or FAQ.
Thanks for pointing it out. I just fixed it :)
Hello everyone,
When you download the pretrained GloVe embeddings with
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
you get the entire embedding matrix, with 2196018 words in the vocabulary.
Did you at some point keep only 91604 words (perhaps after discarding any word not present in the SQuAD corpus)? If that is the case, it would be very helpful to provide only those embeddings, since the model would then be much less demanding in resources.
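For reference, here is roughly the kind of reduction I mean. This is only a sketch, not code from this repo: the function name, file layout, and corpus_vocab set are my own assumptions.

```python
# Sketch: keep only the GloVe vectors for words that actually occur in the
# corpus. `corpus_vocab` is an assumed set of words built from the SQuAD data.
def reduce_glove(glove_path, corpus_vocab, out_path):
    kept = 0
    with open(glove_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Each line is "<word> <300 floats>"; split once to get the word.
            word = line.split(" ", 1)[0]
            if word in corpus_vocab:
                dst.write(line)
                kept += 1
    return kept  # for SQuAD this would be on the order of 91604 words

# Usage sketch:
# n = reduce_glove("glove.840B.300d.txt", corpus_vocab, "glove.reduced.txt")
```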
Hi @dpappas, instead of running
python process.py --process True
please now run this line
python process.py --reduce_glove True --process True
to process the data.
Please make sure to delete
./data/trainset
and ./data/devset
before processing the data.
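For convenience, here is one way to script the cleanup and re-run (a sketch assuming Python 3; the paths and flags are exactly the ones above):

```python
import shutil
import subprocess

# Remove the cached preprocessed splits so they get rebuilt from scratch.
for cached in ("./data/trainset", "./data/devset"):
    shutil.rmtree(cached, ignore_errors=True)

# Re-run preprocessing with the GloVe reduction step enabled.
subprocess.run(
    ["python", "process.py", "--reduce_glove", "True", "--process", "True"],
    check=True,
)
```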
Thank you very much for your prompt response!
I keep getting the following error. What can I do about it? Thank you.
Traceback (most recent call last):
  File "model.py", line 256, in <module>
    main()
  File "model.py", line 216, in main
    glove = np.reshape(glove,(Params.vocab_size,Params.emb_size))
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 232, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: cannot reshape array of size 658805400 into shape (91604,300)
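For anyone hitting this: the numbers in the error explain it. The flat array has 658805400 entries, and 658805400 / 300 = 2196018, i.e. the full GloVe vocabulary, while Params.vocab_size is still the reduced value 91604. So the embeddings on disk were never reduced; either rerun process.py with --reduce_glove True (after deleting ./data/trainset and ./data/devset) or set vocab_size to 2196018. Below is a defensive check one could add around the failing reshape in model.py; glove and Params are the names from the traceback, the rest is my own sketch.

```python
import numpy as np

# Infer the vocabulary size the flat embedding array actually contains,
# and fail with a readable message if it disagrees with the config.
emb_size = Params.emb_size  # 300
actual_vocab = glove.size // emb_size
if actual_vocab * emb_size != glove.size or actual_vocab != Params.vocab_size:
    raise ValueError(
        "embedding file holds {} vectors but Params.vocab_size is {}; "
        "rerun process.py with --reduce_glove True or fix vocab_size".format(
            actual_vocab, Params.vocab_size
        )
    )
glove = np.reshape(glove, (Params.vocab_size, emb_size))
```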