Closed varunnrao closed 6 years ago
I have the same issue.
Yes, the problem is that for each token spaCy returns a 384-dim vector instead of 300. One quick fix is to take only the first 300 values, like `question_tensor[0,j,:] = tokens[j].vector[:300]`, since the VQA model expects a 300-length vector as word_feature_size.
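As a minimal sketch of that truncation fix (pure NumPy, with a made-up 384-dim array standing in for `tokens[j].vector`, so it runs without spaCy):

```python
import numpy as np

# Stand-in for a token vector from a newer spaCy model (384-dim
# instead of the 300 the VQA model expects as word_feature_size).
fake_token_vector = np.arange(384, dtype=np.float32)

question_tensor = np.zeros((1, 30, 300))

# Slicing to the first 300 components avoids the broadcast error,
# but note it throws away 84 values per token, so it is lossy.
question_tensor[0, 0, :] = fake_token_vector[:300]
```

Keep in mind this is a quick workaround, not a proper fix: the truncated vectors are no longer the embeddings the model was trained with.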
I'm getting this error if I reduce the size:
UserWarning: Trying to unpickle estimator LabelEncoder from version pre-0.18 when using version 0.19.1. This might lead to breaking code or invalid results. Use at your own risk.
Does anyone know what this is? Urgent help needed
I solved the same issue with the vector dimension, and also the UserWarning about the newer scikit-learn version (it is necessary to re-pickle the file with joblib.dump). After these changes I am not able to replicate your results with the downloaded pretrained models. The test image and question give the best answer as "30 % - electricity" instead of "train". All "what" questions result in a number answer, and "where" questions result in a yes/no answer. Can you tell me whether the dimension reduction should result in such a distortion?
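For the re-pickling step mentioned above, here is a self-contained sketch. A freshly fitted LabelEncoder and a temp file stand in for the repo's trained encoder file; the point is just that dumping it again under the currently installed scikit-learn makes the version warning go away on subsequent loads:

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Stand-in for the repo's trained label encoder; in practice you would
# joblib.load the existing .pkl (which emits the version warning once).
le = LabelEncoder().fit(["train", "yes", "no"])

path = os.path.join(tempfile.mkdtemp(), "labelencoder.pkl")
joblib.dump(le, path)            # re-dump under the installed sklearn version
le_reloaded = joblib.load(path)  # future loads no longer trigger the warning
```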
Yes, it will. How do we fix this? Can anyone please help? Why is the vector size 384 when it should be 300?
I tried the following:

```python
word_embedding = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
tokens = word_embedding(question)
word_embeddings = word_embedding.vocab.vectors.resize((1000000, 300))
question_tensor = np.zeros((1, 30, 300))
for j in range(len(tokens)):
    question_tensor[0, j, :] = tokens[j].vector
return question_tensor
```
Even after resizing the vectors, the error is removed but it gives wrong answers. No idea what to do :/ I tried really hard but couldn't find anything online either.
I will take a look at the code. Could you tell me which versions of Keras and TensorFlow you are using, so that I can test it correctly?
Thanks and Regards Adi
I'm using the following versions:
Keras = 2.0.5
TensorFlow = 1.2.0
The issue is that spaCy updated its pretrained word-embedding models. Do the following to fix it. Download the 300-dimensional vector package:

```
python -m spacy download en_vectors_web_lg
```

Then change

```python
word_embeddings = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
```

to

```python
word_embeddings = spacy.load('en_vectors_web_lg')
```
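For anyone following along, here is a minimal sketch of the feature-building step once the vectors are 300-dim. `build_question_tensor` is a helper name I made up, and the spaCy calls are left commented out so the snippet runs without downloading the model:

```python
import numpy as np

def build_question_tensor(token_vectors, max_len=30, dim=300):
    """Pack per-token vectors into the (1, max_len, dim) tensor the VQA
    model expects, checking each vector really has `dim` entries."""
    tensor = np.zeros((1, max_len, dim))
    for j, vec in enumerate(token_vectors[:max_len]):
        if vec.shape != (dim,):
            raise ValueError(f"expected {dim}-dim vector, got {vec.shape}")
        tensor[0, j, :] = vec
    return tensor

# With en_vectors_web_lg the vectors are already 300-dim, so no slicing
# is needed (assumed usage, not tested here):
# import spacy
# nlp = spacy.load('en_vectors_web_lg')
# tokens = nlp(question)
# question_tensor = build_question_tensor([t.vector for t in tokens])
```

The explicit shape check makes a dimension mismatch fail loudly at the offending token instead of surfacing as a cryptic broadcast error.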
When I execute the following:

```python
model = gensim.models.KeyedVectors.load_word2vec_format('./data/GoogleNews-vectors-negative300.bin.gz',
                                                        binary=True)
```

I get an error saying:
ValueError: could not broadcast input array from shape (75) into shape (300)
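To see what that ValueError means, here is a minimal NumPy reproduction: a length-75 array cannot be broadcast into a length-300 slot, which is what happens when an embedding's width doesn't match what the code expects. (On the gensim side you can inspect `model.vector_size` after loading; a value other than 300 would suggest a wrong or corrupted vectors file, though I can't verify the cause here.)

```python
import numpy as np

question_tensor = np.zeros((1, 30, 300))
short_vector = np.zeros(75)  # stand-in for a mismatched embedding

try:
    # Fails: a (75,) array cannot fill a (300,) slot.
    question_tensor[0, 0, :] = short_vector
except ValueError as e:
    message = str(e)
```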
Can anyone help me? Thanks in advance!
Is there an issue with the spacy tensor?