Open thirumalaipm opened 8 years ago
The avoid parameter is not very useful, so I suggest ignoring it and always passing None. The idea behind it was to force the sample generator to generate something different from everything it previously made for the same desc.
However, I tried to follow how you reached your bug, and from looking at the code it looks like every call to gensamples passes avoid as either None or an array or list. So perhaps you re-ran the call to gensamples after avoid had somehow changed to an int64.
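A small guard can make this kind of problem obvious before gensamples runs. The following is only a sketch: check_avoid is a hypothetical helper, not part of the notebook, and it assumes that any value without a length (such as a plain int or a NumPy int64 scalar) is invalid for avoid.

```python
def check_avoid(avoid):
    """Return avoid unchanged if it is None or a sized container, else raise.

    Hypothetical helper: the notebook itself does not define this function.
    """
    if avoid is None:
        return None
    try:
        len(avoid)  # lists, tuples and 1-d or higher arrays all have a length
    except TypeError:
        raise TypeError(
            "avoid must be None or a list/array, got %s" % type(avoid).__name__
        )
    return avoid
```

Calling check_avoid(avoid) just before gensamples would turn a stray scalar into an explicit error that names the offending type.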
Hi Udibr, Thanks for the prompt response. I was able to make progress on that. But when I rerun the script, I see the error below on In [60]. Two of the reruns executed properly, but many times I see the error below.
ValueError Traceback (most recent call last)
If you look at cell [35] of https://github.com/udibr/headlines/blob/master/predict.ipynb you will see that the weights shape is (944, 40000); however, your error message said (40000, 100), which happens to be the shape of the embedding matrix loaded in step [10].
I am guessing that you re-ran some of the cells in the notebook out of the order in which they appear, and somehow the embedding matrix was copied into weights (although I don't see how).
The safest way to run these notebooks is to "Kernel->Restart" and then execute the cells one after the other starting from the top...
The output I get for cell [35] in predict is [(40000L, 100L)]. It is different from what is shown in your notebook.
The cell [34] output is as follows:
Loading data1/train.hdf5 to sequential_3
embedding_1
failed to find layer embedding_1 in model
weights 40000x100
stopping to load all other layers
I also changed one parameter in cell [9], which now reads nb_train_samples = 30000, nb_val_samples = 1000. I changed the nb_val_samples value to 1000 because of an error message I got. I will reduce nb_train_samples to 10000 and try again. Could that be causing the issue?
When I run the train notebook, I get the warning below while In [60] is running.
C:\Anaconda2\lib\site-packages\keras\engine\training.py:1402: UserWarning: Epoch comprised more than samples_per_epoch samples, which might affect learning results. Set samples_per_epoch correctly to avoid this warning.
  warnings.warn('Epoch comprised more than '
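One likely cause of this warning is a samples_per_epoch value that is not a whole multiple of the generator's batch size, so the final batch of each epoch overshoots the limit. The sketch below rounds the sample counts down to a multiple of the batch size. The helper name round_down_to_batches and the batch size of 64 are assumptions for illustration; use the batch size your own notebook actually sets.

```python
def round_down_to_batches(nb_samples, batch_size):
    """Return the largest multiple of batch_size that does not exceed nb_samples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return (nb_samples // batch_size) * batch_size


batch_size = 64  # assumed value, check the one used in your notebook
nb_train_samples = round_down_to_batches(30000, batch_size)  # 29952
nb_val_samples = round_down_to_batches(1000, batch_size)     # 960
```

With these values every epoch consists of full batches only, so the number of samples seen should match samples_per_epoch exactly.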
Ok, what happened is that the call to load_weights failed at the first layer (it should fail at the last layer), and you got as a return value the weights of the first layer (which are (40000, 100)) and not those of the last layer ((944, 40000)).
This could be because the train.ipynb notebook, which created the file loaded by load_weights, was run twice without restarting the kernel. Each time you re-create the network nodes (for example in cell 26 of train), Keras creates new numbering (for example embedding_2 if embedding_1 is already in use). You can easily fix the load_weights function to convert the names in the file to the names used in the model, and that will fix your problem.
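The name conversion could look roughly like the sketch below. It is not the notebook's load_weights, only the name-matching step: it assumes layer names follow the auto-generated pattern of a base name plus a trailing _<number>, and that layers of the same type appear in the same order in the file and in the model. Both helpers, base_name and map_layer_names, are hypothetical.

```python
import re
from collections import defaultdict


def base_name(layer_name):
    """Strip the trailing _<number> suffix, e.g. 'embedding_2' -> 'embedding'."""
    return re.sub(r"_\d+$", "", layer_name)


def map_layer_names(file_names, model_names):
    """Pair each layer name in the file with a model layer of the same base name.

    Layers that share a base name are matched in order of appearance.
    File layers with no counterpart in the model are left out of the mapping.
    """
    by_base = defaultdict(list)
    for name in model_names:
        by_base[base_name(name)].append(name)

    mapping = {}
    for name in file_names:
        candidates = by_base.get(base_name(name))
        if candidates:
            mapping[name] = candidates.pop(0)
    return mapping
```

Inside load_weights, the lookup of a model layer would then use mapping[name] instead of the name read from the file. If the layer names were set by hand rather than auto-generated, this matching rule may not apply.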
Hi, Thanks for the previous reply.
After some retries, I was able to progress further, but the training takes a long time. I stopped after 250 iterations of In [60] in the train.ipynb notebook.
KeyError Traceback (most recent call last)
The '^' is just used for display, to indicate that a word is not in the vocabulary used for modeling.
There is also an "external" vocabulary, word2idx, which includes all the words seen, not just the smaller (internal) vocabulary used for training. However, it looks like Billy is not in it. All you need to do is add it to the external vocabulary. For example, the following code will add a new word (and will leave an existing word unmodified):
word2idx[new_word] = word2idx.get(new_word, len(word2idx))
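To add several words at once, the same line can be wrapped in a loop. This is a sketch: add_words is a hypothetical helper, and like the one-liner it assumes the existing indices run from 0 to len(word2idx) - 1 without gaps. If the indices in your dictionary start higher, a new word could receive an index that is already taken, and max(word2idx.values()) + 1 would be a safer choice for the next index.

```python
def add_words(word2idx, new_words):
    """Give each unseen word the next free index; known words keep their index."""
    for word in new_words:
        word2idx[word] = word2idx.get(word, len(word2idx))
    return word2idx


# Toy vocabulary for illustration only
vocab = {"the": 0, "cat": 1}
add_words(vocab, ["Billy", "cat", "Joel"])
# vocab is now {"the": 0, "cat": 1, "Billy": 2, "Joel": 3}
```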
Thanks udibr. This solution works.. :)
Hi,
Thanks for this project. I am trying these scripts, and the vocabulary-embedding and train scripts ran fine. When I tried predict, I hit an error at In [69]. The error is as follows:
TypeError Traceback (most recent call last)