all words with a frequency below min_frequency are trimmed, but their mappings keep the previous ids, so some words end up with an id larger than the length of the trimmed vocabulary.
in this line:
out.itemset(tuple([i, idx, value]), 1.0)
it tries to write at an index larger than the array allows, resulting in:
...
English: We must support initiatives increasing the availability and improving the.
French (pred): Nous C'est s'articulent s'articulent s'articulent s'articulent s'articulent s'articulent s'articulent s'articulent s'articulent,
French (gold): Nous devons soutenir les initiatives visant à développer les ressources en eau et à améliorer la distribution et la gestion de ce produit très peu abondant dans la région.
[ 401 868 853 1003 1093 3 6961 16 1873 3] [ 387 198 41289 41289 41289 41289 41289 41289 41289 41289 41289]
Traceback (most recent call last):
File "neural_translation_word.py", line 153, in <module>
translator.fit(X_train, y_train, logdir=PATH)
File "/usr/local/lib/python2.7/dist-packages/skflow/estimators/base.py", line 235, in fit
feed_params_fn=self._data_feeder.get_feed_params)
File "/usr/local/lib/python2.7/dist-packages/skflow/trainer.py", line 114, in train
feed_dict = feed_dict_fn()
File "/usr/local/lib/python2.7/dist-packages/skflow/io/data_feeder.py", line 308, in _feed_dict_fn
out.itemset(tuple([i, idx, value]), 1.0)
IndexError: index 47754 is out of bounds for axis 2 with size 47728
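a minimal, self-contained sketch of the failure mode (toy data, not skflow's actual vocabulary class; plain indexing stands in for the `itemset` call):

```python
import numpy as np

# Toy vocabulary: word -> (id, frequency), ids assigned in insertion order.
vocab = {"the": (0, 50), "cat": (1, 40), "rare": (2, 1), "sat": (3, 30)}

# Trim words below min_frequency, but keep the survivors' OLD ids --
# "rare" is dropped, yet "sat" still maps to id 3.
min_frequency = 2
trimmed = {w: (i, f) for w, (i, f) in vocab.items() if f >= min_frequency}

vocab_size = len(trimmed)           # 3
out = np.zeros((1, 1, vocab_size))  # one-hot buffer sized by the new length

err = None
try:
    out[0, 0, trimmed["sat"][0]] = 1.0  # id 3, but axis 2 only has size 3
except IndexError as e:
    err = e
print(err)  # index 3 is out of bounds for axis 2 with size 3
```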
possible fix:
after trimming, remap the remaining ids to a contiguous range (1..vocabulary length)
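a sketch of that remapping on a dict-based toy vocabulary (the helper name `remap_ids` is hypothetical, and ids are 0-based here):

```python
def remap_ids(trimmed):
    """Reassign contiguous ids (0..len-1) to the surviving words,
    sorted by their old ids so relative order is preserved."""
    ordered = sorted(trimmed.items(), key=lambda kv: kv[1][0])
    return {w: (new_id, f) for new_id, (w, (_, f)) in enumerate(ordered)}

# "sat" kept its stale id 3 after trimming a 4-word vocabulary.
trimmed = {"the": (0, 50), "cat": (1, 40), "sat": (3, 30)}
remapped = remap_ids(trimmed)

# every id is now < len(remapped), so one-hot encoding cannot overflow
assert all(i < len(remapped) for i, _ in remapped.values())
print(remapped)  # {'the': (0, 50), 'cat': (1, 40), 'sat': (2, 30)}
```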
For context: when words are added to the vocabulary, each is assigned a numeric id as its mapping; the trimming step described above removes low-frequency words without reassigning those ids.