Closed VP007-py closed 2 years ago
Hi Vinay,
Unfortunately, I'm unable to reproduce the error. Please attach your `config.py` file and make sure you are working with the latest version. If you modified `data_engine/prepare_data.py`, please also share it.
That being said, my guess is that you are calling `setInput` when you want to call `setRawOutput` somewhere in `data_engine/prepare_data.py`. However, note that `setRawOutput` was removed in https://github.com/lvapeab/nmt-keras/commit/4ba94e2b88fc2c098a9ea9f528e0b8ca75ffc32d, as it made no sense to keep it in the dataset for the general use case.
Hey, apologies for not updating; the current version works perfectly fine!
Once again, I get a similar error with different datasets. I did check the parallel corpora and there are no issues with them:
```
Using TensorFlow backend.
[11/08/2020 19:49:41] Limited tf.compat.v2.summary API due to missing TensorBoard installation.
[11/08/2020 19:49:44] Running training.
[11/08/2020 19:49:44] Building Newdataset_hien dataset
[11/08/2020 19:49:45] Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:45] Creating vocabulary for data with data_id 'target_text'.
[11/08/2020 19:49:46] Total: 97033 unique words in 95000 sentences with a total of 1977052 words.
[11/08/2020 19:49:46] Creating dictionary of all words
[11/08/2020 19:49:47] Loaded "train" set outputs of data_type "text-features" with data_id "target_text" and length 95000.
[11/08/2020 19:49:47] Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:47] Loaded "val" set outputs of data_type "text" with data_id "target_text" and length 5000.
[11/08/2020 19:49:47] Applying tokenization function: "tokenize_none".
[11/08/2020 19:49:47] Loaded "test" set outputs of data_type "text" with data_id "target_text" and length 2500.
[11/08/2020 19:49:47] Applying tokenization function: "tokenize_none".
Traceback (most recent call last):
  File "main.py", line 51, in <module>
    train_model(parameters, args.dataset)
  File "/home/pandramish.vinay/nmt-keras/nmt_keras/training.py", line 74, in train_model
    dataset = build_dataset(params)
  File "/home/pandramish.vinay/nmt-keras/data_engine/prepare_data.py", line 185, in build_dataset
    bpe_codes=params.get('BPE_CODES_PATH', None))
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras_wrapper/dataset.py", line 1204, in setInput
    use_unk_class=use_unk_class)
  File "/home/pandramish.vinay/.local/lib/python3.5/site-packages/keras_wrapper/dataset.py", line 2097, in preprocessTextFeatures
    '" in order to process the type "text" data. Set "build_vocabulary" to True if you want to use the current data for building the vocabulary.')
Exception: The dataset must include a vocabulary with data_id "source_text" in order to process the type "text" data. Set "build_vocabulary" to True if you want to use the current data for building the vocabulary.
```
Did you set `build_vocabulary=True` when building the Dataset object?
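To illustrate why that flag matters, here is a toy sketch of the check the traceback points at. This is not the actual keras_wrapper code; `ToyDataset` and `set_input` are invented stand-ins. The idea is that text data can only be indexed against an existing vocabulary, so the first `data_id` registered without `build_vocabulary=True` (and without a previously built vocabulary) raises:

```python
# Toy sketch of the vocabulary check suggested by the traceback.
# NOT keras_wrapper code; ToyDataset is a hypothetical stand-in.

class ToyDataset:
    def __init__(self):
        self.vocabulary = {}  # data_id -> {word: index}

    def set_input(self, sentences, data_id, build_vocabulary=False):
        if build_vocabulary:
            # Build a vocabulary from the words seen in this data.
            words = sorted({w for s in sentences for w in s.split()})
            self.vocabulary[data_id] = {w: i for i, w in enumerate(words)}
        if data_id not in self.vocabulary:
            # Mirrors the Exception raised in preprocessTextFeatures.
            raise Exception(
                'The dataset must include a vocabulary with data_id "%s" '
                'in order to process the type "text" data.' % data_id)
        # Index the sentences against the vocabulary.
        vocab = self.vocabulary[data_id]
        return [[vocab[w] for w in s.split()] for s in sentences]


ds = ToyDataset()
src = ["hello world", "hello again"]

# Without build_vocabulary=True the check fails, as in the log:
try:
    ds.set_input(src, data_id="source_text")
except Exception as e:
    print("raised:", e)

# With build_vocabulary=True the vocabulary is created first:
print(ds.set_input(src, data_id="source_text", build_vocabulary=True))
```

If the real `setInput` call for `source_text` only sometimes passes `build_vocabulary=True` (for example, depending on which branch of `build_dataset` runs, or whether a cached dataset is reloaded), that could also explain failures that are intermittent.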
I did enable `build_vocabulary=True` in `ds.setInput` here, and the same error still occurs sometimes.
Sometimes it fails, but other times it works? Weird.
Can you share your `config.py` file?
This is the error log I get after reinstalling and running the model with `python3 main.py`. Any fixes/suggestions for `keras_wrapper`?