Closed: mohammedayub44 closed this issue 6 years ago.
You also need to set REBUILD_DATASET to False in order to load the dataset.
Moreover, can you explain how you "generated the Dataset_En_Es.pkl file by reversing the source and target files"?
Note that if you run a Spanish->English experiment, you'll have a model with fixed input and output dimensions (the source and target vocabulary sizes). If you reuse this model, the vocabularies must be the same. Otherwise you get incompatible shapes and different word<->index mappings.
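Here is a small illustration of why the vocabularies must match. It assumes a simple scheme: special tokens first, then words sorted by frequency. The helper `build_vocab` and the toy sentences are hypothetical, and the real Dataset class may order and reserve indices differently. Building a source vocabulary from each language gives different sizes, so the input embeddings get different shapes. It also gives different words at the same index.

```python
from collections import Counter

def build_vocab(sentences, specials=("<pad>", "<unk>")):
    """Hypothetical word->index builder: specials first, then words by
    descending frequency (ties broken alphabetically). Not the nmt-keras code."""
    counts = Counter(word for s in sentences for word in s.split())
    words = sorted(counts, key=lambda w: (-counts[w], w))
    vocab = {tok: i for i, tok in enumerate(specials)}
    for w in words:
        vocab[w] = len(vocab)
    return vocab

english = ["the house is red", "the car is fast"]
spanish = ["la casa es roja", "el coche es rápido"]

src_vocab_es_en = build_vocab(spanish)  # Es->En model: Spanish is the source
src_vocab_en_es = build_vocab(english)  # En->Es model: English is the source

# Different sizes -> the input layer would need a different shape.
print(len(src_vocab_es_en), len(src_vocab_en_es))  # 9 8

# Same index, different word -> a reused model would misread its input.
inv_es = {i: w for w, i in src_vocab_es_en.items()}
inv_en = {i: w for w, i in src_vocab_en_es.items()}
print(inv_es[2], inv_en[2])  # es is
```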
Hi @lvapeab
To keep it simple, I first replicated this with the Jupyter notebooks. Here are the steps I followed.
To generate the Dataset_xxx.pkl file (the same file I passed to main.py via the '--dataset' argument above):
1) In 1_Dataprep.ipynb, I swapped the file paths in the setInput and setOutput calls and kept all other arguments the same. This gave me the dataset summary and vocabulary lengths below (English=516 and Spanish=689):
>>print(ds)
---------------------------------------------
Dataset nfpa_dataset
---------------------------------------------
store path: nfpatutorial
data length:
train - 9900
val - 100
test - 0
[ INPUTS ]
text: source_text
text: state_below
[ OUTPUTS ]
text: target_text
---------------------------------------------
>> ds.vocabulary_len
{'target_text': 689, 'source_text': 516, 'state_below': 689}
2) In 2_Training.ipynb:
Loaded the dataset:
>> params = load_parameters()
>> dataset = loadDataset('/home/ubuntu/nmt_keras-data/datasets/Dataset_tutorial_dataset_En_Sp.pkl')
Changed the 3 parameters below and created an instance of GroundHogModel:
params['INPUT_VOCABULARY_SIZE'] = dataset.vocabulary_len['source_text']
params['OUTPUT_VOCABULARY_SIZE'] = dataset.vocabulary_len['target_text']
params['STORE_PATH'] = '/home/ubuntu/nmt_keras-data/trained_models/english_spanish_translationmodel2/'
nmt_model = TranslationModel(params, model_type='GroundHogModel', model_name='tutorial_model2', vocabularies=dataset.vocabulary, store_path='/home/ubuntu/nmt_keras-data/trained_models/english_spanish_translationmodel2/', verbose=True)
I kept the rest of the cells the same and ran nmt_model.trainNet(dataset, training_params).
It gave the same error. My guess is that it's not that straightforward. Maybe the GroundHogModel architecture needs to change to support this? I'm new to this, so any help is appreciated.
-Mohammed Ayub
Your problem is that you are loading a model which expects source/target dimensions of 689/519 and you are providing a dataset with vocabulary dimensions 519/689.
You should use the same vocabularies for En->Es and Es->En. To do this, you can use the Dataset function merge_vocabularies.
In 1_Dataprep.ipynb, after cell [4], you can do something like:
ds.merge_vocabularies(['source_text', 'target_text'])
This will compute the union of the source/target vocabularies and update the Dataset object accordingly. Next, you should train a new model with this new Dataset.
I haven't tested it, but I think it will work :)
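Here is a minimal sketch of what a vocabulary merge does. It assumes word->index dictionaries with reserved special tokens. The helper below is hypothetical and is not the actual Dataset.merge_vocabularies implementation. After the merge, both directions share one index space, so En->Es and Es->En models get identical input and output sizes.

```python
def merge_vocabularies(vocabs, specials=("<pad>", "<unk>")):
    """Hypothetical union of several word->index maps into one shared map.

    Special tokens keep the first indices. The remaining words are sorted,
    so the result does not depend on the order of the input vocabularies.
    """
    words = set()
    for vocab in vocabs:
        words.update(w for w in vocab if w not in specials)
    merged = {tok: i for i, tok in enumerate(specials)}
    for w in sorted(words):
        merged[w] = len(merged)
    return merged

source = {"<pad>": 0, "<unk>": 1, "hello": 2, "bar": 3}  # English side
target = {"<pad>": 0, "<unk>": 1, "hola": 2, "bar": 3}   # Spanish side ("bar" exists in both)
shared = merge_vocabularies([source, target])
print(shared)  # {'<pad>': 0, '<unk>': 1, 'bar': 2, 'hello': 3, 'hola': 4}
```

Sorting the merged words keeps the indices stable however the inputs are ordered. That stability matters if the same shared vocabulary has to be rebuilt later for the reverse direction.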
Thanks @lvapeab for that.
In short, I tried it and it works. I have a couple of follow-up questions, since it looks like some magic happened behind the scenes.
Steps:
1) In the first notebook: merged the vocabularies after the 5th cell (after the source and target were created).
>> ds.merge_vocabularies(['source_text', 'target_text'])
[30/08/2018 13:28:07] Merging vocabularies of the following ids: ['source_text', 'target_text']
[30/08/2018 13:28:07] The new total is 974.
>> ds.vocabulary_len
{'target_text': 974, 'source_text': 974, 'state_below': 689}
2) In the second notebook: loaded the modified dataset from the step above and created a new GroundHogModel instance without changing any cell. I ran the training and it worked.
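The reported total can be checked with the size of a set union, |S ∪ T| = |S| + |T| - |S ∩ T|. This assumes the merge is a plain union. With 516 source entries and 689 target entries, a merged size of 974 means 231 entries were already in both vocabularies. Which tokens those are (for example shared special tokens) is not shown in the log.

```python
def shared_entries(source_size, target_size, merged_size):
    """Entries common to both vocabularies, from |S u T| = |S| + |T| - |S n T|."""
    return source_size + target_size - merged_size

print(shared_entries(516, 689, 974))  # 231
```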
Questions -
1) In my GroundHogModel instance, the layer shapes of the last layer look very different from the example notebook (output of cell 4 in the training notebook; file attached for reference: tutorial_model2.txt). Are these shapes calculated automatically, or set in the TranslationModel params? I'm not sure how this will affect the inference/decoding process.
2) To run this as python main.py --dataset=xxx.pkl, I'm assuming the above steps should also work, in addition to setting REBUILD_DATASET to False in the config file.
3) I'm running this on an Amazon p3.8xlarge instance with 4 GPUs. When I check usage during training with the command below, it shows that only one GPU is being used:
nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 5
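For a quicker answer to "how many GPUs are busy", nvidia-smi can print just the index and utilization with `--query-gpu=index,utilization.gpu --format=csv,noheader,nounits`. The sketch below parses that output. The `busy_gpus` helper, the sample text and the 5% threshold are illustrative assumptions, not measurements from this run.

```python
def busy_gpus(csv_text, threshold=5):
    """Return indices of GPUs whose utilization (in percent) exceeds threshold.

    Expects lines like "0, 87", as printed by
    nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
    """
    busy = []
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        if int(util) > threshold:
            busy.append(int(index))
    return busy

# Hypothetical output from a 4-GPU machine where only GPU 0 is training.
sample = """0, 87
1, 0
2, 0
3, 0"""
print(busy_gpus(sample))  # [0]
```

On a live machine, the text could come from `subprocess.run([...], capture_output=True, text=True).stdout`.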
Appreciate your help. (Attached: tutorial_model2.txt)
-Mohammed Ayub
[UPDATE] I sort of figured out a partial solution to questions 1 and 2 above (see below). Please let me know if I went down the wrong path. (Should I open a separate issue for question 3, since it's not related to this thread?)
1) Changed the params below before creating the model instance, to mimic the example setup:
params['ENCODER_HIDDEN_SIZE'] = 256
params['DECODER_HIDDEN_SIZE'] = 256
params['ATTENTION_SIZE'] = 256
params['SOURCE_TEXT_EMBEDDING_SIZE'] = 300
params['TARGET_TEXT_EMBEDDING_SIZE'] = 300
params['SKIP_VECTORS_HIDDEN_SIZE'] = 300
params['MODEL_SIZE'] = 300
params['USE_NOISE'] = True
params['DEEP_OUTPUT_LAYERS'] = [('linear', 300)]
This is the output for 100 epochs:
2) I made similar changes to config.py to run main.py (for English->Spanish):
Lines 11-12: reversed the source and target
REBUILD_DATASET=False (as suggested)
>> python main.py --dataset=/home/ubuntu/nmt_keras-data/datasets/Dataset_nfpa_dataset.pkl ENCODER_HIDDEN_SIZE=256 DECODER_HIDDEN_SIZE=256 ATTENTION_SIZE=256 SOURCE_TEXT_EMBEDDING_SIZE=300 TARGET_TEXT_EMBEDDING_SIZE=300 SKIP_VECTORS_HIDDEN_SIZE=300 MODEL_SIZE=300 USE_NOISE=True MAX_EPOCH=3 DEEP_OUTPUT_LAYERS="[('linear', 300)]"
Here is the sample command-line output: sample_main_cmd_ouput_3epochs.txt
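The trailing KEY=VALUE arguments in the main.py command above override parameters from config.py. Here is a minimal sketch of how such overrides can be parsed. It assumes each value is read as a Python literal where possible and kept as a string otherwise. The `apply_overrides` helper is hypothetical and shows the idea, not necessarily main.py's exact code. It also shows why DEEP_OUTPUT_LAYERS is quoted in the shell: the quotes keep the list of tuples together as one argument.

```python
import ast

def apply_overrides(params, changes):
    """Update params in place from "KEY=VALUE" strings and return params."""
    for change in changes:
        key, value = change.split("=", 1)  # split once so values may contain '='
        try:
            params[key] = ast.literal_eval(value)  # ints, bools, lists, tuples, ...
        except (ValueError, SyntaxError):
            params[key] = value  # not a literal (e.g. a bare word): keep the string
    return params

params = {"MAX_EPOCH": 500, "USE_NOISE": False}
apply_overrides(params, ["MAX_EPOCH=3", "USE_NOISE=True",
                         "DEEP_OUTPUT_LAYERS=[('linear', 300)]"])
print(params)  # {'MAX_EPOCH': 3, 'USE_NOISE': True, 'DEEP_OUTPUT_LAYERS': [('linear', 300)]}
```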
-Mohammed Ayub
Hi,
I think you want to reload the model from a given epoch. You should set the parameter RELOAD to the epoch you want to reload.
Regarding multi-GPU training, it is not supported yet. I hope to implement it in a few days, though.
@lvapeab Great, thanks. I ran the decoding (reloaded the model and decoded the test sentences) at a couple of epochs. English-->Spanish doesn't give me results as good as Spanish-->English did; the scores are noticeably lower.
Results below:
Metric  | Spanish-->English (notebook) | English-->Spanish (epoch 72) | English-->Spanish (epoch 100)
Bleu_1  | 0.9497048078508498           | 0.81257627965545             | 0.8006159201340844
Bleu_2  | 0.9443201040962285           | 0.7385077934634473           | 0.7273109212714537
Bleu_3  | 0.941726347487433            | 0.6830458006857428           | 0.6722753442108618
Bleu_4  | 0.9401072076298569           | 0.6382079993175922           | 0.627573898190403
CIDEr   | 9.345228814102258            | 5.61866059228601             | 5.482452992256578
METEOR  | 0.7086728343657942           | 0.78775437247034             | 0.77756316306938
ROUGE_L | 0.9476926542686434           | 0.767403946943525            | 0.75705568703425
TER     | 0.062433267771846025         | 0.26339833823487424          | 0.2771036176227051
Also, could you elaborate on the merge_vocabularies trick we used? I'm planning to test this with and without additional domain-specific terminology (legal terms) in both English and Spanish (maybe 10,000 terms or more). What options do I have for this?
Appreciate your help.
Thanks !
-Mohammed Ayub
For this toy task, this is normal: the data are very biased, and there are large differences between En->Es and Es->En.
Regarding your questions, I think the way to go is to apply joint BPE to the corpus, then tag it properly (see Sec. 3), and train as usual on this corpus with mixed language directions.
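One way to read the suggestion to tag the corpus is to prepend an artificial target-language token to every source sentence. A single model trained on both directions then knows which language to produce. Here is a minimal sketch. It assumes tags of the form <2es>/<2en> (the exact tag format is an assumption). It also assumes joint BPE has already been applied in a separate step, for example with a tool such as subword-nmt. The tags should presumably be added after BPE so they stay single tokens. The helper `tag_and_mix` is hypothetical.

```python
def tag_and_mix(en_sentences, es_sentences):
    """Build one mixed-direction parallel corpus from aligned En/Es lines.

    Returns (sources, targets). Each source starts with a tag naming the
    language the model should translate into.
    """
    if len(en_sentences) != len(es_sentences):
        raise ValueError("parallel corpora must have the same number of lines")
    sources, targets = [], []
    for en, es in zip(en_sentences, es_sentences):
        sources.append("<2es> " + en)  # English -> Spanish example
        targets.append(es)
        sources.append("<2en> " + es)  # Spanish -> English example
        targets.append(en)
    return sources, targets

src, trg = tag_and_mix(["the house"], ["la casa"])
print(src)  # ['<2es> the house', '<2en> la casa']
print(trg)  # ['la casa', 'the house']
```

Because each side of the mixed corpus holds both languages, source and target would naturally share one vocabulary.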
@lvapeab Thanks a lot for those suggestions. I will try them out and open issues if anything comes up. Closing this, as it solved my original question.
-Mohammed Ayub
Your problem is that you are loading a model which expects source/target dimensions of 689/519 and you are providing a dataset with vocabulary dimensions 519/689.
You should use the same vocabularies for En->Es and Es->En. For doing this, you can use the Dataset function merge_vocabularies.
In 1_Dataprep.ipynb, after cell [4], you can do something like:
ds.merge_vocabularies(['source_text', 'target_text'])
This will compute the union of the source/target vocabularies and will update the Dataset object accordingly. Next, you should train a new model with this new Dataset. I haven't tested it, but I think it will work :)
I'm a little confused about this answer. Why is the vocabulary size the issue? The error says the incompatible shapes are [50,19,689] vs. [50,18,689]. The second dimension differs (19 vs. 18), not the third.
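This observation can be checked mechanically. Elementwise operations such as the Mul in the loss follow NumPy-style broadcasting: when shapes are aligned from the right, each pair of dimensions must be equal, or one of them must be 1. Below is a pure-Python sketch of that rule. It assumes the usual (batch, time steps, vocabulary) layout for sequence outputs. Under that layout, the mismatch is on axis 1 (sequence length), not on the vocabulary axis. The check only identifies the axis; it does not explain why the lengths differ.

```python
def first_incompatible_axis(shape_a, shape_b):
    """Return the leftmost axis where NumPy-style broadcasting fails,
    or None if the shapes are compatible."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s, as broadcasting does.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    for axis, (da, db) in enumerate(zip(a, b)):
        if da != db and 1 not in (da, db):
            return axis
    return None

print(first_incompatible_axis((50, 19, 689), (50, 18, 689)))  # 1
```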
Hi, I successfully ran Spanish-->English. However, I need to do the reverse (English-->Spanish) for my use case, and I'm getting an incompatible shapes error. I followed these steps:
1) Generated the Dataset_En_Es.pkl file by reversing the source and target files.
2) In config.py, lines 11-12: reversed the source and target.
3) Ran this statement:
python main.py --dataset /home/ubuntu/nmt_keras-data/datasets/Dataset_En_Es.pkl
It gives the below error:
[20/08/2018 19:19:08] Starting training!
[20/08/2018 19:19:08] <<< Training model >>>
[20/08/2018 19:19:08] Training parameters: {'n_epochs': 500, 'batch_size': 50, 'homogeneous_batches': False, 'maxlen': 50, 'joint_batches': 4, 'lr_decay': None, 'initial_lr': 0.001, 'reduce_each_epochs': False, ...
[20/08/2018 19:19:08] <<< creating directory /home/ubuntu/nmt_keras-data/trained_models/EnSpTrans_enes_AttentionRNNEncoderDecoder_src_emb_32_bidir_True_enc_LSTM_32_dec_ConditionalLSTM_32_deepout_linear_trg_emb_32_Adam_0.001//tensorboard_logs ... >>>
2018-08-20 19:19:18.307649: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-08-20 19:19:18.964532: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-08-20 19:19:18.964790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53 pciBusID: 0000:00:1e.0 totalMemory: 15.78GiB freeMemory: 783.94MiB
2018-08-20 19:19:18.964820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-20 19:19:19.295832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-20 19:19:19.295889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-08-20 19:19:19.295898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-08-20 19:19:19.296066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 484 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
Epoch 1/500
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [50,19,689] vs. [50,18,689]
	 [[Node: loss/target_text_loss/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_target_text_target_0_3/_395, loss/target_text_loss/Log)]]
	 [[Node: loss/add_51/_811 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8502_loss/add_51", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]
Am I missing something that needs to be changed here? I'd appreciate any help.