lvapeab / nmt-keras

Neural Machine Translation with Keras
http://nmt-keras.readthedocs.io
MIT License

Reverse translation English to Spanish -- Error: Incompatible shapes #67

Closed mohammedayub44 closed 6 years ago

mohammedayub44 commented 6 years ago

Hi, I successfully ran Spanish-->English. However, I need to do the reverse (English-->Spanish) for my use case, and I'm getting an Incompatible shapes error. I followed the steps below:

1) Generated the Dataset_En_Es.pkl file by reversing the source and target files.
2) Changed config.py: lines 11, 12 - reversed the source and target.
3) Ran this statement: python main.py --dataset /home/ubuntu/nmt_keras-data/datasets/Dataset_En_Es.pkl

It gives the below error:

[20/08/2018 19:19:08] Starting training!
[20/08/2018 19:19:08] <<< Training model >>>
[20/08/2018 19:19:08] Training parameters: {'n_epochs': 500, 'batch_size': 50, 'homogeneous_batches': False, 'maxlen': 50, 'joint_batches': 4, 'lr_decay': None, 'initial_lr': 0.001, 'reduce_each_epochs': False, ...
[20/08/2018 19:19:08] <<< creating directory /home/ubuntu/nmt_keras-data/trained_models/EnSpTrans_enes_AttentionRNNEncoderDecoder_src_emb_32_bidir_True_enc_LSTM_32_dec_ConditionalLSTM_32_deepout_linear_trg_emb_32_Adam_0.001//tensorboard_logs ... >>>
2018-08-20 19:19:18.307649: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-08-20 19:19:18.964532: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-08-20 19:19:18.964790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53 pciBusID: 0000:00:1e.0 totalMemory: 15.78GiB freeMemory: 783.94MiB
2018-08-20 19:19:18.964820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-20 19:19:19.295832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-20 19:19:19.295889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-08-20 19:19:19.295898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-08-20 19:19:19.296066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 484 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
Epoch 1/500
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [50,19,689] vs. [50,18,689]
  [[Node: loss/target_text_loss/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_target_text_target_0_3/_395, loss/target_text_loss/Log)]]
  [[Node: loss/add_51/_811 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8502_loss/add_51", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]

Am I missing something here? I'd appreciate any help.

lvapeab commented 6 years ago

You also need to set REBUILD_DATASET to False, in order to load the dataset.

Moreover, can you indicate how you "generated the Dataset_En_Es.pkl file by reversing the source and target files"?

Note that if you run a Spanish->English experiment, you'll have a model with given input and output dimensions (the source and target vocabulary sizes). If you reuse this model, the vocabularies must be the same, in order to avoid incompatible shapes and inconsistent word<->index mappings.
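
For reference, the relevant config.py settings would look something like this (SRC_LAN / TRG_LAN are the names used in the default config; exact line numbers may differ):

# config.py sketch: reverse the translation direction and reuse the stored dataset
SRC_LAN = 'en'            # source language suffix (was 'es' for Es->En)
TRG_LAN = 'es'            # target language suffix (was 'en' for Es->En)
REBUILD_DATASET = False   # load the existing Dataset_*.pkl passed via --dataset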

mohammedayub44 commented 6 years ago

Hi @lvapeab

To keep it simple, I was replicating this with the Jupyter notebook files first; here are the steps I took:

To generate the Dataset_xxx.pkl file (I did the same for the main.py '--dataset' argument above):

1) In 1_Dataprep.ipynb - reversed the setOutput and setInput file paths; all other arguments to these functions were kept the same. This gave me the Dataset summary and vocabulary lengths below (English = 516 and Spanish = 689):

>> print(ds)
---------------------------------------------
Dataset nfpa_dataset
---------------------------------------------
store path: nfpatutorial
data length:
    train - 9900
    val   - 100
    test  - 0

[ INPUTS ]
    text: source_text
    text: state_below

[ OUTPUTS ]
    text: target_text
---------------------------------------------

>> ds.vocabulary_len
{'target_text': 689, 'source_text': 516, 'state_below': 689}

2) In 2_Training.ipynb - it gave the same error. My guess is that it's not that straightforward; maybe the GroundHogModel architecture needs to be changed to support this? I'm new to this, so any help is appreciated.

-Mohammed Ayub

lvapeab commented 6 years ago

Your problem is that you are loading a model which expects source/target vocabulary dimensions of 689/516, while you are providing a dataset with vocabulary dimensions of 516/689.

You should use the same vocabularies for En->Es and Es->En. To do this, you can use the Dataset method merge_vocabularies.

In 1_Dataprep.ipynb, after cell [4], you can do something like:

ds.merge_vocabularies(['source_text', 'target_text'])

This will compute the union of the source/target vocabularies and will update the Dataset object accordingly. Next, you should train a new model with this new Dataset.
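
In context, that cell would look roughly like this (untested sketch; saveDataset is the keras_wrapper helper for storing the .pkl, and the output path is just an example):

# After building the Dataset inputs/outputs in 1_Dataprep.ipynb:
from keras_wrapper.dataset import saveDataset

ds.merge_vocabularies(['source_text', 'target_text'])  # union of source/target vocabularies
print(ds.vocabulary_len)  # 'source_text' and 'target_text' should now report the same size

# Store the merged dataset so 2_Training.ipynb / main.py can load it via --dataset
saveDataset(ds, 'datasets')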

I haven't tested it, but I think it will work :)

mohammedayub44 commented 6 years ago

Thanks @lvapeab for that. In short, I tried it and it works. A couple of troubleshooting questions, since it looks like some magic happened behind the scenes. Steps:

1) In the first notebook file - merged the vocabularies after the 5th cell (after creation of source and target):

>> ds.merge_vocabularies(['source_text', 'target_text'])
[30/08/2018 13:28:07] Merging vocabularies of the following ids: ['source_text', 'target_text']
[30/08/2018 13:28:07] The new total is 974.
>> ds.vocabulary_len
{'target_text': 974, 'source_text': 974, 'state_below': 689}

2) In the second notebook file - loaded the modified dataset from the step above and created a new GroundHogModel instance without changing any cell. Ran the training and it worked.
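
For context, that cell builds the model roughly as in the tutorial notebook, something like the sketch below (the import path may differ between nmt-keras versions; params is the parameter dict defined earlier in the notebook):

from model_zoo import TranslationModel  # import path may differ between versions

# Build the GroundHogModel with the merged-vocabulary dataset
nmt_model = TranslationModel(params,
                             model_type='GroundHogModel',
                             model_name='tutorial_model',
                             vocabularies=ds.vocabulary,
                             store_path='trained_models/tutorial_model/',
                             verbose=True)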

Questions:

1) The layer shapes of my GroundHogModel instance look very different for the last layer compared to the example notebook (training notebook cell 4 output; file attached for reference: tutorial_model2.txt). Is this auto-calculated or custom-set in the Translation Model params? I'm not sure how this will affect the inference/decoding process.

2) To run this as python main.py --dataset=xxx.pkl, I'm assuming the steps above should also work, in addition to changing REBUILD_DATASET to False in the config file.

3) I'm running this on an Amazon p3.8xlarge instance with 4 GPUs, and it shows that only one GPU is being used when I run the command below to check usage while training:

nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 5

Appreciate your help. tutorial_model2.txt

-Mohammed Ayub

mohammedayub44 commented 6 years ago

[UPDATE] - I've sort of figured out a partial solution to my questions 1 and 2 above; please let me know if I went down the wrong path. (I can open another issue for question 3, since it's not related to this thread?)

1) Changed the params below before creating the model instance, to mimic the example setup:

params['ENCODER_HIDDEN_SIZE'] = 256
params['DECODER_HIDDEN_SIZE'] = 256
params['ATTENTION_SIZE'] = 256
params['SOURCE_TEXT_EMBEDDING_SIZE'] = 300
params['TARGET_TEXT_EMBEDDING_SIZE'] = 300
params['SKIP_VECTORS_HIDDEN_SIZE'] = 300
params['MODEL_SIZE'] = 300
params['USE_NOISE'] = True
params['DEEP_OUTPUT_LAYERS'] = [('linear', 300)]

This is the output for 100 epochs: epoch_100

2) I made similar changes to config.py to run main.py (for English->Spanish): lines 11, 12 - reversed the source and target; REBUILD_DATASET = False (as suggested).

>> python main.py --dataset=/home/ubuntu/nmt_keras-data/datasets/Dataset_nfpa_dataset.pkl \
       ENCODER_HIDDEN_SIZE=256 DECODER_HIDDEN_SIZE=256 ATTENTION_SIZE=256 \
       SOURCE_TEXT_EMBEDDING_SIZE=300 TARGET_TEXT_EMBEDDING_SIZE=300 \
       SKIP_VECTORS_HIDDEN_SIZE=300 MODEL_SIZE=300 USE_NOISE=True MAX_EPOCH=3 \
       DEEP_OUTPUT_LAYERS="[('linear', 300)]"

Here is the Sample Command line output - sample_main_cmd_ouput_3epochs.txt

-Mohammed Ayub

lvapeab commented 6 years ago

Hi, I think you want to reload the model from a given epoch. You should set the parameter RELOAD to the desired epoch to reload.
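
For instance (the epoch number is just an example):

# In config.py, or passed as a KEY=VALUE override on the main.py command line
# (as with the other parameters above):
RELOAD = 72   # reload the model checkpoint saved at epoch 72; 0 starts from scratch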

With respect to multi-GPU, it is not supported. I hope to implement it in a few days, though.

mohammedayub44 commented 6 years ago

@lvapeab Great, thanks. I ran the decoding (reload + decode test sentences) for a couple of epoch checkpoints. It doesn't seem to give results for English-->Spanish as good as it did for Spanish-->English; the scores seem fairly low.

Results Below:

(taken from the Python notebook)

Metric    Spanish-->English       English-->Spanish (epoch 72)   English-->Spanish (epoch 100)
Bleu_1    0.9497048078508498      0.81257627965545               0.8006159201340844
Bleu_2    0.9443201040962285      0.7385077934634473             0.7273109212714537
Bleu_3    0.941726347487433       0.6830458006857428             0.6722753442108618
Bleu_4    0.9401072076298569      0.6382079993175922             0.627573898190403
CIDEr     9.345228814102258       5.61866059228601               5.482452992256578
METEOR    0.7086728343657942      0.78775437247034               0.77756316306938
ROUGE_L   0.9476926542686434      0.767403946943525              0.75705568703425
TER       0.062433267771846025    0.26339833823487424            0.2771036176227051

Also, could you elaborate on the merge-vocabularies trick we used? I'm planning to test this by adding (with and without) more domain-specific terminology (legal terms) for both English and Spanish (maybe 10,000 terms or more). What options do I have for this?

Appreciate your help.

Thanks !

-Mohammed Ayub

lvapeab commented 6 years ago

For this toy task, it is normal. The data are very biased and there are large differences between En->Es and Es->En.

Regarding your questions, I think the way to go is to apply joint BPE to a corpus, then tag it properly (see Sec. 3), and train regularly on this corpus with mixed language directions.
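
A rough sketch of the tagging step, assuming the corpus has already been segmented with joint BPE (the <2en>/<2es> tokens and the file names are only an example, nothing fixed by nmt-keras):

# Prepend a target-language token to each (already BPE-segmented) source sentence,
# so a single model can be trained on both directions mixed together.
def tag_corpus(src_path, out_path, target_token):
    with open(src_path, encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            fout.write(target_token + ' ' + line)

tag_corpus('train.bpe.en', 'train.en2es.src', '<2es>')  # English source -> Spanish target
tag_corpus('train.bpe.es', 'train.es2en.src', '<2en>')  # Spanish source -> English target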

mohammedayub44 commented 6 years ago

@lvapeab Great, thanks a lot for those suggestions. I will try them out and post issues if any come up. Closing this, as it solved my original question.

-Mohammed Ayub

aklam commented 5 years ago

Your problem is that you are loading a model which expects source/target vocabulary dimensions of 689/516, while you are providing a dataset with vocabulary dimensions of 516/689.

You should use the same vocabularies for En->Es and Es->En. To do this, you can use the Dataset method merge_vocabularies.

In 1_Dataprep.ipynb, after cell [4], you can do something like:

ds.merge_vocabularies(['source_text', 'target_text'])

This will compute the union of the source/target vocabularies and will update the Dataset object accordingly. Next, you should train a new model with this new Dataset.

I haven't tested it, but I think it will work :)


I'm a little confused about this answer. Why is the issue the vocabulary size? It says that the incompatible shapes are [50,19,689] vs. [50,18,689], where it's the second value that differs (19 vs. 18), not the third.