tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

How do I add my custom dataset and train on it? #321

Open shresthpaul133 opened 6 years ago

shresthpaul133 commented 6 years ago

I have my own dataset for a chatbot, and I want to train only on that data. Can you please tell me what changes I need to make to do this?

luozhouyang commented 6 years ago

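You need these files:

  - train.src, train.tgt: training data, one sentence per line, where line N of the .tgt file is the target (response) for line N of the .src file
  - dev.src, dev.tgt: development data in the same format
  - test.src, test.tgt: test data in the same format
  - vocab.src, vocab.tgt: vocabularies, one token per line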

Once you have prepared these files, you can train the model:

python -m nmt.nmt \
    --out_dir=$YOUR_OUT_DIR \
    --src=src --tgt=tgt \
    --vocab_prefix=$FILE_PATH/vocab \
    --train_prefix=$FILE_PATH/train \
    --dev_prefix=$FILE_PATH/dev \
    --test_prefix=$FILE_PATH/test  # add other flags (e.g. --num_train_steps, --num_units) as needed
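If you don't have vocab files yet, one way to build them is sketched below. It assumes train.src and train.tgt are already tokenized, with tokens separated by spaces. It writes <unk>, <s>, and </s> first, which I believe are nmt's default special tokens (as far as I can tell, nmt's vocab check prepends them when they are missing, so listing them explicitly should be harmless). build_vocab and write_vocab are hypothetical helper names, not part of nmt.

```python
import collections
import sys

# Assumed nmt defaults for the unknown, start-of-sentence and end-of-sentence tokens.
SPECIAL_TOKENS = ["<unk>", "<s>", "</s>"]


def build_vocab(lines, max_size=None):
    """Return special tokens first, then regular tokens by descending frequency.

    max_size caps the number of regular tokens (None keeps all of them).
    """
    counts = collections.Counter()
    for line in lines:
        # str.split() with no argument splits on any whitespace and never yields "".
        counts.update(line.split())
    vocab = list(SPECIAL_TOKENS)
    for token, _ in counts.most_common(max_size):
        if token not in SPECIAL_TOKENS:
            vocab.append(token)
    return vocab


def write_vocab(train_path, vocab_path, max_size=None):
    """Read a tokenized training file and write one vocab token per line."""
    with open(train_path, encoding="utf-8") as f:
        vocab = build_vocab(f, max_size)
    with open(vocab_path, "w", encoding="utf-8") as f:
        for token in vocab:
            f.write(token + "\n")
    return len(vocab)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python build_vocab.py train.src vocab.src
    print(write_vocab(sys.argv[1], sys.argv[2]))
```

Saved as build_vocab.py (a hypothetical file name), you would run it once for train.src and once for train.tgt. Because every token comes from str.split() and is written exactly once, the output contains no empty lines and no repeated words.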
luozhouyang commented 6 years ago

The log output is clear: your vocab files contain empty lines. You need to make sure that:

  1. there are NO empty lines in the vocab files, and
  2. there are NO repeated words in the vocab files.

You can write a simple Python script to filter out the empty lines and the repeated words, and then try again.
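A minimal sketch of such a filter script, assuming UTF-8 vocab files with one token per line. It strips each line first, so lines containing only spaces, tabs, or a stray carriage return are also treated as empty; such lines may look non-empty in an editor. clean_vocab_lines and clean_vocab_file are hypothetical names.

```python
import sys


def clean_vocab_lines(lines):
    """Strip each line, then drop empty lines and repeated tokens (first occurrence wins)."""
    seen = set()
    cleaned = []
    for line in lines:
        token = line.strip()
        if not token or token in seen:
            continue
        seen.add(token)
        cleaned.append(token)
    return cleaned


def clean_vocab_file(in_path, out_path):
    """Write the cleaned vocab to out_path, one token per line; return the token count."""
    with open(in_path, encoding="utf-8") as f:
        tokens = clean_vocab_lines(f)
    with open(out_path, "w", encoding="utf-8") as f:
        for token in tokens:
            f.write(token + "\n")
    return len(tokens)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python clean_vocab.py vocab.tgt vocab.clean.tgt
    print(clean_vocab_file(sys.argv[1], sys.argv[2]))
```

Order is preserved, so special tokens at the top of the file stay at the top. The error above points at the copy in /tmp/nmt_model, so it is probably safest to clean the original vocab files and start again with a fresh out_dir.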

On Sat, Jun 23, 2018 at 4:37 AM Shresth notifications@github.com wrote:

@luozhouyang Sir, it is giving a problem with the vocab files. Here is the whole error. I checked the vocab files and I can't see any blank lines in them, so I don't know what this is. Before this, it was giving errors in the train, test, and dev files themselves, and I sorted those out. But it's been about a week and I'm really not able to make any progress on this. Please help me out. Thank you.

Here is the error:

2018-06-23 02:01:11.183505: W tensorflow/core/framework/op_kernel.cc:1278] OP_REQUIRES failed at lookup_table_init_op.cc:145 : Invalid argument: Invalid content in /tmp/nmt_model/vocab.tgt: empty line found at position 149.
2018-06-23 02:01:11.183537: W tensorflow/core/framework/op_kernel.cc:1278] OP_REQUIRES failed at lookup_table_init_op.cc:145 : Invalid argument: Invalid content in /tmp/nmt_model/vocab.src: empty line found at position 77.
Traceback (most recent call last):
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1330, in _do_call
    return fn(*args)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1315, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1423, in _call_tf_sessionrun
    status, run_metadata)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid content in /tmp/nmt_model/vocab.tgt: empty line found at position 149.
  [[Node: string_to_index_1/hash_table/table_init = InitializeTableFromTextFileV2[delimiter="\t", key_index=-2, value_index=-1, vocab_size=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shresth/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/shresth/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/nmt.py", line 605, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/nmt.py", line 598, in main
    run_main(FLAGS, default_hparams, train_fn, inference_fn)
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/nmt.py", line 591, in run_main
    train_fn(hparams, target_session=target_session)
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/train.py", line 328, in train
    train_model.model, model_dir, train_sess, "train")
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/model_helper.py", line 572, in create_or_load_model
    session.run(tf.tables_initializer())
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 908, in run
    run_metadata_ptr)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1143, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1324, in _do_run
    run_metadata)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1343, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid content in /tmp/nmt_model/vocab.tgt: empty line found at position 149.
  [[Node: string_to_index_1/hash_table/table_init = InitializeTableFromTextFileV2[delimiter="\t", key_index=-2, value_index=-1, vocab_size=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

Caused by op 'string_to_index_1/hash_table/table_init', defined at:
  File "/home/shresth/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/shresth/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/nmt.py", line 605, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/nmt.py", line 598, in main
    run_main(FLAGS, default_hparams, train_fn, inference_fn)
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/nmt.py", line 591, in run_main
    train_fn(hparams, target_session=target_session)
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/train.py", line 296, in train
    train_model = model_helper.create_train_model(model_creator, hparams, scope)
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/model_helper.py", line 80, in create_train_model
    src_vocab_file, tgt_vocab_file, hparams.share_vocab)
  File "/home/shresth/Desktop/WEB/CHATBOT/nmt/nmt/utils/vocab_utils.py", line 87, in create_vocab_tables
    tgt_vocab_file, default_value=UNK_ID)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/lookup_ops.py", line 999, in index_table_from_file
    init, default_value, shared_name=shared_name, name=hash_table_scope)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/lookup_ops.py", line 279, in __init__
    super(HashTable, self).__init__(table_ref, default_value, initializer)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/lookup_ops.py", line 171, in __init__
    self._init = initializer.initialize(self)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/lookup_ops.py", line 520, in initialize
    name=scope)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_lookup_ops.py", line 317, in initialize_table_from_text_file_v2
    vocab_size=vocab_size, delimiter=delimiter, name=name)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3327, in create_op
    op_def=op_def)
  File "/home/shresth/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1674, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Invalid content in /tmp/nmt_model/vocab.tgt: empty line found at position 149.
  [[Node: string_to_index_1/hash_table/table_init = InitializeTableFromTextFileV2[delimiter="\t", key_index=-2, value_index=-1, vocab_size=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]


shresthpaul133 commented 6 years ago

@luozhouyang Thank you, sir, that worked, and I have now trained the chatbot too. But now there is another error: when I run the model_test.py file, it reports a failure. Here's a screenshot. I am using the command python -m nmt.model_test

[screenshot of the model_test failure]

saisriteja commented 5 years ago

@shresthpaul133 I am new to this topic. Can you tell me how to make train.src, train.tgt, and the other files? I have my text in a .txt file.
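A minimal sketch of one way to do this. It assumes, hypothetically, that the .txt file alternates lines: a prompt, then its response, then the next prompt, and so on. If your file is laid out differently, only make_pairs needs to change. It also only collapses whitespace rather than doing real tokenization, and the 5% dev/test split is an arbitrary choice. make_pairs, split_pairs, and write_parallel are hypothetical names, not part of nmt.

```python
import os
import random
import sys


def make_pairs(lines):
    """Pair utterance 0 with 1, 2 with 3, ... (assumed prompt/response layout).

    Blank lines are dropped and runs of whitespace collapsed to single spaces,
    since each line is expected to hold space-separated tokens.
    A trailing unpaired line is ignored.
    """
    utterances = [" ".join(line.split()) for line in lines if line.strip()]
    return list(zip(utterances[0::2], utterances[1::2]))


def split_pairs(pairs, dev_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle deterministically, then carve off the dev and test portions."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_dev = int(len(pairs) * dev_frac)
    n_test = int(len(pairs) * test_frac)
    return {
        "dev": pairs[:n_dev],
        "test": pairs[n_dev:n_dev + n_test],
        "train": pairs[n_dev + n_test:],
    }


def write_parallel(splits, out_dir="."):
    """Write <prefix>.src and <prefix>.tgt so that line N of each file forms one pair."""
    for prefix, pairs in splits.items():
        src_path = os.path.join(out_dir, prefix + ".src")
        tgt_path = os.path.join(out_dir, prefix + ".tgt")
        with open(src_path, "w", encoding="utf-8") as fs, \
                open(tgt_path, "w", encoding="utf-8") as ft:
            for src, tgt in pairs:
                fs.write(src + "\n")
                ft.write(tgt + "\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python make_parallel.py conversations.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        write_parallel(split_pairs(make_pairs(f)))
```

This produces train.src/train.tgt, dev.src/dev.tgt, and test.src/test.tgt in the current directory; you still need vocab.src and vocab.tgt with one token per line. With a very small input file, the dev and test sets can end up empty, so adjust the fractions accordingly.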