modernmt / modernmt

Neural Adaptive Machine Translation that adapts to context and learns from corrections.
http://www.modernmt.eu/
Apache License 2.0

cli.libs.osutils.ShellError: Command 't2t-datagen ..... #432

Closed ydshieh closed 5 years ago

ydshieh commented 5 years ago

I finally managed to complete the installation. However, the example from README.md

./mmt create en it examples/data/train --train-steps 6000

gives the following error. Any idea?


INFO: (4 of 6) Preparing data...

ERROR Unexpected exception: None
Traceback (most recent call last):
  File "./mmt", line 610, in <module>
    main()
  File "./mmt", line 556, in main
    actionscommand
  File "./mmt", line 143, in main_create
    builder.build()
  File "/media/XXX/YYY/Project/ModernMT/mmt/ModernMT/cli/mmt/engine.py", line 560, in build
    self._build(resume=False)
  File "/media/XXX/YYY/Project/ModernMT/mmt/ModernMT/cli/mmt/engine.py", line 633, in _build
    method(self, args, skip=skip, log=log_stream, delete_on_exit=self._delete_on_exit)
  File "/media/XXX/YYY/Project/ModernMT/mmt/ModernMT/cli/mmt/engine.py", line 432, in __call__
    self._f(*args, **kwargs)
  File "/media/XXX/YYY/Project/ModernMT/mmt/ModernMT/cli/mmt/engine.py", line 764, in _prepare_data
    log=log, bpe_symbols=self._bpe_symbols)
  File "/media/XXX/YYY/Project/ModernMT/mmt/ModernMT/cli/mmt/engine.py", line 157, in prepare_data
    osutils.shell_exec(command, stdout=log, stderr=log, env=env)
  File "/media/XXX/YYY/Project/ModernMT/mmt/ModernMT/cli/libs/osutils.py", line 57, in shell_exec
    raise ShellError(str_cmd, return_code, stderr_dump)
cli.libs.osutils.ShellError: Command 't2t-datagen --t2t_usr_dir /media/XXX/YYY/Project/ModernMT/mmt/ModernMT/build/lib/t2t --data_dir=/media/XXX/YYY/Project/ModernMT/mmt/ModernMT/runtime/default/tmp/training/neural_train_data/data --tmp_dir=/media/XXX/YYY/Project/ModernMT/mmt/ModernMT/runtime/default/tmp/training/neural_train_data/tmp --problem=translate_mmt' failed with exit code 1

ydshieh commented 5 years ago

I use an Anaconda environment (all the Python dependencies are installed into the env). Is the error perhaps caused by shell_exec using the default python, where the necessary packages are not installed?

ydshieh commented 5 years ago

After switching to the Ubuntu default python and installing the dependencies into it, the same error still occurs...

nicolabertoldi commented 5 years ago

@chiapas for some reason t2t-datagen is not found

Did you install tensor2tensor in your environment? Are you able to run t2t-datagen? Could you identify the exact path where t2t-datagen is installed?

Could you identify the exact path where tensor2tensor is installed? You can find it by running

python -c 'import tensor2tensor ; print tensor2tensor.__path__'

which returns the installation path (say PATH); t2t-datagen should be in PATH/bin/

Please report the content of this PATH/bin/
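
To check which t2t-datagen a given shell would actually pick up, a small lookup along the following lines may help. This is only a sketch: find_on_path is a hypothetical helper written for this thread, not part of ModernMT or tensor2tensor, and it simply walks the PATH environment variable the way a shell lookup would. Running it once from the Anaconda env and once from the system python shows whether the two environments resolve the tool differently.

```python
import os
import sys


def find_on_path(name, search_path=None):
    """Return the first executable called `name` on the search path, or None.

    Hypothetical helper: it scans the directories in PATH (or in the given
    search_path string) in order and returns the first executable match.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Show which interpreter is running and where t2t-datagen resolves, if at all.
    print("interpreter: %s" % sys.executable)
    print("t2t-datagen: %s" % find_on_path("t2t-datagen"))
```

If the second line prints None, the tool is not reachable from that environment's PATH.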

ydshieh commented 5 years ago

@nicolabertoldi

The result of

python -c 'import tensor2tensor ; print tensor2tensor.__path__'

gives

['/usr/local/lib/python2.7/dist-packages/tensor2tensor']

where I can find 't2t-datagen'

When I type 't2t-datagen' directly, it runs (but eventually fails with a different error).

If I type into the terminal the full command reported in the cli.libs.osutils.ShellError message, that is

t2t-datagen --t2t_usr_dir /media/biggerpan/BiggerStorage/Project/ModernMT/mmt/ModernMT/build/lib/t2t --data_dir=/media/biggerpan/BiggerStorage/Project/ModernMT/mmt/ModernMT/runtime/default/tmp/training/neural_train_data/data --tmp_dir=/media/biggerpan/BiggerStorage/Project/ModernMT/mmt/ModernMT/runtime/default/tmp/training/neural_train_data/tmp --problem=translate_mmt

I get

INFO:tensorflow:Generating data for translate_mmt.
:::MLPv0.5.0 transformer 1544870482.665570974 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/text_problems.py:306) preproc_tokenize_training
Traceback (most recent call last):
  File "/usr/local/bin/t2t-datagen", line 28, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/usr/local/bin/t2t-datagen", line 23, in main
    t2t_datagen.main(argv)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_datagen.py", line 198, in main
    generate_data_for_registered_problem(problem)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_datagen.py", line 260, in generate_data_for_registered_problem
    problem.generate_data(data_dir, tmp_dir, task_id)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/text_problems.py", line 306, in generate_data
    self.generate_encoded_samples(data_dir, tmp_dir, split)), paths)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/text_problems.py", line 262, in generate_encoded_samples
    generator = self.generate_samples(data_dir, tmp_dir, dataset_split)
  File "/media/biggerpan/BiggerStorage/Project/ModernMT/mmt/ModernMT/build/lib/t2t/problem.py", line 240, in generate_samples
    datasets = self.source_data_files(dataset_split)
  File "/media/biggerpan/BiggerStorage/Project/ModernMT/mmt/ModernMT/build/lib/t2t/problem.py", line 221, in source_data_files
    folder = _env_get_folder(ENV_MMT_PROBLEM_TRAIN_PATH if train else ENV_MMT_PROBLEM_DEV_PATH)
  File "/media/biggerpan/BiggerStorage/Project/ModernMT/mmt/ModernMT/build/lib/t2t/problem.py", line 45, in _env_get_folder
    value = os.environ[name]
  File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'MMT_PROBLEM_TRAIN_PATH'

ydshieh commented 5 years ago

One more strange thing: when I type './mmt create en it examples/data/train --train-steps 6000', it gives

  File "./mmt", line 44
    print 'ERROR: Wrong version of Java, required Java 8'
    ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('ERROR: Wrong version of Java, required Java 8')?

I need to type 'sudo ./mmt create en it examples/data/train --train-steps 6000' instead (and that gives the error about t2t-datagen).

nicolabertoldi commented 5 years ago

@chiapas

Concerning the ENV_MMT_PROBLEM_TRAIN_PATH issue:

You have to set 5 environment variables (which are usually set automatically when the right command is run). You should run the following:

export MMT_PROBLEM_SOURCE_LANG=en
export MMT_PROBLEM_TARGET_LANG=it
export MMT_PROBLEM_BPE=32768
export MMT_PROBLEM_TRAIN_PATH=${trainDir}
export MMT_PROBLEM_DEV_PATH=${evalDir}

where trainDir and evalDir should be the directories which contain the training and validation data. These are created after step 3 of mmt create; the paths should be:

runtime/default/tmp/training/preprocessed_corpora/train
runtime/default/tmp/training/preprocessed_corpora/validation

Please use absolute paths.
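
As a rough illustration of the above, the five variables can be derived from the location of the ModernMT checkout. The sketch below is not ModernMT code: mmt_problem_env and engine_root are hypothetical names, and it assumes that the engine is called "default" and that the runtime directory sits directly under the checkout, as in the paths listed above. It converts the root to an absolute path before building the two corpus directories.

```python
import os


def mmt_problem_env(engine_root, source_lang="en", target_lang="it", bpe_symbols=32768):
    """Build the five MMT_PROBLEM_* variables as a dict of strings.

    Hypothetical helper: engine_root is assumed to be the ModernMT checkout
    that contains runtime/default/tmp/training/preprocessed_corpora.
    """
    corpora = os.path.join(
        os.path.abspath(engine_root),
        "runtime", "default", "tmp", "training", "preprocessed_corpora",
    )
    return {
        "MMT_PROBLEM_SOURCE_LANG": source_lang,
        "MMT_PROBLEM_TARGET_LANG": target_lang,
        "MMT_PROBLEM_BPE": str(bpe_symbols),
        "MMT_PROBLEM_TRAIN_PATH": os.path.join(corpora, "train"),
        "MMT_PROBLEM_DEV_PATH": os.path.join(corpora, "validation"),
    }


if __name__ == "__main__":
    # Print export lines for the current directory; values are not shell-quoted,
    # so paths containing spaces would need quoting by hand.
    for name, value in sorted(mmt_problem_env(os.getcwd()).items()):
        print("export %s=%s" % (name, value))
```

The printed lines can be pasted into the shell before running t2t-datagen by hand.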

nicolabertoldi commented 5 years ago

@chiapas

concerning the java problem

I strongly suspect that something went wrong in your environment or during the installation.

Could you please reset everything, clean your environment completely, and try a new installation from scratch, following all the installation steps?

I would kindly suggest using the Docker installation (i.e. the "Option 1 - Using Docker").

ydshieh commented 5 years ago

@nicolabertoldi

Thank you for the help. I will try Docker. For the ENV_MMT_PROBLEM_TRAIN_PATH issue, should I run another command before launching ./mmt create en it examples/data/train --train-steps 6000? Or should it be done automatically, and something went wrong?

nicolabertoldi commented 5 years ago

@chiapas

If you run mmt create all variables are set automatically

If you run directly t2t-datagen or t2t-trainer, you should set the variables manually.
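
To make the difference concrete, here is a minimal sketch of a wrapper that launches a child process with extra environment variables and fails on a non-zero exit code. It is an illustration only, not the actual cli/libs/osutils.py code: run_with_env and CommandFailed are hypothetical names, and the child command is a stand-in Python one-liner rather than t2t-datagen. A child that reads a missing variable with os.environ[name] dies with a KeyError and exit code 1, which is consistent with the two errors reported above.

```python
import os
import subprocess
import sys


class CommandFailed(Exception):
    """Illustrative stand-in for an error raised on a non-zero exit code."""

    def __init__(self, command, return_code, stderr_dump):
        super(CommandFailed, self).__init__(
            "Command '%s' failed with exit code %d" % (command, return_code))
        self.return_code = return_code
        self.stderr_dump = stderr_dump


def run_with_env(args, extra_env):
    """Run args with the current environment plus extra_env; return stdout as text."""
    env = dict(os.environ)
    env.update(extra_env)
    process = subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
    stdout, stderr = process.communicate()
    if process.returncode != 0:
        raise CommandFailed(
            " ".join(args), process.returncode, stderr.decode("utf-8", "replace"))
    return stdout.decode("utf-8", "replace")


if __name__ == "__main__":
    # Stand-in child: it reads the variable the same way problem.py does.
    child = [sys.executable, "-c",
             "import os; print(os.environ['MMT_PROBLEM_TRAIN_PATH'])"]
    print(run_with_env(child, {"MMT_PROBLEM_TRAIN_PATH": "/tmp/train"}).strip())
```

When the variables are passed through the wrapper the child sees them; when the same command is typed directly into a shell that never exported them, it does not.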

ydshieh commented 5 years ago

@nicolabertoldi

Before trying to use docker, I have one question. In that option, I see

To run your instance and publish the API on port 8045 of your host, execute

nvidia-docker run -it --publish 8045:8045 modernmt/master bash

However, I don't see any tutorial that explains how to create/start my own engine and use it to translate with the Docker option. Are all the steps in README.md the same even when using Docker? I have no experience with Docker, though.

nicolabertoldi commented 5 years ago

@chiapas

the command

nvidia-docker run -it --publish 8045:8045 modernmt/master bash

opens a bash shell inside the container, where you can perform all the steps to create and use ModernMT as described in README.md

nicolabertoldi commented 5 years ago

@chiapas

Follow up.

If you exit the docker bash, the container is stopped. Assuming that the id of your container is CONTAINER_ID, you can start/restart/stop it as follows

sudo nvidia-docker start CONTAINER_ID
sudo nvidia-docker restart CONTAINER_ID
sudo nvidia-docker stop CONTAINER_ID

You can execute a command non-interactively using

sudo nvidia-docker exec CONTAINER_ID COMMAND

or interactively using

sudo nvidia-docker exec -it CONTAINER_ID bash

The container is persistent. To remove it permanently, you have to run a command like

sudo nvidia-docker rm CONTAINER_ID

nicolabertoldi commented 5 years ago

@chiapas

If you were able to install ModernMT through Docker and use it, we would kindly ask you to close this issue.

Otherwise please let us know how we can help further.

davidecaroselli commented 5 years ago

I'm closing this issue for now due to inactivity, but please feel free to re-open the discussion if there are any updates.

Thanks