mlcommons / training_results_v1.0

This repository contains the results and code for the MLPerf™ Training v1.0 benchmark.
https://mlcommons.org/en/training-normal-10/
Apache License 2.0

Command is missing the path to the cks directory for the model.ckpt-28252.pt #1

Open · JonShelley opened this issue 3 years ago

JonShelley commented 3 years ago

When I run the command:

python convert_tf_checkpoint.py --tf_checkpoint /cks/model.ckpt-28252.index --bert_config_path /cks/bert_config.json --output_checkpoint model.ckpt-28252.pt

It works as expected. However, because the output path is relative, it writes the needed file (model.ckpt-28252.pt) to the current working directory inside the container, /workspace/bert. When I exit the container, the file is gone. The correct command should be:

python convert_tf_checkpoint.py --tf_checkpoint /cks/model.ckpt-28252.index --bert_config_path /cks/bert_config.json --output_checkpoint /cks/model.ckpt-28252.pt

jqueguiner commented 2 years ago

You didn't attach the local directory to the container. Use: docker run -it -v localdirectory:/cks. This should do the job.
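A minimal sketch of that bind mount, assuming a hypothetical host directory ($PWD/cks) and a hypothetical image tag (mlperf-bert:latest); neither name comes from the repository. The script only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Sketch only: HOST_CKS_DIR and IMAGE are hypothetical placeholders,
# not names taken from the benchmark repository.
HOST_CKS_DIR="${HOST_CKS_DIR:-$PWD/cks}"  # host dir with model.ckpt-28252.* and bert_config.json
IMAGE="${IMAGE:-mlperf-bert:latest}"       # hypothetical image tag; use the one you built

# Print (rather than run) the docker command that bind-mounts the host
# directory at /cks, so files written to /cks inside the container
# remain on the host after the container exits.
docker_cmd() {
  echo "docker run -it -v ${HOST_CKS_DIR}:/cks ${IMAGE}"
}

docker_cmd
```

With the directory mounted this way, passing --output_checkpoint /cks/model.ckpt-28252.pt writes the converted checkpoint into the host directory.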

jqueguiner commented 2 years ago

OK, I see: the command is misleading.

python convert_tf_checkpoint.py --tf_checkpoint /cks/model.ckpt-28252.index --bert_config_path /cks/bert_config.json --output_checkpoint model.ckpt-28252.pt

should be:

python convert_tf_checkpoint.py --tf_checkpoint /cks/model.ckpt-28252.index --bert_config_path /cks/bert_config.json --output_checkpoint /cks/model.ckpt-28252.pt

jqueguiner commented 2 years ago

https://github.com/mlcommons/training_results_v1.0/compare/master...jqueguiner:patch-1