Abhaycnvrg closed this issue.
Could you try using --do_predict instead of --do_eval? With --do_predict, results are processed and written here: https://github.com/huggingface/optimum-habana/blob/32f8555b543afd696064e8a56979606880f17995/examples/summarization/run_summarization.py#L744

--do_eval and --do_predict do almost the same thing, except that --do_eval is usually applied to the validation set to check your metrics, while --do_predict is used on the test set to generate the intended results.
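Roughly, the prediction branch decodes the generated token IDs and writes one summary per line into the output directory. Below is a minimal pure-Python sketch of that post-processing. It assumes the script mirrors the upstream transformers summarization example, which replaces any -100 padding with the pad token ID, decodes, strips whitespace and writes generated_predictions.txt. The toy vocabulary and the decode helper are hypothetical stand-ins for a real tokenizer's batch_decode.

```python
import os
import tempfile

# Hypothetical toy vocabulary standing in for a real tokenizer (assumption).
PAD_ID = 0
VOCAB = {0: "<pad>", 1: "the", 2: "cat", 3: "sat", 4: "dogs", 5: "bark"}
# Padding value that may appear in gathered predictions (assumption, as upstream).
IGNORE_INDEX = -100


def decode(ids: list[int]) -> str:
    """Map IDs to tokens, skipping padding (like skip_special_tokens=True)."""
    return " ".join(VOCAB[i] for i in ids if i != PAD_ID)


def write_predictions(predictions: list[list[int]], output_dir: str) -> str:
    """Decode generated IDs, write one summary per line, return the file path."""
    # Replace -100 with the pad ID so every ID can be decoded.
    cleaned = [[PAD_ID if i == IGNORE_INDEX else i for i in seq] for seq in predictions]
    summaries = [decode(seq).strip() for seq in cleaned]
    # File name assumed from the upstream transformers example.
    path = os.path.join(output_dir, "generated_predictions.txt")
    with open(path, "w") as writer:
        writer.write("\n".join(summaries))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        path = write_predictions([[1, 2, 3, -100], [4, 5, 0, 0]], out_dir)
        with open(path) as f:
            print(f.read())
```

This is also why --predict_with_generate matters here: without generation, the predictions are not token sequences that can be decoded into text.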
Thanks! That worked.

An unrelated question: what is the difference between the image we use while creating an instance and the container image used in the following command, which we run after connecting to the instance over SSH?

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.10.0/ubuntu20.04/habanalabs/pytorch-installer-2.0.1:latest

Are they the same? Is the first one used for creation, while the second one is for going "inside" it to run programs?
The AMI you used when launching your instance is an image that AWS uses to set up the virtual machine running on the hardware you chose. Here is the official AWS doc about AMIs: https://docs.aws.amazon.com/en_us/AWSEC2/latest/UserGuide/AMIs.html

On the other hand, a Docker image contains all the dependencies you need to run your code (because they may not be installed on your AWS instance or on your laptop). You then run a container based on this image to actually execute your code (that's the environment you enter after running docker run ...). Here is a good summary about Docker images and containers: https://circleci.com/blog/docker-image-vs-container/
Tasks: examples folder (such as GLUE/SQuAD, ...)

Reproduction
1. Start an EC2 DL1 instance with this image: Habana® Deep Learning Base AMI (Ubuntu 20.04).
2. Run these commands:
   a. docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.10.0/ubuntu20.04/habanalabs/pytorch-installer-2.0.1:latest
   b. git clone https://github.com/huggingface/optimum-habana.git
   c. pip install optimum[habana]
   d. cd examples
   e. cd summarization
   f. pip install -r requirements.txt
python run_summarization.py \
    --model_name_or_path t5-small \
    --do_eval \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --source_prefix "summarize: " \
    --output_dir /tmp/tst-summarization \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --overwrite_output_dir \
    --predict_with_generate \
    --use_habana \
    --use_lazy_mode \
    --use_hpu_graphs_for_inference \
    --gaudi_config_name Habana/t5 \
    --ignore_pad_token_for_loss False \
    --pad_to_max_length \
    --save_strategy epoch \
    --throughput_warmup_steps 3
Expected behavior
I need a file containing the summarized text, not just the evaluation metrics.
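Following the reply at the top of this thread, the fix is to pass --do_predict instead of --do_eval and keep --predict_with_generate, so the script writes the summaries into --output_dir. The sketch below rebuilds the reproduction command with that one flag swapped and adds a helper to read the summaries back. The helper names are hypothetical, and the output file name generated_predictions.txt is an assumption taken from the upstream transformers summarization example.

```python
import os
import shlex

# Arguments from the reproduction command, with --do_eval swapped for --do_predict.
ARGS = [
    "--model_name_or_path", "t5-small",
    "--do_predict",
    "--dataset_name", "cnn_dailymail",
    "--dataset_config", "3.0.0",
    "--source_prefix", "summarize: ",
    "--output_dir", "/tmp/tst-summarization",
    "--per_device_train_batch_size", "4",
    "--per_device_eval_batch_size", "4",
    "--overwrite_output_dir",
    "--predict_with_generate",
    "--use_habana",
    "--use_lazy_mode",
    "--use_hpu_graphs_for_inference",
    "--gaudi_config_name", "Habana/t5",
    "--ignore_pad_token_for_loss", "False",
    "--pad_to_max_length",
    "--save_strategy", "epoch",
    "--throughput_warmup_steps", "3",
]


def build_command(args: list[str]) -> str:
    """Return a copy-pasteable, shell-quoted command for run_summarization.py."""
    return shlex.join(["python", "run_summarization.py", *args])


def read_summaries(output_dir: str) -> list[str]:
    """Read one generated summary per line (file name is an assumption)."""
    path = os.path.join(output_dir, "generated_predictions.txt")
    with open(path) as f:
        return f.read().splitlines()


if __name__ == "__main__":
    print(build_command(ARGS))
```

Run the printed command inside the Gaudi container, then call read_summaries("/tmp/tst-summarization") to load the summaries as a list of strings.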