huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Persistent Cache for `notebooks/text-classification/notebook.ipynb` #115

Open dangdana opened 1 year ago

dangdana commented 1 year ago

Hi,

I'm trying to run `notebooks/text-classification/notebook.ipynb`. Each time it runs, it recompiles from scratch, using a different /tmp/ path, e.g.:

    2023-06-27 17:59:23.000182: INFO ||NCC_WRAPPER||: Compile cache path: /tmp/tmpytdl1fxg

I've never actually seen the compilation complete this way. It may be progressing, but very slowly: after 30+ minutes it has made little progress (see below). Is this expected?

  0%|         | 1/1500 [00:11<4:41:45, 11.28s/it]2023-06-27 17:59:23.000182
.....................................  ............    ...

To remedy the compile-time issue, I tried using the Neuron SDK's persistent cache. With `neuron_parallel_compile`, I see a cache created and populated at e.g. /var/tmp/neuron-compile-cache/neuronxcc-2.7.0.40+f7c6cf2a3. However, for some reason this cache is not used when I run the training. Furthermore, whenever I rerun `neuron_parallel_compile`, 6-8 graphs are recompiled and added to the cache.

Is it possible to use the persistent cache with the notebook? Or otherwise compile and train in a timely way?
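In case it helps, here is roughly what I would expect to work for pinning the compiler cache to a fixed directory from inside the notebook, before any compilation is triggered. This is only a sketch: the `--cache_dir` option is taken from the Neuron SDK's persistent-cache documentation and the path is illustrative, so both should be checked against the SDK version shipped with the AMI.

```python
import os

# Sketch: make the Neuron compiler reuse a fixed persistent cache directory
# instead of a fresh /tmp/tmpXXXX path on every run. Must be set before the
# model is traced/compiled. The --cache_dir flag is assumed from the Neuron
# SDK persistent-cache docs; the path below is only an example.
os.environ["NEURON_CC_FLAGS"] = (
    os.environ.get("NEURON_CC_FLAGS", "")
    + " --cache_dir=/var/tmp/neuron-compile-cache"
)
```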

philschmid commented 1 year ago

Did you make any changes to the notebook? Where are you running the notebook? On the Hugging Face AMI?

dangdana commented 1 year ago

I did not make any changes to the notebook. I am running the notebook on the Hugging Face AMI.

Steps to reproduce:

  1. Launch a new Hugging Face AMI.
  2. Fix up a few things:
     pip3 install --upgrade ipywidgets  # fixes display errors in Jupyter
     pip3 install git+https://github.com/huggingface/accelerate.git#egg=accelerate  # see https://github.com/huggingface/optimum-neuron/issues/102
  3. Clone optimum-neuron and follow the notebook directions.

Here is the output of torchrun after 30+ minutes:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
is precompilation: None
is precompilation: None
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
torch.distributed process group is initialized, but parallel_mode != ParallelMode.DISTRIBUTED. In order to use Torch DDP, launch your script with `python -m torch.distributed.launch
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
torch.distributed process group is initialized, but parallel_mode != ParallelMode.DISTRIBUTED. In order to use Torch DDP, launch your script with `python -m torch.distributed.launch
/usr/local/lib/python3.10/dist-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
***** Running training *****
  Num examples = 16,000
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 1,500
  Number of trainable parameters = 109,486,854
  0%|                                                  | 0/1500 [00:00<?, ?it/s]2023-06-28 13:38:23.000086: INFO ||NCC_WRAPPER||: Compile cache path: /tmp/tmp1szb6a7r
.....Selecting 2 allocations
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Analyzing dependencies of Block1
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Analyzing dependencies of Block1
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Dependency reduction of sg0000
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

Compiler status PASS
No Trainium cache name is saved locally. This means that only the official Trainium cache, and potentially a cache defined in $CUSTOM_CACHE_REPO will be used. You can create a Trainium cache repo by running the following command: `optimum-cli neuron cache create`. If the Trainium cache already exists you can set it by running the following command: `optimum-cli neuron cache set -n [name]`.
No Trainium cache name is saved locally. This means that only the official Trainium cache, and potentially a cache defined in $CUSTOM_CACHE_REPO will be used. You can create a Trainium cache repo by running the following command: `optimum-cli neuron cache create`. If the Trainium cache already exists you can set it by running the following command: `optimum-cli neuron cache set -n [name]`.
You do not have write access to aws-neuron/optimum-neuron-cache so you will not be able to push any cached compilation files. Please log in and/or use a custom Trainium cache.
  0%|                                       | 1/1500 [01:35<39:38:46, 95.21s/it]2023-06-28 13:39:58.000124: INFO ||NCC_WRAPPER||: Compile cache path: /tmp/tmp1szb6a7r
..............Selecting 193207 allocations
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
*************************************.*******.***.****

^ Could the issue be around access to the aws-neuron/optimum-neuron-cache?

Here are the cached models I see available:

ubuntu@ip-172-31-33-172:~/optimum-neuron/notebooks/text-classification$ optimum-cli neuron cache list aws-neuron/optimum-neuron-cache | grep name
Downloading (…)e/main/registry.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.23k/3.23k [00:00<00:00, 19.5MB/s]
Model name:     Helsinki-NLP/opus-mt-en-ro
Model name:     Helsinki-NLP/opus-mt-en-ro
Model name:     Helsinki-NLP/opus-mt-en-ro
Model name:     hf-internal-testing/tiny-random-gpt_neo
Model name:     hf-internal-testing/tiny-random-gpt_neo
Model name:     hf-internal-testing/tiny-random-gpt_neo
Model name:     hf-internal-testing/tiny-random-gpt_neo
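
Based on the messages printed in the training log above (`optimum-cli neuron cache create`, `optimum-cli neuron cache set -n [name]`, and `$CUSTOM_CACHE_REPO`), here is a sketch of what I think pointing training at a writable custom Trainium cache repo would look like. The repo name is a placeholder, and it assumes a Hugging Face token with write access is available.

```python
import os
from huggingface_hub import login

# Authenticate so compiled graphs can be pushed to a cache repo with write
# access (the public aws-neuron/optimum-neuron-cache is read-only for me).
login()  # or run `huggingface-cli login` / set HF_TOKEN beforehand

# Point optimum-neuron at a custom Trainium cache repo, as hinted by the
# $CUSTOM_CACHE_REPO message in the log. The repo name is a placeholder;
# it would first be created with `optimum-cli neuron cache create`.
os.environ["CUSTOM_CACHE_REPO"] = "my-org/optimum-neuron-cache"
```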
HuggingFaceDocBuilderDev commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!
