Open dangdana opened 1 year ago
Did you make any changes to the notebook? Where are you running the notebook? On the Hugging Face AMI?
I did not make any changes to the notebook. I am running the notebook on the Hugging Face AMI.
Step to reproduce:
pip3 install --upgrade ipywidgets # display errors in jupyter
pip3 install git+https://github.com/huggingface/accelerate.git#egg=accelerate # re ticket https://github.com/huggingface/optimum-neuron/issues/102
Here is the output of torchrun after 30+ minutes:
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
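(As an aside: this tokenizers warning is unrelated noise and goes away if the suggested variable is exported before launching torchrun — a minimal sketch, following the warning's own advice:)

```shell
# Silence the tokenizers fork warning, as the message itself suggests.
# "false" disables tokenizer parallelism in the forked worker processes.
export TOKENIZERS_PARALLELISM=false
```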
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
is precompilation: None
is precompilation: None
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
torch.distributed process group is initialized, but parallel_mode != ParallelMode.DISTRIBUTED. In order to use Torch DDP, launch your script with `python -m torch.distributed.launch
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
torch.distributed process group is initialized, but parallel_mode != ParallelMode.DISTRIBUTED. In order to use Torch DDP, launch your script with `python -m torch.distributed.launch
/usr/local/lib/python3.10/dist-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
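(The FutureWarning above comes from transformers' own AdamW implementation. One way to address it, sketched here with a small stand-in module rather than the notebook's actual BERT model and with illustrative hyperparameters, is to construct PyTorch's `torch.optim.AdamW` yourself:)

```python
import torch

# Stand-in for the real model; the notebook uses BertForSequenceClassification.
model = torch.nn.Linear(4, 2)

# Use PyTorch's AdamW rather than the deprecated transformers.optimization.AdamW.
# lr and weight_decay here are illustrative, not taken from the notebook.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
```

A custom optimizer like this can be handed to a `Trainer` through its `optimizers=(optimizer, lr_scheduler)` argument, so the deprecated class is never instantiated.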
***** Running training *****
Num examples = 16,000
Num Epochs = 3
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 1
Total optimization steps = 1,500
Number of trainable parameters = 109,486,854
0%| | 0/1500 [00:00<?, ?it/s]2023-06-28 13:38:23.000086: INFO ||NCC_WRAPPER||: Compile cache path: /tmp/tmp1szb6a7r
.....Selecting 2 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Analyzing dependencies of Block1
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Analyzing dependencies of Block1
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Dependency reduction of sg0000
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Compiler status PASS
No Trainium cache name is saved locally. This means that only the official Trainium cache, and potentially a cache defined in $CUSTOM_CACHE_REPO will be used. You can create a Trainium cache repo by running the following command: `optimum-cli neuron cache create`. If the Trainium cache already exists you can set it by running the following command: `optimum-cli neuron cache set -n [name]`.
No Trainium cache name is saved locally. This means that only the official Trainium cache, and potentially a cache defined in $CUSTOM_CACHE_REPO will be used. You can create a Trainium cache repo by running the following command: `optimum-cli neuron cache create`. If the Trainium cache already exists you can set it by running the following command: `optimum-cli neuron cache set -n [name]`.
You do not have write access to aws-neuron/optimum-neuron-cache so you will not be able to push any cached compilation files. Please log in and/or use a custom Trainium cache.
0%| | 1/1500 [01:35<39:38:46, 95.21s/it]2023-06-28 13:39:58.000124: INFO ||NCC_WRAPPER||: Compile cache path: /tmp/tmp1szb6a7r
..............Selecting 193207 allocations
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
*************************************.*******.***.****
^ Could the issue be related to access permissions for aws-neuron/optimum-neuron-cache?
Here are the cached models I see available:
ubuntu@ip-172-31-33-172:~/optimum-neuron/notebooks/text-classification$ optimum-cli neuron cache list aws-neuron/optimum-neuron-cache | grep name
Model name: Helsinki-NLP/opus-mt-en-ro
Model name: Helsinki-NLP/opus-mt-en-ro
Model name: Helsinki-NLP/opus-mt-en-ro
Model name: hf-internal-testing/tiny-random-gpt_neo
Model name: hf-internal-testing/tiny-random-gpt_neo
Model name: hf-internal-testing/tiny-random-gpt_neo
Model name: hf-internal-testing/tiny-random-gpt_neo
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!
Hi,
I'm trying to run notebooks/text-classification/notebook.ipynb. Each time it runs, it recompiles, using a different /tmp/ path each time, e.g. 2023-06-27 17:59:23.000182: INFO ||NCC_WRAPPER||: Compile cache path: /tmp/tmpytdl1fxg
I've never actually completed the compile this way. It may be progressing, but very slowly: after 30+ minutes it has made little progress (below). Is this expected?
To remedy the compile-time issue, I tried using the Neuron SDK's persistent cache. With neuron_parallel_compile, I see a cache created and populated at e.g. /var/tmp/neuron-compile-cache/neuronxcc-2.7.0.40+f7c6cf2a3. However, for some reason this cache is not leveraged when I run the training. Furthermore, when I rerun neuron_parallel_compile, 6-8 graphs are always recompiled and added to the cache.
Is it possible to use the persistent cache with the notebook, or otherwise to compile and train in a timely way?
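One thing I plan to try is pinning the compile cache to a fixed directory instead of a fresh /tmp path. This is a sketch under the assumption that the NCC wrapper honours a `--cache_dir` flag passed via `NEURON_CC_FLAGS` — I'm not certain this is how optimum-neuron invokes the compiler:

```shell
# Assumption: the Neuron compiler wrapper reads NEURON_CC_FLAGS and accepts
# --cache_dir; pinning it should make every run reuse the same directory.
export NEURON_CC_FLAGS="--cache_dir=/var/tmp/neuron-compile-cache"

# Then pre-compile once and rerun training; both should hit the same cache:
#   neuron_parallel_compile torchrun --nproc_per_node=2 <training_script>
#   torchrun --nproc_per_node=2 <training_script>
```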