huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Optimum Neuron Bug Bash - fine_tune_bert - huggingface/tokenizers warning #425

Open jsamuel1 opened 6 months ago

jsamuel1 commented 6 months ago

Tutorial: fine_tune_bert

Setup: Loading the notebook at https://raw.githubusercontent.com/huggingface/optimum-neuron/main/notebooks/text-classification/notebook.ipynb

Running through the notebook, steps [4] (!wget ...train.py) and [5] (!torchrun) both print a red warning:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Expected: an earlier step should explain this warning and set the TOKENIZERS_PARALLELISM environment variable to an appropriate value (false?).

Environment Setup and logs:

dpkg --list|grep neuron
ii  aws-neuronx-collectives            2.18.19.0-f7a1f7a35                   amd64        neuron_ccom built using CMake
ii  aws-neuronx-dkms                   2.14.5.0                              amd64        aws-neuronx driver in DKMS format.
ii  aws-neuronx-oci-hook               2.2.27.0                              amd64        neuron_oci_hook built using CMake
ii  aws-neuronx-runtime-lib            2.18.15.0-d9ebf86cc                   amd64        neuron_runtime built using CMake
ii  aws-neuronx-tools                  2.15.4.0                              amd64        Neuron profile and debug tools
ubuntu@ip-172-31-45-188:~$ pip list | grep 'neuron\|torch'
/usr/bin/pip:6: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import load_entry_point
aws-neuronx-runtime-discovery 2.9
libneuronxla                  0.5.570
neuronx-cc                    2.11.0.34+c5231f848
neuronx-distributed           0.5.0
neuronx-hwm                   2.11.0.2+e34678757
optimum-neuron                0.0.16
tensorboard-plugin-neuronx    2.5.43.0
torch                         1.13.1
torch-neuronx                 1.13.1.1.12.1
torch-xla                     1.13.1+torchneuronc
torchvision                   0.14.1
transformers-neuronx          0.8.268

philschmid commented 6 months ago

Please see https://github.com/huggingface/transformers/issues/5486 or https://stackoverflow.com/questions/62691279/how-to-disable-tokenizers-parallelism-true-false-warning
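
For reference, a minimal sketch of the workaround discussed in those links, assuming it runs in a notebook cell before any tokenizer is used and before the `!wget` / `!torchrun` cells. The value "false" is an assumption about what suits this tutorial; the warning itself accepts either true or false. The child-process check at the end is only there for illustration and is not part of the notebook.

```python
import os
import subprocess
import sys

# Set the variable before any tokenizer is used and before any
# subprocess (such as the notebook's shell commands) is launched.
# "false" is an assumed choice; the warning accepts true or false.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Illustration only: child processes inherit the parent's environment,
# so a command launched from this process sees the same value.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['TOKENIZERS_PARALLELISM'])"],
    capture_output=True,
    text=True,
    check=True,
)
child_value = result.stdout.strip()
print(child_value)
```

With the variable set explicitly, the tokenizers library should no longer need to print the warning when the process forks.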

HuggingFaceDocBuilderDev commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!