huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Can't reproduce training of wav2vec2-large from documentation #16962

Closed · HLasse closed 2 years ago

HLasse commented 2 years ago

System Info

- `transformers` version: 4.19.0.dev0
- Platform: Linux-5.4.0-109-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.5.1
- PyTorch version (GPU?): 1.11.0+cu102 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

Who can help?

@patrickvonplaten

Reproduction

Pretraining a wav2vec2-large model by following the documentation under examples/speech-pretraining does not work.

Running the following command (copy-pasted from the README) fails with an error because the path passed to model_name_or_path (./) cannot be found:

accelerate launch run_wav2vec2_pretraining_no_trainer.py \
    --dataset_name=librispeech_asr \
    --dataset_config_names clean clean other \
    --dataset_split_names train.100 train.360 train.500 \
    --output_dir=./test \
    --max_train_steps=200000 \
    --num_warmup_steps=32000 \
    --gradient_accumulation_steps=8 \
    --learning_rate=0.001 \
    --weight_decay=0.01 \
    --max_duration_in_seconds=20.0 \
    --min_duration_in_seconds=2.0 \
    --model_name_or_path=./ \
    --logging_steps=1 \
    --saving_steps=10000 \
    --per_device_train_batch_size=2 \
    --per_device_eval_batch_size=4 \
    --adam_beta1=0.9 \
    --adam_beta2=0.98 \
    --adam_epsilon=1e-06 \
    --gradient_checkpointing

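As far as I can tell the script resolves model_name_or_path with from_pretrained, so ./ only works if that directory already contains a config.json. A hypothetical check of the difference (not taken from the README):

    from transformers import Wav2Vec2Config

    # Wav2Vec2Config.from_pretrained("./")  # fails in an empty working directory: no config.json to load
    config = Wav2Vec2Config.from_pretrained("facebook/wav2vec2-large-lv60")  # a Hub model id resolves remotely
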
I tried using `facebook/wav2vec2-large-lv60` as model_name_or_path but received the following error:

Traceback (most recent call last):
  File "run_wav2vec2_pretraining_no_trainer.py", line 730, in <module>
    main()
  File "run_wav2vec2_pretraining_no_trainer.py", line 572, in main
    for step, batch in enumerate(train_dataloader):
  File "/home/ucloud/.local/lib/python3.8/site-packages/accelerate/data_loader.py", line 303, in __iter__    for batch in super().__iter__():  File "/home/ucloud/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__    data = self._next_data()
  File "/home/ucloud/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ucloud/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "run_wav2vec2_pretraining_no_trainer.py", line 326, in __call__
    sampled_negative_indices = _sample_negative_indices(
  File "/home/ucloud/.local/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 336, in _sample_negative_indices
    sampled_indices = np.random.randint(0, high, size=(high + 1, num_negatives))
  File "mtrand.pyx", line 748, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0

The demo script trains without issue. Using the parameters from the demo script and changing model_name_or_path from `patrickvonplaten/wav2vec2-base-v2` to `facebook/wav2vec2-large-lv60` gives the above error.
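
For reference, a minimal reproduction of the ValueError (my reading of _sample_negative_indices, not the library code verbatim: high is the per-sample count of masked time steps minus one, so a sample with at most one masked frame leaves an empty sampling range):

    import numpy as np

    mask_time_indices = np.zeros(100, dtype=bool)
    mask_time_indices[3] = True                       # only a single frame gets masked in this sample
    high = int(mask_time_indices.sum()) - 1           # -> 0
    np.random.randint(0, high, size=(high + 1, 100))  # raises ValueError: the sampling range is empty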

Training on a single T4 GPU (for benchmarking purposes).

Expected behavior

Wav2vec2-large pretraining to run.

patrickvonplaten commented 2 years ago

Hey @HLasse,

Could you increase this parameter: https://huggingface.co/facebook/wav2vec2-large-lv60/blob/main/config.json#L62 (mask_time_prob) to 0.5 and see if it works then? It seems that, given the sequence length, you are not sampling enough negative targets.
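
If it helps, a minimal sketch of that change (my assumptions: the pretraining script only needs a config and feature extractor at --model_name_or_path, and the linked parameter is mask_time_prob):

    from transformers import Wav2Vec2Config, Wav2Vec2FeatureExtractor

    config = Wav2Vec2Config.from_pretrained("facebook/wav2vec2-large-lv60")
    config.mask_time_prob = 0.5  # raised from the checkpoint default, as suggested above
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-lv60")

    # Save both to a local directory and pass that directory as --model_name_or_path
    config.save_pretrained("./wav2vec2-large-lv60-pretrain")
    feature_extractor.save_pretrained("./wav2vec2-large-lv60-pretrain")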

Also it'll be really hard / impossible to do a full pretraining on a single T4 GPU

HLasse commented 2 years ago

That works, thanks!

Also it'll be really hard / impossible to do a full pretraining on a single T4 GPU

I know - this was mainly to get an estimate of training time on different hardware setups. Danish wav2vec models coming up soon! :)

jieunpark1 commented 1 year ago

Hi. I encountered exactly the same issue. I'm using the Wav2Vec2ConformerForPreTraining model `facebook/wav2vec2-conformer-rope-large`, training on a single NVIDIA TITAN Xp with a very small (pilot) speech dataset.

I've already changed the mask_time_prob, but it didn't work for me. The error message I got was the same as the one above.
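
For what it's worth, a quick check along the lines of the earlier explanation (a sketch, not a verified fix: it assumes the conformer collator masks frames the same way as the wav2vec2 helper below, and that the conv encoder downsamples by roughly 320x) is whether the shortest clips still get at least two masked frames:

    from transformers import Wav2Vec2ConformerConfig
    from transformers.models.wav2vec2.modeling_wav2vec2 import _compute_mask_indices

    config = Wav2Vec2ConformerConfig.from_pretrained("facebook/wav2vec2-conformer-rope-large")

    clip_seconds = 1.0                             # hypothetical shortest clip in the pilot set
    num_frames = int(clip_seconds * 16_000 / 320)  # ~50 encoder frames (320x downsampling assumed)
    mask = _compute_mask_indices(
        shape=(1, num_frames),
        mask_prob=config.mask_time_prob,
        mask_length=config.mask_time_length,
    )
    print(int(mask.sum()))  # fewer than 2 masked frames reproduces the high <= 0 error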

Could you guys help me with this problem?? Thank you in advance!!