@xwuShirley, are you training a wav2vec, vq-wav2vec, or wav2vec2 model? If you are training one of the first two, can you try this fork?
If you are using wav2vec2, we can point you to a different fork that is in progress.
We are training wav2vec2, but we decided to use A100 GPUs instead. Thanks!
Hi @ultrons, Could you please point me to a wav2vec2 fork that works on TPUs?
Thanks :)
@awasthiabhijeet, the master branch works. It has instructions for running, as in the examples/wav2vec README. Use an invocation something like this:
export XRT_TPU_CONFIG="localservice;0;localhost:51011"
OMP_NUM_THREADS=1 fairseq-hydra-train task.data=/home/sivaibhav/manifest --config-dir ./examples/wav2vec/config/pretraining --config-name wav2vec2_large_librivox_tpu.yaml
With one modification: add batch_size: 4 in the dataset section of the config. Let me know if you have any issues.
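For reference, a minimal sketch of what that modification could look like; only the added line comes from this thread, and any other keys already present in the dataset section of wav2vec2_large_librivox_tpu.yaml should be left as they are.

```yaml
# Hedged sketch: keep the existing dataset keys from the stock config and
# only add the per-device batch size suggested above.
dataset:
  batch_size: 4   # fixed per-device batch size for TPU pre-training
```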
Hi @ultrons ,
Does it also support supervised fine-tuning using the CTC-loss?
The README in examples/wav2vec mentions that "Wav2Vec2 is now supported on TPUs! It's currently pre-training only."
I am looking for code that allows me to fine-tune wav2vec2 on TPUs using the CTC loss. The CTC loss provided by PyTorch is currently not lowered in pytorch/xla (https://github.com/pytorch/xla/issues/2399).
Currently, wav2vec2 on TPU is only used for pre-training. The CTC loss is in the fine-tuning code, which has not been optimized for TPUs yet.
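For context, here is a minimal sketch (not from the thread) of the plain PyTorch CTC loss call that the fine-tuning criterion relies on. Because this op is not lowered in pytorch/xla, it cannot run natively on TPU cores, which is why fine-tuning is not supported yet; the shapes below are arbitrary.

```python
# Hedged sketch: a bare F.ctc_loss call with arbitrary shapes, just to show
# the op that is not lowered on XLA devices.
import torch
import torch.nn.functional as F

log_probs = torch.randn(50, 4, 32).log_softmax(-1)        # (T, N, C) log-probabilities
targets = torch.randint(1, 32, (4, 10), dtype=torch.long)  # (N, S) target label ids
input_lengths = torch.full((4,), 50, dtype=torch.long)     # length of each input sequence
target_lengths = torch.full((4,), 10, dtype=torch.long)    # length of each target sequence

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  blank=0, zero_infinity=True)
print(loss)
```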
🐛 Bug
I tried to run the fairseq wav2vec example (https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) on a TPU v3-8,
but got the following error during the backward pass:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [XLAFloatType [216, 1024, 1, 48]], which is output 0 of UnsqueezeBackward0, is at version 5; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
2020-12-11 02:31:38 | INFO | fairseq_cli.train | task: AudioPretrainingTask
2020-12-11 02:31:38 | INFO | fairseq_cli.train | model: Wav2VecCtc
2020-12-11 02:31:38 | INFO | fairseq_cli.train | criterion: CtcCriterion
2020-12-11 02:31:38 | INFO | fairseq_cli.train | num. model params: 315471520 (num. trained: 315471520)
2020-12-11 02:31:43 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2020-12-11 02:31:43 | INFO | fairseq_cli.train | max tokens per GPU = 3400000 and batch size per GPU = None
2020-12-11 02:31:43 | INFO | fairseq.trainer | no existing checkpoint found checkpoints/checkpoint_last.pt
2020-12-11 02:31:43 | INFO | fairseq.trainer | loading train data for epoch 1
2020-12-11 02:31:43 | INFO | fairseq.data.audio.raw_audio_dataset | loaded 33139, skipped 0 samples
2020-12-11 02:31:44 | INFO | fairseq.trainer | begin training epoch 1
Exception in device=TPU:5: one of the variables needed for gradient computation has been modified by an inplace operation: [XLAFloatType [216, 1024, 1, 48]], which is output 0 of UnsqueezeBackward0, is at version 5; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/home/user/fairseq/fairseq/distributed_utils.py", line 302, in distributed_main
    main(cfg, **kwargs)
  File "/home/user/fairseq/fairseq_cli/train.py", line 138, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/home/user/fairseq/fairseq_cli/train.py", line 227, in train
    log_output = trainer.train_step(samples)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/home/user/fairseq/fairseq/trainer.py", line 562, in train_step
    raise e
  File "/home/user/fairseq/fairseq/trainer.py", line 536, in train_step
    ignore_grad=is_dummy_batch,
  File "/home/user/fairseq/fairseq/tasks/fairseq_task.py", line 432, in train_step
    optimizer.backward(loss)
  File "/home/user/fairseq/fairseq/optim/fairseq_optimizer.py", line 95, in backward
    loss.backward()
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/tensor.py", line 233, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/autograd/__init__.py", line 146, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [XLAFloatType [216, 1024, 1, 48]], which is output 0 of UnsqueezeBackward0, is at version 5; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Traceback (most recent call last):
  File "/home/user/fairseq/fairseq_cli/hydra_train.py", line 38, in hydra_main
    distributed_utils.call_main(cfg, pre_main)
  File "/home/user/fairseq/fairseq/distributed_utils.py", line 332, in call_main
    nprocs=8,  # use all 8 TPU cores
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 394, in spawn
    start_method=start_method)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 205, in start_processes
    while not context.join():
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    exit_code=exitcode
torch.multiprocessing.spawn.ProcessExitedException: process 5 terminated with exit code 17
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
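As the error hint suggests, autograd anomaly detection can point at the forward-pass operation whose in-place modification breaks the backward pass. Below is a hedged sketch of how it could be enabled; where exactly to call it (e.g. near the top of the training entry point, before the training loop starts) is an assumption on my part, not something stated in this thread.

```python
# Hedged sketch: turn on anomaly detection so the next backward() error also
# prints a second traceback locating the forward op that was modified in place.
import torch

torch.autograd.set_detect_anomaly(True)

# ...then run training as usual; expect a noticeable slowdown while enabled,
# so use it only for debugging this error.
```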
To Reproduce
You need two additional files to reproduce the above: the data and the configuration yaml. I put both at https://gitlab.com/xwuShirley/fairseq-tpu/-/blob/master (a sketch of the launch command follows this list):
==> the data is sample.zip; after unzipping it, please update the data directory (the first line) in train.tsv and valid.tsv
==> the yaml is my_base.yaml (originally from https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/config/finetuning/base_100h.yaml; I updated the line at https://gitlab.com/xwuShirley/fairseq-tpu/-/blob/master/my_base.yaml#L7)
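For concreteness, a fine-tuning launch command along these lines should trigger the error; this is a sketch mirroring the pre-training command earlier in the thread, and the placeholder paths and overrides (task.data, model.w2v_path) are my assumptions rather than values copied from the original report.

```bash
export XRT_TPU_CONFIG="localservice;0;localhost:51011"

# /path/to/manifest is the directory holding the unzipped train.tsv / valid.tsv,
# and /path/to/wav2vec2.pt is the pre-trained checkpoint to fine-tune (both assumed).
OMP_NUM_THREADS=1 fairseq-hydra-train \
  task.data=/path/to/manifest \
  model.w2v_path=/path/to/wav2vec2.pt \
  --config-dir . --config-name my_base.yaml
```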
Thank you very much for your help in advance. I was told a v3-8 TPU runs much faster than 8x V100 GPUs, so I decided to give it a try. I am not sure whether this is due to the CTC criterion (https://gitlab.com/xwuShirley/fairseq-tpu/-/blob/master/my_base.yaml#L30), since it is not a standard loss.
Best, Shirley