k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

Troubles adapting SURT AMI model to my data #1477

Open kfmn opened 9 months ago

kfmn commented 9 months ago

Hi, I am trying to use the SURT AMI recipe to adapt the pre-trained model (SURT_BASE) to my data. I have prepared my dataset similarly to the AMI & ICSI dataset preparation; the CutSet contains only overlapped speech segments and the corresponding supervisions.

I tried to run training with train_adapt.py but it failed with a message about the absence of heat_loss_scale in params at https://github.com/k2-fsa/icefall/blob/1c30847947f9d8b1416ef3e70408c07eab807f3d/egs/ami/SURT/dprnn_zipformer/train_adapt.py#L867

I discovered that such a parameter is present in the train.py script and added it to train_adapt.py in the same way. But another error appears because there is no field "source_feats" in the batch at https://github.com/k2-fsa/icefall/blob/1c30847947f9d8b1416ef3e70408c07eab807f3d/egs/ami/SURT/dprnn_zipformer/train_adapt.py#L768

At the beginning of the compute_heat_loss function, the description of batch refers to lhotse.dataset.K2SurtDatasetWithSources(), but I couldn't find such a class in the current lhotse version.

To skip computing heat_loss I had to set heat_loss_scale to 0.0. After this, training started and the loss began to decrease. But after several epochs I decided to check performance on the dev set and found that all ASR results are empty and the WER is 100%. With the starting model it was about 90%, so something was still recognized.

Please help me with my task and explain the heat loss and the corresponding pieces of code.

JinZr commented 9 months ago

@desh2608 you might want to look into this issue?

seems like K2SurtDatasetWithSources does not exist in the lhotse repo.

desh2608 commented 9 months ago

Sorry that is an old comment and should be changed. If you have prepared sources correctly, there is a return_sources option in the K2SurtDataset which can be used. Please read the docstring for this class where we describe how to add sources to the cut sets.
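
Roughly, the usage would look like this (a minimal sketch; argument names and defaults may differ slightly depending on your lhotse version, so please check the docstring):

```python
# Minimal sketch: build a SURT training dataset that also returns per-source
# features (batch["source_feats"]) for the HEAT loss. Paths and sampler
# settings here are placeholders.
from torch.utils.data import DataLoader
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler, K2SurtDataset

cuts = CutSet.from_file("data/manifests/cuts_train_ami_icsi.jsonl.gz")
dataset = K2SurtDataset(return_sources=True)
sampler = DynamicBucketingSampler(cuts, max_duration=550, shuffle=True)
dl = DataLoader(dataset, sampler=sampler, batch_size=None)
```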

Even without this masking loss, it should still be possible to train the model using only the transducer loss, by setting --heat-loss-scale to 0.0. If it is giving you empty outputs on decoding, I would suggest checking the training references to make sure that you are indeed training it correctly.

kfmn commented 9 months ago

Hi, @desh2608. I have double-checked my cuts and found them fully correct, so the reason is somewhere else.

My data is a subset of the CHiME-6 single distant microphone recordings with reference transcripts. The data preparation followed that of AMI: I created recordings and supervisions files, joined them into cut sets, computed features, and split the cuts into units shorter than 30 s. Maybe this data is unsuitable for SURT for some reason; what do you think?
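
Roughly, the preparation looked like this (a simplified sketch with hypothetical manifest paths; the actual scripts differ in details such as the feature configuration and the splitting logic):

```python
# Sketch of the CHiME-6 SDM preparation described above: manifests -> cuts ->
# fbank features -> cuts shorter than 30 s. Paths are hypothetical.
from lhotse import CutSet, Fbank, RecordingSet, SupervisionSet

recordings = RecordingSet.from_file("data/manifests/chime6-sdm_recordings.jsonl.gz")
supervisions = SupervisionSet.from_file("data/manifests/chime6-sdm_supervisions.jsonl.gz")

cuts = CutSet.from_manifests(recordings=recordings, supervisions=supervisions)
cuts = cuts.compute_and_store_features(
    extractor=Fbank(),
    storage_path="data/fbank/chime6-sdm",
    num_jobs=4,
)
# Split long recordings into windows shorter than 30 s, keeping supervisions.
cuts = cuts.cut_into_windows(duration=30.0)
cuts.to_file("data/manifests/cuts_chime6-sdm_train.jsonl.gz")
```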

If you have positive experience with SURT on CHiME-6 data, please share your prepared data or data preparation scripts, as well as the results you managed to obtain.

Thanks.

desh2608 commented 9 months ago

I haven't tried SURT for CHiME-6, but in my experience, it is hard to get single-channel end-to-end models to work well on CHiME-6 anyway, since the recording devices are in multiple rooms and the conversation may be happening in a different room. This is why there have been no single-channel tracks in any of the challenges so far.

I would suggest first creating an IHM-Mix style version by digitally mixing the close-talk recordings of the participants, and trying to train a model on that data.
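
For one session that could be as simple as the following (a rough sketch with hypothetical file names; the close-talk channels are assumed to be mono, share a sample rate, and start at the same time):

```python
# Digitally mix the per-participant close-talk recordings of one session into
# a single "IHM-Mix" style channel. File names are hypothetical.
import numpy as np
import soundfile as sf

ihm_wavs = ["S02_P05.wav", "S02_P06.wav", "S02_P07.wav", "S02_P08.wav"]

signals, sr = [], None
for path in ihm_wavs:
    audio, rate = sf.read(path)
    assert sr is None or rate == sr, "all channels must share one sample rate"
    sr = rate
    signals.append(audio)

# Pad to the longest recording and sum the channels.
mix = np.zeros(max(len(s) for s in signals), dtype=np.float32)
for s in signals:
    mix[: len(s)] += s
mix /= max(1.0, np.abs(mix).max())  # avoid clipping when writing to file

sf.write("S02_ihm-mix.wav", mix, sr)
```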

kfmn commented 9 months ago

Thanks for your answer, I will try the suggested variant.

But what is still strange: I ran decoding on the same dataset that is used as the validation set during training. The validation loss decreases significantly during training, but this is not reflected in the WER.

And, BTW, what do you think about the possibility of training a SURT model on multi-channel audio? This could leverage information from different channels to infer better masks, which could be used for beamforming before feeding the transducer.

kfmn commented 9 months ago

And one more question: is it possible to extract raw masks or masked spectra from the separation model?

desh2608 commented 9 months ago

Improvement in validation loss with WER collapse usually indicates that the training references are not correct --- for instance, all empty references. You can print out a batch and see what the input and references are, just to make sure.
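
Something like this is usually enough (a sketch; key names such as "inputs" and "text" are what I would expect from K2SurtDataset, but print batch.keys() on your setup to be sure):

```python
# Pull one batch from the training data and check that the references fed to
# the transducer loss are not empty. Key names are assumptions.
from torch.utils.data import DataLoader
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler, K2SurtDataset

cuts = CutSet.from_file("data/manifests/cuts_train_ami_icsi.jsonl.gz")
dl = DataLoader(
    K2SurtDataset(),
    sampler=DynamicBucketingSampler(cuts, max_duration=100, shuffle=False),
    batch_size=None,
)

batch = next(iter(dl))
print(batch.keys())
print("inputs:", batch["inputs"].shape)
print("references:", batch["text"][:4])
```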

Regarding multi-channel input: yes, that is a possible direction. I didn't try it because (i) it increases the feature size significantly and you would run into GPU memory constraints, and (ii) the model becomes dependent on the array configuration.

Do you mean to extract masks from an external separation model? I didn't try this, so can't say how well it would work, but it should be possible to initialize the masking network this way.

kfmn commented 9 months ago

No, I meant the internal SURT separation model. Thanks for your idea about printing a batch; I will check that too.

desh2608 commented 9 months ago

You can save the masks using the --save-masks option during decoding. However, I found them difficult to interpret, unfortunately.

kfmn commented 9 months ago

Hi @desh2608, I have tried various options but the result is always the same: low loss and 100% WER. The batches look absolutely normal, the reference transcripts correspond to the audio, etc.

So I decided to repeat your adaptation recipe: I ran train_adapt.py with the AMI/ICSI training cuts prepared by the recipe scripts and decoded the AMI/ICSI dev/test sets. I found that the results are exactly the same: 100% WER, with only deletions in the ASR hypotheses.

So I assume that something is wrong in the recipe or the data preparation pipeline, which makes training work differently from your setup, which produced good models.

I will re-check the latest version of the recipe to confirm the problem is still present.

desh2608 commented 9 months ago

Thanks. It would be great if you could reproduce the results from the recipe, or help debug if you can't.

kfmn commented 9 months ago

So,

I have re-generated all the data according to the latest versions of the recipe and lhotse. Since I faced multiple problems, I have listed them all here so they can be fixed soon:

  1. The data preparation recipe has very weak protection against failures. Since stage completion is checked only by the existence of some folder, any failure inside a stage requires removing that folder before re-running. It would be much better to have Kaldi-style checking, i.e. analysing the lhotse return codes and creating a file like .done...

  2. Line 67 of prepare.sh: lhotse download rirs_noises $dl_dir should be lhotse download rir-noise $dl_dir instead. Probably the lhotse download command changed after the recipe was released... Besides, it unpacks the corpus into downloads/RIRS_NOISES, while the following recipe steps assume downloads/rirs_noises instead.

  3. At stage 4, where features are computed, it is assumed that the folder data/fbank exists. But it is not created explicitly, so the stage initially fails.

  4. At stage 6, the file data/manifests/ai-mix_cuts_clean.jsonl.gz is created. But two lines later the script local/compute_fbank_aimix.py is called, which expects to load data/manifests/ai-mix_cuts_clean_full.jsonl.gz instead.

With these errors fixed, all data preparation stages completed successfully. Moreover, I found that some of these stages had not completed in my previous data preparation attempt (in particular the stage of adding sources to the cuts, which took a lot of time, several days!), so I hoped adaptation would now behave differently.

  1. I put the surt_base model from HuggingFace into dprnn_zipformer/exp under the name epoch-30.pt and attempted to run: python ./dprnn_zipformer/train_adapt.py --world-size 1 --num-epochs 30 --start-epoch 1 --use-fp16 1 --exp-dir dprnn_zipformer/exp_adapt --model-init-ckpt dprnn_zipformer/exp/epoch-30.pt --enable-musan False --max-duration 550. This run failed immediately due to the absence of the attribute heat_loss_scale (I have already written about this above). I added the following code to train_adapt.py (repeated in a cleaned-up form after this list): parser.add_argument("--heat-loss-scale", type=float, default=0.0, help="Scale for HEAT loss on separated sources.") I had to set the default to 0.0, since otherwise the code fails at the line source_feats = batch["source_feats"] because there are no source feats in data/manifests/cuts_train_ami_icsi.jsonl.gz, which is loaded as train_cuts. After these changes I managed to start training and stopped it after 4 epochs. During this period the validation loss decreased from 2.286 before training to 0.768 (1.031 after the first epoch).

  2. I decided to test the model after the first epoch. First of all, decode.py contains the line args.lang_dir = Path(args.lang_dir), but the lang_dir parameter is not passed on the command line in the examples at the beginning of the file.

  3. Besides, decode.py tries to load the dev and test cuts for ICSI, but no such files are found in data/manifests (for example, data/manifests/cuts_icsi-ihm-mix_dev.jsonl.gz does not exist; only the train cuts for ICSI are created), so I had to comment out these lines and limit testing to the AMI data only.

  4. In the call to the decode_one_batch function, the argument "batch" is expected but not passed. I had to add batch=batch to the call.

  5. And finally, when I managed to run decode.py as follows: python dprnn_zipformer/decode.py --epoch 1 --avg 1 --use-averaged-model False --exp-dir dprnn_zipformer/exp_adapt --max-duration 250 --decoding-method modified_beam_search --beam-size 4, I observed 100% WER with deletions only, alas.
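
For readability, here is the addition from item 1 again, together with the guard I used (the guard is a sketch of the idea rather than the exact code):

```python
# Argument added to get_parser() in train_adapt.py, mirroring train.py but with
# a 0.0 default so that cuts without source features can still be used.
parser.add_argument(
    "--heat-loss-scale",
    type=float,
    default=0.0,
    help="Scale for HEAT loss on separated sources.",
)

# Sketch of the guard in compute_loss(): only touch batch["source_feats"] when
# the HEAT loss is actually enabled.
if params.heat_loss_scale > 0.0:
    source_feats = batch["source_feats"]
    # ... compute the HEAT loss on the separated sources as in train.py ...
```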

I hope this information will be useful for finding the root of the problem and fixing the recipe.

desh2608 commented 9 months ago

Phew! Thanks for going through the recipe in such detail. I think several of these issues were caused because I ran commands one at a time over the course of several weeks, and didn't actually run the full recipe from start to end. That was obviously a mistake!

If it's not very urgent, I will run the recipe again this weekend and fix the issues.
