pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Wavlm modules are always in `eval` mode when training `ToTaToNet` and `SSeRiouSS` models #1793

Open clement-pages opened 5 days ago

clement-pages commented 5 days ago

Tested versions

System information

Found on Jean-Zay

Issue description

According to the model summary (displayed when starting a training run), the WavLM modules are always in `eval` mode, whether WavLM is frozen or not. This is the case for:

image

image

I didn't find any `.eval()` call in the model code or in my training scripts, so maybe we have to manually set the WavLM modules to train mode...
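One way to check the mode of each submodule independently of the model summary is to read the `training` flag that every `torch.nn.Module` carries. The snippet below is only a minimal sketch: `ToyModel` and its `wavlm` attribute are hypothetical stand-ins (not the actual `ToTaToNet` or `SSeRiouSS` code), and the encoder is put in eval mode by hand to simulate the reported behaviour.

```python
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for a model wrapping a pretrained encoder."""

    def __init__(self):
        super().__init__()
        # Stand-in for the WavLM encoder; NOT the real WavLM module.
        self.wavlm = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.1))
        self.classifier = nn.Linear(8, 2)


def modes(model: nn.Module) -> dict:
    """Map each direct child name to "train" or "eval" based on its `training` flag."""
    return {
        name: "train" if child.training else "eval"
        for name, child in model.named_children()
    }


model = ToyModel()

# Simulate an encoder that was left in eval mode (assumption for illustration).
model.wavlm.eval()
before = modes(model)

# Explicitly switch the encoder (and all of its submodules) back to train mode.
model.wavlm.train()
after = modes(model)

print(before)
print(after)
```

If the real models behave like this sketch, calling `.train()` on the WavLM submodule before training would be enough to change the reported mode; whether that is the right fix depends on where the module is being switched to eval mode in the first place.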

Minimal reproduction example (MRE)

https://colab.research.google.com/drive/15JslGMUrSeMOhmxH_N1OM_QPi42aNoXB?usp=sharing

hbredin commented 3 days ago

I don't have this mode column in my own logs. Can you please provide a fully reproducible example?

  | Name              | Type             | Params | In sizes      | Out sizes
---------------------------------------------------------------------------------------------------------------------
0 | wav2vec           | Wav2Vec2Model    | 94.4 M | ?             | ?
1 | lstm              | LSTM             | 2.1 M  | [1, 999, 768] | [[1, 999, 256], [[8, 1, 128], [8, 1, 128]]]
2 | linear            | ModuleList       | 49.4 K | ?             | ?
3 | classifier        | Linear           | 903    | [1, 999, 128] | [1, 999, 7]
4 | activation        | LogSoftmax       | 0      | [1, 999, 7]   | [1, 999, 7]
5 | powerset          | Powerset         | 0      | ?             | ?
6 | validation_metric | MetricCollection | 0      | ?             | ?
  | other params      | n/a              | 12     | n/a           | n/a
---------------------------------------------------------------------------------------------------------------------

clement-pages commented 3 days ago

I just updated my first message with a notebook containing a reproducible example.