This causes problems if we load a model that used a different pad token. For example, PereLluis13/Wav2Vec2-Large-XLSR-53-catalan uses "\<PAD>" instead. In this case, then the prediction will always be "0".
This can be fixed in 1 line by doing the comparison against processor.tokenizer.pad_token.
The pad token for the wav2vec hybrid segmentation method is hardcoded to the token "\<pad>".
https://github.com/mt-upc/SHAS/blob/a64a70f8571f7b154dadf205203a04d151448d5b/src/segmentation_methods/utils.py#L333
This causes problems if we load a model that used a different pad token. For example, PereLluis13/Wav2Vec2-Large-XLSR-53-catalan uses "\<PAD>" instead. In this case, then the prediction will always be "0".
This can be fixed in 1 line by doing the comparison against processor.tokenizer.pad_token.
Before the fix:
After the fix:
Fixed in #3