Identify some possible implementation issues

tangYang7 / fluency_scorer

An unofficial implementation of a speech fluency assessment model


First of all, thank you for your effort in implementing this paper. Regarding the unsatisfactory experimental results, I have identified some possible implementation issues that may be causing the performance gap.

The output dimension of the preprocessing layer should not be equal to its input dimension; see https://github.com/tangYang7/fluency_scorer/blob/main/models/fluScorer.py#L108. Instead, this layer should project the pretrained wav2vec2 features to a much lower hidden dimension, i.e., 32, and the input dimension of the BLSTM should be modified accordingly.
The mask is not applied as intended, e.g., no mask is passed to the pooling function at https://github.com/tangYang7/fluency_scorer/blob/main/models/fluScorer.py#L56.
The predicted scores and label scores do not seem to be well normalized. The raw fluency label ranges over [0, 10], and it is not clear why it is multiplied by 0.2 at https://github.com/tangYang7/fluency_scorer/blob/main/train.py#L313.
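
To make the three points above concrete, here is a rough PyTorch sketch. It is not taken from the repository or from the paper's code. The names `FluencyScorerSketch`, `masked_mean_pool`, `normalize_label`, and `denormalize_score` are hypothetical, the 768-dimensional input assumes a base-size wav2vec2 model, the mask is assumed to be right-padded with at least one valid frame per utterance, and dividing labels by 10 is only one possible way to keep labels and predictions on the same scale.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


def masked_mean_pool(x, mask):
    """Average over time using only valid frames.

    x: (batch, time, dim); mask: (batch, time), True for real frames.
    """
    m = mask.unsqueeze(-1).to(x.dtype)
    summed = (x * m).sum(dim=1)
    counts = m.sum(dim=1).clamp(min=1.0)  # guard against division by zero
    return summed / counts


class FluencyScorerSketch(nn.Module):
    """Hypothetical layout; only hidden_dim=32 comes from the issue text."""

    def __init__(self, input_dim=768, hidden_dim=32):
        super().__init__()
        # Point 1: project the wav2vec2 features down to a small hidden size.
        self.preprocess = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # The BLSTM input size follows the projected dimension.
        self.blstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, feats, mask):
        x = self.preprocess(feats)
        # Pack so the BLSTM never reads padded frames (assumes right padding).
        lengths = mask.sum(dim=1).cpu()
        packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
        out, _ = self.blstm(packed)
        out, _ = pad_packed_sequence(out, batch_first=True, total_length=feats.size(1))
        # Point 2: pass the mask into the pooling step.
        pooled = masked_mean_pool(out, mask)
        return self.head(pooled).squeeze(-1)


def normalize_label(score, max_score=10.0):
    # Point 3 (assumption): map raw labels from [0, max_score] to [0, 1].
    return score / max_score


def denormalize_score(pred, max_score=10.0):
    # Inverse mapping, so predictions are reported on the raw label scale.
    return pred * max_score
```

With this layout, changing the content of the padded frames does not change the predicted score, which is a quick sanity check for whether the mask is actually taking effect.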

I hope these help you obtain satisfactory results.


Identify some possible implementation issues #4