The scaling is just so that, assuming the input variance is about 1, the variance going into the softmax is about 1. But the difference between the two scaling methods only affects the bias parameters (effectively this new way scales down the bias). It's surprising that it makes so much difference; perhaps this new version does not focus too much on nearby frames.
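A minimal sketch of what I take the two placements to be (illustrative only, not the exact snowfall code): scaling the query before the in-projection leaves the bias unscaled, while scaling after the projection scales the bias down as well, so the two only differ in the bias term.

```python
import torch

d_k = 64
scaling = d_k ** -0.5
x = torch.randn(10, d_k)   # attention input (queries before projection)
W = torch.randn(d_k, d_k)  # query in-projection weight
b = torch.randn(d_k)       # query in-projection bias

# Old placement: scale before the projection, so the bias is NOT scaled.
q_old = (x * scaling) @ W + b

# New (ESPnet-style) placement: scale after the projection, so the bias IS scaled.
q_new = (x @ W + b) * scaling

# The two differ only in the bias term: q_new - q_old == (scaling - 1) * b
print((q_new - q_old - (scaling - 1) * b).abs().max())  # ~0 up to fp noise
```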
Maybe there will be more WER difference at worse WERs, e.g. before LM rescoring.
Don't you have the test-other results?
Results before rescoring and on test-other are coming soon (being re-tested).
> Maybe there will be more WER difference at worse WERs, e.g. before LM rescoring.
The relative WER decrease shows no significant difference before vs. after LM rescoring.
| avg epoch 16-20 | no rescore (test-clean) | no rescore (test-other) | 4-gram lattice rescore (test-clean) | 4-gram lattice rescore (test-other) |
|---|---|---|---|---|
| before | 4.33 | 8.96 | 3.87 | 8.08 |
| current | 4.26 | 8.61 | 3.77 | 7.86 |
| relative decrease | 1.62% | 3.91% | 2.58% | 2.72% |
| avg epoch 26-30 | no rescore (test-clean) | no rescore (test-other) | 4-gram lattice rescore (test-clean) | 4-gram lattice rescore (test-other) |
|---|---|---|---|---|
| before | 4.31 | 8.98 | 3.86 | 8.07 |
| current | 4.14 | 8.41 | 3.69 | 7.68 |
| relative decrease | 3.94% | 6.35% | 4.40% | 4.83% |
still better though.. good..
Can you make this an option passed in from the user code, like in your other branch, so that we can more easily decode with "old" models if we need to?
I'm just concerned it might be disruptive to make this change as-is.
> I'm just concerned it might be disruptive to make this change as-is.
To stay compatible with previously trained models, maybe an optional config could be used, e.g. `is_espnet_structure` (or another, more proper name), defaulting to `False`. Like this:
```python
def __init__(self, num_features: int, num_classes: int, subsampling_factor: int = 4,
             ...
             is_espnet_structure: bool = False) -> None:
    ...
    self.is_espnet_structure = is_espnet_structure
    if self.normalize_before and self.is_espnet_structure:
        self.after_norm = nn.LayerNorm(d_model)
```
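Hypothetical call site, just to show how user code could then select the structure explicitly (the class name `Conformer` and the argument values are illustrative, not taken from this PR):

```python
# Illustrative only; values are not from this PR.
model = Conformer(num_features=80,
                  num_classes=5000,
                  subsampling_factor=4,
                  is_espnet_structure=True)  # set False to decode old checkpoints
```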
Yes. We can change it to True in our current scripts; but it would at least make it possible to revert to False so we can test old models.
Thanks a lot!
Conformer structure differences were identified by loading an ESPnet-trained model into snowfall: https://github.com/k2-fsa/snowfall/pull/201
With these two modifications and 30 epochs of training, the final result is a bit better (3.69 vs. 3.86 as reported in https://github.com/k2-fsa/snowfall/issues/154) than otherwise.
Could you help verify their effectiveness (maybe they are just training variance)? @zhu-han @pzelasko BTW, is there any mathematical background explaining when to apply the scaling during the attn_output_weights computation? I read several papers but failed to find a clue about this.
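For reference, the usual motivation (from "Attention Is All You Need") is variance control rather than any particular placement: if the components of q and k are independent with zero mean and unit variance, the dot product q·k has variance d_k, so dividing by sqrt(d_k) keeps the softmax inputs near unit variance. A quick numeric sanity check (illustrative only):

```python
import torch

d_k = 64
n = 100_000
q = torch.randn(n, d_k)  # unit-variance query components
k = torch.randn(n, d_k)  # unit-variance key components

scores = (q * k).sum(dim=-1)        # row-wise dot products
print(scores.var())                 # ~d_k = 64
print((scores / d_k ** 0.5).var())  # ~1 after the 1/sqrt(d_k) scaling
```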
Attached result screenshots (captions only):
- Rescoring WITH 4-gram LM lattice rescore, with the modifications of this PR
- Results of 4-gram lattice rescore from #154
- Rescoring WITHOUT 4-gram LM lattice rescore, with the modifications of this PR
- Results from #154