It's only really relevant for small encoders, e.g. with 2 layers, which is usually the case when it is used as a frontend, e.g. for a Conformer. Although this can also happen during pretraining of a larger BLSTM encoder.
In that case, having a single pooling of size 6 is usually worse than having two poolings of sizes 2 and 3. This may depend on what comes afterwards, but it holds at least when the BLSTM is used as a frontend for a Conformer. So in that use case, the current default is suboptimal.
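To illustrate what I mean, here is a minimal sketch (PyTorch, with hypothetical module and parameter names, not the actual implementation here) of the two variants: total time-downsampling factor 6 in both cases, once as a single pool of size 6 and once split into pools of size 2 and 3 over the two layers.

```python
import torch
import torch.nn as nn


class BlstmFrontend(nn.Module):
    """Stack of BLSTM layers with time max-pooling after each layer (sketch)."""

    def __init__(self, in_dim: int, hidden_dim: int, pool_sizes: list[int]):
        super().__init__()
        self.layers = nn.ModuleList()
        self.pools = nn.ModuleList()
        dim = in_dim
        for pool in pool_sizes:
            self.layers.append(
                nn.LSTM(dim, hidden_dim, batch_first=True, bidirectional=True)
            )
            # Max-pool over the time axis; pool size 1 means no downsampling.
            self.pools.append(nn.MaxPool1d(kernel_size=pool, ceil_mode=True))
            dim = 2 * hidden_dim
        self.out_dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features)
        for lstm, pool in zip(self.layers, self.pools):
            x, _ = lstm(x)
            # MaxPool1d expects (batch, channels, time), so transpose around it.
            x = pool(x.transpose(1, 2)).transpose(1, 2)
        return x


# Same total downsampling factor (6), distributed differently over the layers:
single_pool = BlstmFrontend(in_dim=80, hidden_dim=512, pool_sizes=[1, 6])
split_pool = BlstmFrontend(in_dim=80, hidden_dim=512, pool_sizes=[2, 3])
```

The point is only the `pool_sizes` distribution, i.e. `[1, 6]` vs. `[2, 3]`; the rest is just scaffolding to make the sketch self-contained.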
I'm not really sure what's expected.