rwth-i6 / i6_models

Collection of NN-Model parts
Mozilla Public License 2.0

Implements two front-ends for acoustic encoders #17

Closed · christophmluscher closed this 1 year ago

christophmluscher commented 1 year ago

This is not specific to the Conformer, so maybe parts/conformer is not the right place?

Atticus1806 commented 1 year ago

Some tests regarding the downsampling shapes (similar to #4) would be nice :)

Judyxujj commented 1 year ago

I have two questions here:

  1. The param stride should be added to the Config, since we usually set the stride in Conv2d to apply the time subsampling (see the sketch below).
  2. After the front-end convolution, we need to apply one additional linear layer to project the dimension back to the model size. Should the linear layer be included in the front end, or should it be included in the ConformerBlock? I would prefer the former.
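
For illustration, a minimal PyTorch sketch (not from this PR) of what point 1 refers to, with arbitrary example dimensions: the stride of a Conv2d performs the time subsampling.

```python
import torch
from torch import nn

# Hypothetical stand-alone example: subsample the time axis by a factor of 2
# via the stride of a Conv2d. Input layout: [B, 1, T, F] (batch, channel, time, features).
conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), stride=(2, 1), padding=(1, 1))

x = torch.randn(4, 1, 100, 80)  # B=4, T=100, F=80
y = conv(x)
print(y.shape)  # torch.Size([4, 32, 50, 80]) -> time reduced from 100 to 50
```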
christophmluscher commented 1 year ago

I have two questions here:

  1. The param stride should be added to the Config, since we usually set the stride in Conv2d to apply the time subsampling.

already added but not pushed yet :)

  2. After the front-end convolution, we need to apply one additional linear layer to project the dimension back to the model size. Should the linear layer be included in the front end, or should it be included in the ConformerBlock? I would prefer the former.

I was thinking about making this an option in the frontend: param int N -> add a linear layer with N outputs, None -> no linear layer (see the sketch below).

@Atticus1806
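
A minimal sketch of how such an option could look; the config fields and class names below are hypothetical and not taken from the PR:

```python
from dataclasses import dataclass
from typing import Optional

import torch
from torch import nn


@dataclass
class FrontendConfig:
    # Hypothetical field names, only to illustrate the option discussed above.
    feature_dim: int                    # input feature dimension F
    conv_channels: int                  # output channels of the conv
    out_features: Optional[int] = None  # N -> append a Linear with N outputs, None -> no Linear


class Frontend(nn.Module):
    def __init__(self, cfg: FrontendConfig):
        super().__init__()
        # Time subsampling by 2 via the stride of the conv.
        self.conv = nn.Conv2d(1, cfg.conv_channels, kernel_size=(3, 3), stride=(2, 1), padding=(1, 1))
        # Optional projection back to the model dimension.
        self.linear = (
            nn.Linear(cfg.conv_channels * cfg.feature_dim, cfg.out_features)
            if cfg.out_features is not None
            else None
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [B, T, F] -> [B, 1, T, F] -> conv -> [B, C, T', F] -> [B, T', C * F]
        x = self.conv(x.unsqueeze(1)).transpose(1, 2).flatten(2)
        if self.linear is not None:
            x = self.linear(x)  # [B, T', out_features]
        return x
```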

christophmluscher commented 1 year ago

Tests fail because the seq mask update is missing. I am not 100% sure how to perform the update cleanly. Is there a PyTorch way to do this?

albertz commented 1 year ago

Tests fail because the seq mask update is missing. I am not 100% sure how to perform the update cleanly. Is there a PyTorch way to do this?

I'm not exactly sure what you're referring to. What update do you mean? Do you mean how to compute the seq mask that is supposed to be returned by the frontend? As I mentioned, you could apply max pooling with the same striding and kernel size as the other operations.
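
A minimal sketch of the max-pooling approach described above, applied to a boolean sequence mask; the helper name and signature are made up for illustration:

```python
import torch
import torch.nn.functional as F


def subsample_mask(seq_mask: torch.Tensor, kernel_size: int, stride: int, padding: int = 0) -> torch.Tensor:
    """seq_mask: bool tensor [B, T], True for valid frames.
    Returns the mask after time subsampling with the given kernel/stride/padding,
    computed by max pooling over the mask itself."""
    pooled = F.max_pool1d(seq_mask.float().unsqueeze(1), kernel_size=kernel_size, stride=stride, padding=padding)
    return pooled.squeeze(1).bool()


mask = torch.arange(10).unsqueeze(0) < 7  # [1, 10], first 7 frames valid
print(subsample_mask(mask, kernel_size=3, stride=2, padding=1))
# tensor([[ True,  True,  True,  True, False]])
```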

christophmluscher commented 1 year ago

you could apply max pooling with the same striding and kernel size as the other operations.

Would this also work for conv layers?

christophmluscher commented 1 year ago

I will move the other frontend to another PR.

@Atticus1806 tested this within the Conformer, so it should be OK now.

Everything else I would like to move to follow-up PRs, so that we can merge and start using this.

christophmluscher commented 1 year ago

@JackTemaki @albertz

albertz commented 1 year ago

I would change the code to only support tuples instead of both tuples and integers; this makes the code more readable and more consistent.

I'm not sure I agree. PyTorch itself also supports both, and the more common use case is that the user provides just a single int.
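
For reference, a small sketch of how both an int and a tuple can be accepted without hurting readability (PyTorch normalizes this internally via torch.nn.modules.utils._pair; the helper below is a simplified stand-in):

```python
from typing import Tuple, Union


def to_pair(value: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    # Accept a single int as shorthand for a symmetric tuple, as PyTorch layers do.
    if isinstance(value, int):
        return (value, value)
    return value
```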

michelwi commented 1 year ago

I have two questions here:

  1. The param stride should be added to the Config, since we usually set the stride in Conv2d to apply the time subsampling.

already added but not pushed yet :)

Was this added in the final version of the PR? I don't seem to find it.