lucidrains / BS-RoFormer

Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs

Hidden dim in mask estimation module #12

Closed · Psarpei closed this 10 months ago

Psarpei commented 10 months ago

In the paper, the second sentence of Section 4.4 (Configuration) mentions:

The multi-band mask estimation module utilizes MLPs with a hidden layer dimension of 4D.

But in the actual code, from my understanding, there is no hidden layer, just two linear layers cleverly combined into a single one.

lucidrains commented 10 months ago

@Psarpei ah yea, good catch

i've added the hidden layer with an expansion factor of 4
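
For reference, a minimal sketch of what a mask-estimation MLP with a 4× hidden expansion could look like (illustrative only, not the repo's exact module; the real one also handles the band split and output activation details, and `freqs_per_band` here is a made-up placeholder):

```python
import torch
from torch import nn

dim = 384                      # model dim D from the paper
freqs_per_band = 20            # hypothetical per-band output size (re/im x bins)

mask_mlp = nn.Sequential(
    nn.Linear(dim, dim * 4),   # hidden layer with expansion factor 4 (4D)
    nn.Tanh(),                 # nonlinearity; the paper's exact choice may differ
    nn.Linear(dim * 4, freqs_per_band),
)

x = torch.randn(1, 100, dim)   # (batch, time, dim)
mask = mask_mlp(x)             # (batch, time, freqs_per_band)
print(mask.shape)
```

Without the hidden layer, the two linear maps compose into a single linear map of shape `(dim, freqs_per_band)`, which is what the earlier code was effectively doing.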

Psarpei commented 10 months ago

Nice thanks :)

One last thing: they also mention using a dim size of 384, which would result in a head_dim of 48 for 8 heads. In the actual code the head_dim is 64. If the implementation should align with the paper, one could also consider changing the head_dim to 48.

lucidrains commented 10 months ago

@Psarpei yeah, you can simply set dim_head = 48 here; 64 would not hurt either
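
For example, something along these lines (a sketch assuming the `BSRoformer` constructor exposes `heads` and `dim_head` keyword arguments; verify the keyword names against the current signature in the repo):

```python
from bs_roformer import BSRoformer

# hypothetical configuration matching the paper: dim = 384, 8 heads of size 48
model = BSRoformer(
    dim = 384,
    depth = 12,
    heads = 8,
    dim_head = 48,
)
```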

lucidrains commented 10 months ago

@Psarpei the model dim size is actually independent of the attention head dimension, if the transformer is implemented correctly
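
To illustrate the point, here is a minimal sketch of standard multi-head attention projections (not the repo's exact Attention module): the input and output projections map between the model dim and `heads * dim_head`, so the head dimension can be chosen freely.

```python
import torch
from torch import nn
from einops import rearrange

dim, heads, dim_head = 384, 8, 48            # dim_head could just as well be 64
inner_dim = heads * dim_head                 # need not equal dim

to_qkv = nn.Linear(dim, inner_dim * 3, bias = False)   # model dim -> q, k, v
to_out = nn.Linear(inner_dim, dim, bias = False)        # back to model dim

x = torch.randn(1, 100, dim)                 # (batch, time, dim)
q, k, v = rearrange(to_qkv(x), 'b n (three h d) -> three b h n d', three = 3, h = heads)
attn = (q @ k.transpose(-2, -1) * dim_head ** -0.5).softmax(dim = -1)
out = rearrange(attn @ v, 'b h n d -> b n (h d)')
out = to_out(out)                            # (batch, time, dim) again
print(out.shape)
```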

Psarpei commented 10 months ago

Okay thanks :)