lucidrains / x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers
MIT License
4.42k stars 377 forks

Masking for prepend_embeds #211

Closed · zqevans closed this 8 months ago

zqevans commented 8 months ago

I'm looking to implement something like VALL-E with phoneme embeddings prepended to the transformer input using prepend_embeds. I would want to mask out padded tokens in this case. Looking at the implementation, it's not clear to me how I would mask out prepended embeddings.

Does a prepend_embeds_mask make sense to add?

Should I be prepending this to the transformer input myself and using the normal mask input to create the attention mask?

lucidrains commented 8 months ago

@zqevans oo, you are the first to use this feature! i moved some logic around, and now you can pass in prepend_mask
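
a rough sketch of the intended usage (the prepend_embeds / prepend_mask keyword names are the ones discussed in this thread; the shapes and hyperparameters below are placeholders, and the prepended embeddings are assumed to be in the model dimension):

```python
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 256,
    max_seq_len = 1024,
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

x = torch.randint(0, 256, (2, 1024))        # token ids
mask = torch.ones(2, 1024).bool()           # padding mask for the main sequence

# phoneme embeddings to prepend, padded to a common length
# assumed shape (batch, prepend_len, model dim)
phoneme_embeds = torch.randn(2, 32, 512)
phoneme_mask = torch.ones(2, 32).bool()     # False at padded prepend positions

logits = model(
    x,
    mask = mask,
    prepend_embeds = phoneme_embeds,
    prepend_mask = phoneme_mask
)
```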

zqevans commented 8 months ago

Amazing, thanks!

zqevans commented 8 months ago

Oh, forgot to mention I'm using the ContinuousTransformerWrapper for this as I'm hoping to do it using latent diffusion. Could you implement it there as well?

lucidrains commented 8 months ago

@zqevans yup, you got it! https://github.com/lucidrains/x-transformers/commit/3039bccfcaa96d66f73f739ee1c5e62c612d82b0
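
an analogous sketch for the continuous wrapper (again, the dim_in / dim_out values and shapes are just for illustration; the prepended embeddings are assumed to live in the attention dim, not dim_in):

```python
import torch
from x_transformers import ContinuousTransformerWrapper, Decoder

model = ContinuousTransformerWrapper(
    dim_in = 64,
    dim_out = 64,
    max_seq_len = 1024,
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

x = torch.randn(2, 1024, 64)                # continuous inputs, e.g. latents
mask = torch.ones(2, 1024).bool()           # padding mask for the main sequence

phoneme_embeds = torch.randn(2, 32, 512)    # assumed shape (batch, prepend_len, attention dim)
phoneme_mask = torch.ones(2, 32).bool()     # False at padded prepend positions

out = model(
    x,
    mask = mask,
    prepend_embeds = phoneme_embeds,
    prepend_mask = phoneme_mask
)
```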

zqevans commented 8 months ago

What a legend. Thanks again!

zqevans commented 8 months ago

Getting the error "name 'b' is not defined" for the continuous one. Looks like the batch size variable is called batch in that version.

lucidrains commented 8 months ago

@zqevans oops, should be good now 🤞