sonovice opened 1 year ago
@sonovice hey Simon :wave:
you are seeing success with axial rotary embeddings, i'm guessing on mel spec?
that's a bit of a personal invention that i haven't broadcasted that much
i can think about integrating it if you share what your experimental results look like
@lucidrains Hey Phil and thanks for the fast response.
Actually, I didn't have any kind of spectral features in mind (though you just triggered an entire world of new ideas :wink: )
What I would like to try is to recreate something like LayoutLM for musical scores with meaningful 2d relative positional embeddings to capture the relations between musical glyphs in a score page. Your axial rotary embeddings seem like a perfect fit.
EDIT: LayoutLM in a nutshell: take detected (and classified) objects from a text document image, add learned embeddings for x, y, w and h, and use these embeddings to do things like paragraph classification etc.
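For anyone skimming, the LayoutLM-style idea above boils down to something like this toy sketch (all names and the binning scheme are made up for illustration, not LayoutLM's actual implementation; a real model would use trained `nn.Embedding` tables):

```python
import numpy as np

# Hypothetical toy version: one lookup table per box coordinate (x, y, w, h),
# each quantized into NUM_BINS buckets, summed onto the glyph's content embedding.
rng = np.random.default_rng(0)
NUM_BINS, DIM = 128, 64
emb_x = rng.normal(size=(NUM_BINS, DIM))
emb_y = rng.normal(size=(NUM_BINS, DIM))
emb_w = rng.normal(size=(NUM_BINS, DIM))
emb_h = rng.normal(size=(NUM_BINS, DIM))

def layout_embedding(box, content_emb):
    # box = (x, y, w, h), each normalized to [0, 1)
    x, y, w, h = (min(int(v * NUM_BINS), NUM_BINS - 1) for v in box)
    return content_emb + emb_x[x] + emb_y[y] + emb_w[w] + emb_h[h]

glyph = rng.normal(size=DIM)                      # e.g. a detected note head
token = layout_embedding((0.2, 0.7, 0.05, 0.03), glyph)
```

The limitation this thread is about: these are absolute learned positions, whereas axial rotary embeddings would make the attention scores depend on *relative* 2D offsets between glyphs.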
@lucidrains I finally found some time to look at this again. Would you be open to a pull request against x-transformers if I manage to introduce this?
@sonovice I'm looking into doing something similar (but different domain). Can I ask if you succeeded in trying 4d rotary embeddings?
Is it possible to easily use axial rotary embeddings with your x-transformers without having to dissect the Attention module? At first glance it seems that there is no simple way to just pass an instance of `RotaryEmbedding` to an x-transformers encoder. Any help would be appreciated.
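In case it helps while waiting for an answer: the core trick behind axial rotary embeddings is small enough to apply by hand to the queries/keys. Here is a minimal numpy sketch (my own hedged reading of the idea, not x-transformers' or rotary-embedding-torch's actual API): split the head dimension in half and rotate one half by the x-coordinate's RoPE angles and the other by the y-coordinate's.

```python
import numpy as np

def rotary_freqs(dim, positions, theta=10000.0):
    # standard RoPE angles for one axis: (num_tokens, dim // 2)
    inv_freq = 1.0 / (theta ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)

def apply_rotary(x, freqs):
    # rotate consecutive channel pairs by the per-position angles
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(freqs), np.sin(freqs)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def axial_rotary(qk, xs, ys):
    # qk: (num_tokens, head_dim) queries or keys; head_dim divisible by 4
    # xs, ys: integer grid coordinates per token
    d = qk.shape[-1] // 2
    return np.concatenate([apply_rotary(qk[..., :d], rotary_freqs(d, xs)),
                           apply_rotary(qk[..., d:], rotary_freqs(d, ys))],
                          axis=-1)
```

The payoff is that q·k after rotation depends only on the *relative* (Δx, Δy) between two tokens, which is exactly the 2D relative bias wanted for score glyphs. Until there is a built-in hook, one could apply this to q and k inside a custom attention layer rather than dissecting x-transformers' Attention module.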