Closed BakerBunker closed 3 months ago
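hmm, are the source and target sequences in some shared coordinate space? usually you cannot use rotary embeddings in cross attention, since the relative position between a query and a key only makes sense when both are indexed along the same axis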
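
To illustrate the point about a shared coordinate space, here is a minimal sketch in plain Python using a single 2-D rotary pair. The helper names `rotate` and `score` and the value of `theta` are assumptions made for this example, not part of any library. It shows that the attention score between a rotated query and a rotated key depends only on the offset `q_pos - k_pos`, which is why the two sequences need comparable positions for that offset to mean anything.

```python
import math

def rotate(vec, pos, theta=0.1):
    """Rotate a 2-D vector by the angle pos * theta (one rotary frequency pair)."""
    x, y = vec
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return (x * c - y * s, x * s + y * c)

def score(q, k, q_pos, k_pos):
    """Dot product of a rotated query and a rotated key."""
    qr = rotate(q, q_pos)
    kr = rotate(k, k_pos)
    return qr[0] * kr[0] + qr[1] * kr[1]

# Arbitrary example vectors (hypothetical values).
q, k = (1.0, 0.5), (0.3, -0.8)

# Same offset (q_pos - k_pos = 2) at different absolute positions.
same_offset_a = score(q, k, q_pos=5, k_pos=3)
same_offset_b = score(q, k, q_pos=12, k_pos=10)

# A different offset (q_pos - k_pos = 5).
other_offset = score(q, k, q_pos=5, k_pos=0)

print(math.isclose(same_offset_a, same_offset_b))  # offsets match, scores match
print(math.isclose(same_offset_a, other_offset))   # offsets differ, scores differ
```

In cross attention, `q_pos` indexes the target sequence and `k_pos` indexes the source sequence, so unless both live on one shared axis the offset is just an artifact of how each sequence happens to be numbered.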
Thank you for the explanation, it was my mistake to use rotary embeddings in cross attention
@BakerBunker no problem, i should have added an assert to prevent this in the cross attention setting
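
A guard of the kind mentioned above might look roughly like the sketch below. The function name `check_rotary_config` and the flags `cross_attend` and `use_rotary` are hypothetical, chosen only for illustration; this is not the library's actual code or API.

```python
def check_rotary_config(cross_attend: bool, use_rotary: bool) -> None:
    """Fail early if rotary embeddings are requested for cross attention.

    Hypothetical helper: both flag names are assumptions for this sketch.
    """
    assert not (cross_attend and use_rotary), (
        "rotary embeddings assume queries and keys share one position axis; "
        "do not enable them for cross attention"
    )

# Self attention with rotary embeddings passes silently.
check_rotary_config(cross_attend=False, use_rotary=True)

# Cross attention with rotary embeddings is rejected.
try:
    check_rotary_config(cross_attend=True, use_rotary=True)
    rejected = False
except AssertionError:
    rejected = True
```

Failing at construction time turns a silent modelling error into an immediate, readable message. Note that plain `assert` statements are skipped when Python runs with `-O`, so raising `ValueError` would be a stricter alternative.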