Closed: cyanbx closed this issue 1 year ago
it is actually the best positional bias for length extrapolation of anything in the literature. the bias is parameterized as a continuous function by a small MLP
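For illustration, a minimal sketch (not the repo's exact module; the class and parameter names here are hypothetical) of such a continuous bias in PyTorch. The relative distance is fed to a small MLP as a float, so any distance, seen during training or not, maps to some bias value:

```python
import torch
from torch import nn

class ContinuousPositionBias(nn.Module):
    def __init__(self, heads, hidden_dim = 32):
        super().__init__()
        # small MLP mapping a scalar relative distance to one bias per head
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, heads)
        )

    def forward(self, seq_len, device = None):
        pos = torch.arange(seq_len, device = device, dtype = torch.float)
        rel = pos[None, :] - pos[:, None]     # (seq_len, seq_len) signed relative distances
        bias = self.mlp(rel[..., None])       # (seq_len, seq_len, heads)
        return bias.permute(2, 0, 1)          # (heads, seq_len, seq_len), added to attention logits
```

Because the input is continuous rather than an index into a learned embedding table, there is no hard vocabulary limit on position, though the MLP is still only trained on the range of distances it saw.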
But how can the MLP embed an integer position larger than any it has seen during training? I have actually encountered quality degradation when generating audio longer than the max length used during training.
it won't extrapolate to arbitrary lengths; in language modeling it usually holds up to about 4x the training length
if you need greater lengths, recommend fine-tuning at the end
it represents the positions as a continuous function. recommend reading up on NeRFs and implicit neural representations
@cyanbx could be a good research topic, if you are looking to get a graduate degree, just saying :)
@lucidrains Is it related to the relative positional bias scheme used?
ALiBi seems to be more efficient for extending to long sequences
@lzl1456 imo alibi has a flaw where it restricts the attention to be too local
but i could eventually offer it as an option, yes, if one doesn't care about global coherence. it probably extrapolates more reliably past 3-4x the sequence length
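For comparison, a minimal sketch of the ALiBi bias, assuming the standard per-head slope scheme from the paper (the geometric sequence is exact for power-of-two head counts). The penalty grows linearly with distance, which is what makes attention increasingly local at long range:

```python
import torch

def alibi_bias(heads, seq_len, device = None):
    # per-head slopes: a geometric sequence starting at 2^(-8/heads)
    start = 2 ** (-8 / heads)
    slopes = torch.tensor([start ** (h + 1) for h in range(heads)], device = device)

    pos = torch.arange(seq_len, device = device)
    dist = (pos[:, None] - pos[None, :]).clamp(min = 0).float()  # causal distance i - j

    # linear penalty on distance, scaled per head
    return -dist[None, :, :] * slopes[:, None, None]             # (heads, seq_len, seq_len)
```

Since the bias is a fixed linear function of distance rather than anything learned, it applies unchanged at any sequence length, at the cost of strongly downweighting distant tokens.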
Hi. It seems that the RelativePositionBias in CoarseTransformer will limit the sequence length during inference, as it cannot embed an integer relative position larger than the max it encounters during training, and therefore cannot handle sequences longer than the max training length. Is there any alternative positional embedding method?
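For context, a rough sketch of a T5-style learned relative position bias of the kind being asked about (linear bucketing here for brevity; T5's actual scheme is partly log-spaced, and the class name is hypothetical). Distances past max_distance all collapse into the final bucket, which is the saturation behavior described above:

```python
import torch
from torch import nn

class BucketedRelativePositionBias(nn.Module):
    def __init__(self, heads, num_buckets = 32, max_distance = 128):
        super().__init__()
        self.num_buckets = num_buckets
        self.max_distance = max_distance
        # learned table: one bias per bucket per head
        self.embedding = nn.Embedding(num_buckets, heads)

    def forward(self, seq_len):
        pos = torch.arange(seq_len)
        dist = (pos[None, :] - pos[:, None]).abs()
        # any distance >= max_distance clamps into the last bucket,
        # so all unseen long ranges share a single learned bias value
        bucket = torch.clamp(dist * self.num_buckets // self.max_distance,
                             max = self.num_buckets - 1)
        return self.embedding(bucket).permute(2, 0, 1)  # (heads, seq_len, seq_len)
```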