What is the appropriate maximum length for input?

lucidrains / flash-genomics-model

My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other hierarchical methods)

MIT License


What is the appropriate maximum length for input? #4

Closed: wawpaopao closed this issue 1 year ago

wawpaopao commented 1 year ago

Hello, I want to discuss the maximum input length, because the genome has multi-scale structure. Functional DNA motifs span roughly 1 to 1000s of bp, topologically associating domains span roughly 10 to 1000s of kb, and some structures are larger still. So should we match the input length to the phenomenon being modeled?

lucidrains commented 1 year ago

as long as we can

we know some enhancers can be 1 million bp away from the genes they regulate
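
For a rough sense of why 1 million bp is hard, here is a back-of-the-envelope sketch of the memory a naive attention implementation would need just to hold one attention score matrix. It assumes one token per base pair and fp16 scores; `naive_attention_bytes` is a hypothetical helper, not something from this repo.

```python
def naive_attention_bytes(seq_len: int, bytes_per_score: int = 2) -> int:
    """Bytes needed to materialize one seq_len x seq_len attention score matrix.

    Assumes fp16 scores (2 bytes each), a single head, and a single sequence.
    """
    return seq_len * seq_len * bytes_per_score


# Assumption: single-nucleotide tokens, so sequence length equals the number of bp.
for bp in (1_000, 100_000, 1_000_000):
    gib = naive_attention_bytes(bp) / 2**30
    print(f"{bp:>9,} bp -> {gib:,.3f} GiB per head")
```

At 1 million bp this works out to 2 × 10^12 bytes (about 2 TB) per head. Flash Attention avoids materializing that matrix, so memory grows linearly with sequence length, although compute stays quadratic.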

wawpaopao commented 1 year ago

Yeah, maybe we can try MEGABYTE, which is like a multi-scale model?
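
To make the multi-scale idea concrete, here is a rough sketch of how a patch-based hierarchy (in the style of MEGABYTE) cuts the number of pairwise attention scores. It assumes the sequence is split into fixed-size patches, with a global model attending across patches and a local model attending within each patch. The function names and the patch size are made up for illustration.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Pairwise scores for ordinary attention over the whole sequence."""
    return seq_len * seq_len


def patched_attention_pairs(seq_len: int, patch_size: int) -> int:
    """Pairwise scores for a two-level, patch-based hierarchy.

    Global level: attention across the seq_len / patch_size patches.
    Local level: attention within each patch of patch_size tokens.
    """
    if seq_len % patch_size != 0:
        raise ValueError("seq_len must be divisible by patch_size in this sketch")
    num_patches = seq_len // patch_size
    global_pairs = num_patches * num_patches
    local_pairs = num_patches * patch_size * patch_size
    return global_pairs + local_pairs


seq_len = 1_000_000  # assumption: 1 Mbp at one token per base pair
patch_size = 100     # hypothetical patch size, not tuned
full = full_attention_pairs(seq_len)
patched = patched_attention_pairs(seq_len, patch_size)
print(f"full: {full:.2e}, patched: {patched:.2e}, ratio: {full / patched:.0f}x")
```

This counts attention scores only and ignores feedforward cost, so treat the ratio as an upper bound on the savings rather than a speedup estimate.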

lucidrains commented 1 year ago

@wawpaopao yup something similar to megabyte. megabyte is autoregressive

it will be, in order of confidence, (1) flash attention + curriculum learning (2) hierarchical (3) routed attention
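
Option (1) could look something like the schedule below: start training on short windows and double the length at fixed intervals until reaching the target context. This is only one reading of "curriculum learning" here; the thread does not spell out the schedule, and `curriculum_seq_len` and all of its default numbers are hypothetical.

```python
def curriculum_seq_len(step: int, start_len: int = 1_024,
                       max_len: int = 1_048_576,
                       steps_per_stage: int = 10_000) -> int:
    """Sequence length to train on at a given step.

    Doubles the length every `steps_per_stage` steps, capped at `max_len`.
    """
    stage = step // steps_per_stage
    # Bound the shift so very large step counts do not build huge integers.
    max_stage = (max_len // start_len).bit_length()
    return min(start_len << min(stage, max_stage), max_len)


for step in (0, 10_000, 50_000, 100_000, 200_000):
    print(step, curriculum_seq_len(step))
```

In a training loop, the returned length would set how many base pairs each sample is cropped to before batching, so early steps are cheap and the long-range examples arrive once the model has learned local structure.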