Closed by wawpaopao 1 year ago
as long as we can
we know some enhancers can be 1 million bp away from the genes they regulate
yeah, maybe we can try megabyte, which is like a multi-scale model?
@wawpaopao yup something similar to megabyte. megabyte is autoregressive
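For anyone following along, here is a rough sketch of the patching idea behind a megabyte-style hierarchy: the long sequence is cut into fixed-size patches, so a global model only has to attend over one position per patch. This is not code from this repo. `patchify`, `PAD_ID`, and the token ids (A, C, G, T as 0 to 3) are hypothetical names and values chosen for illustration.

```python
# Hypothetical padding id, assuming nucleotides A, C, G, T are encoded as 0..3.
PAD_ID = 4


def patchify(tokens, patch_size):
    """Split a token sequence into fixed-size patches, padding the last one."""
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    # Number of padding tokens needed to reach a multiple of patch_size.
    pad = (-len(tokens)) % patch_size
    padded = list(tokens) + [PAD_ID] * pad
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


if __name__ == "__main__":
    # 1 million bp with 1000-bp patches leaves 1000 positions for a global model.
    patches = patchify([0] * 1_000_000, 1000)
    print(len(patches), len(patches[0]))
```

The patch size here is arbitrary. In practice it would trade off the cost of the global model against the cost of the local, within-patch model.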
it will be, in order of confidence, (1) flash attention + curriculum learning (2) hierarchical (3) routed attention
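To make the curriculum learning part of option (1) concrete, one possible reading is a schedule that grows the training context length in stages. The sketch below assumes a simple geometric schedule. `length_curriculum`, its parameters, and the example lengths are hypothetical and are not taken from this project.

```python
def length_curriculum(start_len, max_len, factor=2):
    """Return context lengths for successive training stages.

    Lengths grow geometrically from start_len and the final stage is
    always exactly max_len.
    """
    if start_len <= 0 or max_len <= 0:
        raise ValueError("lengths must be positive")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    lengths = []
    current = start_len
    while current < max_len:
        lengths.append(current)
        current *= factor
    lengths.append(max_len)
    return lengths


if __name__ == "__main__":
    # Example: warm up on short windows, finish at a 1 Mbp context.
    for stage, length in enumerate(length_curriculum(1_000, 1_000_000, factor=10)):
        print(f"stage {stage}: context length {length} bp")
```

How many steps to spend at each stage is left open here. That would need to be tuned against the actual training budget.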
Hello, I want to discuss the maximum length, because the genome has multi-scale structure. Functional DNA motifs span 1 bp to 1000s of bp, topologically associating domains span 10 kb to 1000s of kb, and there are even larger scales. So should we match the input length to the phenomenon we want to model?