refresh-bio / agc

Assembled Genomes Compressor
MIT License

Presets for agc create #5

Open niemasd opened 1 year ago

niemasd commented 1 year ago

Re: the brief Twitter back-and-forth, it would be amazing if there were some reasonable presets for specific common contexts. Some that come to mind:

I'm sure other folks will be able to suggest many more useful presets (or perhaps subcategories of these general categories). I'd be happy to break down some possible scenarios for viral genomes (e.g. the genome length distribution across the most common viruses) if that would be helpful and if stratifying by length would be beneficial.

For example, with SARS-CoV-2, folks usually use the entire ~29 kb genome, but for other viruses like HIV, folks sometimes use the whole ~10 kb genome and sometimes use just a portion of the genome (e.g. pol+gag = ~5 kb, or just pol = ~3 kb).

Maybe repeats would be useful as well? E.g. the collection of Alu elements obtained from a given primate genome (~300 bases each)
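To make the length-stratification idea concrete, here is a minimal sketch of how one might summarize the sequence length distribution of a FASTA collection before deciding which preset category it falls into. The function names `fasta_lengths` and `summarize_lengths` are hypothetical helpers for illustration and are not part of agc, and the sketch deliberately does not suggest any agc parameter values.

```python
import io
import statistics


def fasta_lengths(handle):
    """Yield (name, length) for each record in a FASTA-formatted text stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length  # emit the previous record
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)  # sequence may span several lines
    if name is not None:
        yield name, length  # emit the final record


def summarize_lengths(handle):
    """Return count, min, median and max sequence length (hypothetical helper)."""
    lengths = [n for _, n in fasta_lengths(handle)]
    if not lengths:
        raise ValueError("no FASTA records found")
    return {
        "count": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }


if __name__ == "__main__":
    # Tiny made-up input, only to show the call shape.
    demo = ">a desc\nACGT\nAC\n>b\nACGTACGT\n>c\nAC\n"
    print(summarize_lengths(io.StringIO(demo)))
```

A summary like this (run over, say, a set of Alu elements versus a set of whole SARS-CoV-2 genomes) would show whether the collections differ enough in length to justify separate presets.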