phiweger / zoo

A portable datastructure for rapid prototyping in (viral) bioinformatics (under development).

large sequences #58

Open phiweger opened 7 years ago

phiweger commented 7 years ago

We are likely to run into genomes too large to store efficiently in MongoDB, given its maximum document size of 16 MB.

I prefer the latter, because it seems less dependent on the db architecture.

We could easily implement random access to FASTA files (or use pyfaidx), so retrieving a given sequence would be quite efficient once we're computing stuff.
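A rough sketch of what hand-rolled random access could look like, using only the standard library. It mimics the idea behind a faidx-style index (byte offset plus line layout per record), but `build_index` and `fetch` are hypothetical names, not part of zoo or pyfaidx, and it assumes every sequence line within a record has the same width except the last one.

```python
def build_index(path):
    """Scan the FASTA file once and record, per sequence, where its bases
    start (byte offset), its total length, and how its lines are wrapped."""
    index = {}
    name = None
    with open(path, "rb") as fh:
        pos = 0
        for line in fh:
            if line.startswith(b">"):
                # Use the first whitespace-delimited token as the record name.
                name = line[1:].split()[0].decode()
                index[name] = {"offset": pos + len(line), "length": 0,
                               "linebases": 0, "linewidth": 0}
            elif name is not None and line.strip():
                rec = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if rec["linebases"] == 0:
                    rec["linebases"] = bases      # bases per line
                    rec["linewidth"] = len(line)  # bytes per line incl. newline
                rec["length"] += bases
            pos += len(line)
    return index


def fetch(path, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) of one record,
    reading only the bytes that cover that range."""
    rec = index[name]
    end = min(end, rec["length"])
    if start >= end:
        return ""

    def byte_pos(i):
        # Map a base coordinate to its byte position in the file.
        full_lines, within = divmod(i, rec["linebases"])
        return rec["offset"] + full_lines * rec["linewidth"] + within

    first = byte_pos(start)
    last = byte_pos(end - 1) + 1
    with open(path, "rb") as fh:
        fh.seek(first)
        chunk = fh.read(last - first)
    # Drop the line breaks that fall inside the requested range.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()
```

The index is small enough to keep next to the metadata, so the database entry would only need to hold a path and a record name rather than the sequence itself.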

phiweger commented 7 years ago

if seq empty:

links to related sequences