rachelss / SISRS

Site Identification from Short Read Sequences.

Subsample via normalization? #63

Open BobLiterman opened 6 years ago

BobLiterman commented 6 years ago

Future thought: rather than subsampling reads randomly, we might assemble better genomes and get more representative mapping with a workflow like this:

1) Get raw reads for X species.
2) Trim, etc.
3) For 10X mapping data: concatenate the reads from each species and normalize them by kmer to ~10X coverage (bbmap can do this).
4) For genome data: subsample the normalized reads to create the genome data.
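To make steps 3 and 4 concrete, here is a rough Python sketch of the idea. It uses a simple median-kmer-count rule (digital normalization), which is an assumption on my part and not necessarily how bbmap normalizes. The names `normalize_reads` and `subsample`, and the values of `k` and `target`, are hypothetical and not part of SISRS.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer in seq (no reverse-complement handling, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize_reads(reads, k=5, target=10):
    """Keep a read only if the median count of its k-mers so far is below target.

    Assumption: a toy stand-in for kmer-based normalization to ~target coverage.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k, nothing to count
        if median(counts[km] for km in read_kmers) < target:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to coverage
    return kept


def subsample(reads, n, seed=0):
    """Randomly draw up to n reads from the (already normalized) pool."""
    rng = random.Random(seed)
    return rng.sample(reads, min(n, len(reads)))


if __name__ == "__main__":
    pooled = ["ACGTTGCAAG"] * 50 + ["TTTTGGGGCC"] * 3  # made-up reads
    normalized = normalize_reads(pooled, k=5, target=10)
    genome_reads = subsample(normalized, n=8)
    print(len(pooled), len(normalized), len(genome_reads))
```

The point is that over-represented regions get capped near the target depth while rare reads are all retained, so the later random subsample draws from a more even pool than the raw reads would give.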

BobLiterman commented 6 years ago

Check out other aligners, specifically ones that can handle degenerate bases.