Future thought: Rather than subsampling reads randomly, perhaps we could assemble better genomes and get more representative mapping if we:
1) Get raw reads for X species
2) Trim etc.
3) For 10X Mapping Data: Concatenate and normalize reads from each species by k-mer depth to ~10X coverage (BBMap's bbnorm.sh can do this)
4) For Genome Data: Subsample normalized reads to create genome data.
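
A rough sketch of steps 3 and 4, under several assumptions: that the tool meant in step 3 is bbnorm.sh from the BBMap/BBTools suite (called with its in=, out=, and target= parameters), that reads are single-end and uncompressed, and that "subsample" means drawing a fixed number of reads. The function names, file names, and read counts here are hypothetical and only for illustration.

```python
import random
from typing import Iterable, Iterator, List, Sequence, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def bbnorm_command(in_fastq: str, out_fastq: str, target_depth: int = 10) -> List[str]:
    """Build (but do not run) a bbnorm.sh call for k-mer normalization.

    Assumes bbnorm.sh is on PATH; pass the list to subprocess.run to execute it.
    """
    return ["bbnorm.sh", f"in={in_fastq}", f"out={out_fastq}", f"target={target_depth}"]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records, skipping blank lines."""
    stripped = (line.rstrip("\n") for line in lines if line.strip())
    # zip over the same iterator four times groups consecutive lines into records.
    for record in zip(stripped, stripped, stripped, stripped):
        yield record


def subsample(records: Sequence[FastqRecord], n_reads: int, seed: int = 0) -> List[FastqRecord]:
    """Draw n_reads records without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(list(records), n_reads)
```

To use it, the command from bbnorm_command("all_species.fq", "norm10x.fq") would be run first on the concatenated reads, and the normalized output would then be parsed with read_fastq and passed to subsample to produce the genome data set. Paired-end data would need the mates kept together, which this sketch does not handle.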