ibest / ARC

Assembly by Reduced Complexity (ARC)
Apache License 2.0

Reduce disk space requirements when using bowtie2 #44

Closed samhunter closed 8 years ago

samhunter commented 9 years ago

It might be possible to reduce disk space requirements when using Bowtie2 by writing out only mapped reads (the --no-unal flag). Before this is done, it is necessary to double-check that when only one member of a pair has been mapped, both members of the pair are still written.
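One way to run the check described above is to scan the SAM output and confirm that every paired read name shows up with both mates. The following is only a minimal sketch, not code from ARC: the function name `find_incomplete_pairs` is hypothetical, and it assumes that only primary records matter (secondary and supplementary alignments are skipped).

```python
from collections import defaultdict

# SAM FLAG bits used below (values from the SAM specification).
PAIRED = 0x1
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def find_incomplete_pairs(sam_lines):
    """Return the sorted read names for which only one mate was written."""
    seen = defaultdict(set)
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # Ignore unpaired reads and non-primary alignments.
        if not (flag & PAIRED) or (flag & (SECONDARY | SUPPLEMENTARY)):
            continue
        if flag & FIRST_IN_PAIR:
            seen[name].add(1)
        if flag & SECOND_IN_PAIR:
            seen[name].add(2)
    return sorted(name for name, mates in seen.items() if mates != {1, 2})
```

Called on the lines of a SAM file produced with --no-unal, an empty result would suggest that no pair lost a mate.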

In the future, using a pipe instead of a file to get output from bowtie2 into the parser would be an even better option. However, this would require rewriting the mapper and splitter, and it doesn't appear that Blat can write to stdout, so some other strategy would need to be developed (e.g. creating another Blat patch to enable output to stdout).
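As a rough illustration of the pipe idea for the bowtie2 side only: bowtie2 writes SAM to stdout when -S is not given, so a parser could read records as they are produced instead of from a file on disk. This is a sketch under assumptions, not ARC's mapper; `stream_sam_records` is a hypothetical helper, and it extracts only the read name and flag.

```python
import subprocess


def stream_sam_records(cmd):
    """Run cmd and yield (read_name, flag) for each SAM alignment line on its stdout."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            # Skip SAM header lines.
            if line.startswith("@"):
                continue
            fields = line.rstrip("\n").split("\t")
            # A SAM alignment line has at least 11 mandatory fields.
            if len(fields) < 11:
                continue
            yield fields[0], int(fields[1])
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    # Only reached when the stream was read to the end.
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
```

A call might look like `stream_sam_records(["bowtie2", "--no-unal", "-x", "targets_index", "-1", "PE1.fastq", "-2", "PE2.fastq"])`, where the index name `targets_index` is a placeholder. Nothing here addresses Blat, which would still need a different approach.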

samhunter commented 9 years ago

Running bowtie2 with --no-unal appears to work just as well as when using the normal output mode (based on VERY limited testing). This has been set as a default parameter for Bowtie2 in the develop branch. Further testing from interested users would be greatly appreciated.

atcg commented 9 years ago

I've done a little testing of the --no-unal version in the develop branch. It does appear that for read pairs where only one of the reads is mapped by bowtie2 (for instance, with SAM flag 73 or 153), both the R1 and R2 reads are written to the PE1.fastq and PE2.fastq files in the "t__" folders. Is that what you're seeing too?
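For anyone who wants to see what those two flag values encode, here is a small sketch that decodes a SAM flag into its component bits. The bit values come from the SAM specification; the names `decode_flag` and `is_lone_mapped_mate` are made up for this illustration.

```python
# Bit values and meanings as defined in the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def is_lone_mapped_mate(flag):
    """True if this read is paired and mapped while its mate is unmapped."""
    return bool(flag & 0x1) and not (flag & 0x4) and bool(flag & 0x8)


print(decode_flag(73))   # ['paired', 'mate_unmapped', 'first_in_pair']
print(decode_flag(153))  # ['paired', 'mate_unmapped', 'reverse', 'second_in_pair']
```

Both 73 and 153 describe a mapped read whose mate is unmapped, which is exactly the case where it matters that the unmapped mate is still written.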

It seems to me the resulting assemblies should be identical to the main branch without the --no-unal flag, yes?

Thanks for implementing! Evan

samhunter commented 9 years ago

Yes, that is what I was seeing as well, and my test assemblies were coming out looking the same with the option on/off. Thanks for the great suggestion, I will put this in as the default behavior in the upcoming version.

samhunter commented 8 years ago

This is enabled by default in v1.1.4-beta.