Open sergpolly opened 5 years ago
Bowtie2 command that did work for mapping 25bp reads: bowtie2 -x $genome -1 $read_1.fastq.gz -2 $read_2.fastq.gz
Thank you!
Is there currently a supported sequencer that produces 25 bp reads? I haven't heard of any data in the last ~5 years with such short reads!
It's just regular MiSeq paired end run. Since I'm working with a small genome and just testing several conditions, I figured it would be sufficient. But now I wish I would have used longer reads (>50bp)...
Author of minimap
says bwa-mem should be better for Hi-C.
"Furthermore, I also realize that bwa-mem will be better than minimap2 at Hi-C alignment because bwa-mem is more sensitive to short matches."
https://lh3.github.io/2018/04/02/minimap2-and-the-future-of-bwa
this seems important!
Not sure what would be the best way to deal with this issue. After a bit of googling, it does seem that aln would do a better job with shorter reads: https://sourceforge.net/p/bio-bwa/mailman/message/34978319/ http://crazyhottommy.blogspot.com/2017/06/bwa-aln-or-bwa-mem-for-short-reads-36bp.html
Importantly, mapping with aln is two-step: first, you run bwa aln and get .sai files, then you run bwa samse/sampe and convert ,sai files to .sams. We could introduce a switch to the logic of distiller if you feel that it could be important!
Anton.
On Wed, 15 May 2019 at 09:36, Ilya Flyamer notifications@github.com wrote:
Author of minimap says bwa-mem should be better for Hi-C. "Furthermore, I also realize that bwa-mem will be better than minimap2 at Hi-C alignment because bwa-mem is more sensitive to short matches." https://lh3.github.io/2018/04/02/minimap2-and-the-future-of-bwa
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/mirnylab/distiller-nf/issues/141?email_source=notifications&email_token=AAG64CSU46FZOHVEEFPQNU3PVQGWTA5CNFSM4HM3X7B2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVOVW7I#issuecomment-492657533, or mute the thread https://github.com/notifications/unsubscribe-auth/AAG64CSKJPRKQEIER376XSTPVQGWTANCNFSM4HM3X7BQ .
@golobor @Phlya @Marlies1993 after a little chat with Job - this is no longer URGENT, but should probably be addressed anyways - couple of arguments:
in our case it seems like bowtie2
did the job ok (for quick-n-dirty prelim check, at least), according to @Marlies1993 - we just plugged it in, instead of bwa
: https://github.com/dekkerlab/distiller-nf/blob/a510c40471d267397164916d770bbe0d06dbb539/distiller.nf#L469
and rebuild the docker with bowtie2 inside
Importantly, mapping with aln is two-step: first, you run bwa aln and get .sai files, then you run bwa samse/sampe and convert ,sai files to .sams. We could introduce a switch to the logic of distiller if you feel that it could be important!
it's a shame it's not "plug n play" between bwa mem
and bwa aln
- but it seems like the best option to do this switch from mem
to aln
- with all of the intermediate steps - according to this story that @Phlya found https://github.com/mirnylab/distiller-nf/issues/141#issuecomment-492657533
Thank you for this update! These arguments seem very reasonable, indeed. The only issue with bowtie2 that I see is that it requires a different index - but it seems to be working just fine, right? re: bwa aln - we can hide away all of its logic into an .sh script that would create temp .sai files and then convert them into .sams, streaming out the result.
Hi, this is not the problem for MiSeq small reads only. In methods like Hi-CO (https://doi.org/10.1016/j.cell.2018.12.014) the length of meaningful read parts is required to be 15-36 bp due to complex ligation procedure and adaptors trimming. bwa aln seems to handle the problem if inserted into distiller code at the step of mapping:
bwa aln -t ${bwa_threads} ${bwa_index_base} ${fastq1} > ${library}.${run}.${ASSEMBLY_NAME}.${chunk}.1.sai
bwa aln -t ${bwa_threads} ${bwa_index_base} ${fastq2} > ${library}.${run}.${ASSEMBLY_NAME}.${chunk}.2.sai
bwa sampe ${bwa_index_base} ${library}.${run}.${ASSEMBLY_NAME}.${chunk}.1.sai ${library}.${run}.${ASSEMBLY_NAME}.${chunk}.2.sai ${fastq1} ${fastq2} \
| pairtools parse ${dropsam_flag} ${dropreadid_flag} ${dropseq_flag} \
${parsing_options} \
-c ${chrom_sizes} \
| pairtools sort --nproc ${sorting_threads} \
-o ${library}.${run}.${ASSEMBLY_NAME}.${chunk}.pairsam.${suffix} \
--tmpdir \$TASK_TMP_DIR \
| cat```
Hi, there is now a test distiller option for short reads mapping with bwa aln (see long_reads option of map): https://github.com/open2c/distiller-nf/tree/mapping-options
Any suggestions/improvements and testing are much appreciated! @Phlya @Marlies1993 @sergpolly @golobor
Rather major stuff! as discovered by @Marlies1993
bwa mem
yields empty alignments for sequences of 25bp ... apparently it isn't designed for 25bp, and something likebwa aln
should be used instead ...@golobor @nvictus @mimakaev any thought on that ? switch to
minimap
? I know nothing aboutbwa aln
...we'll try to proceed with
bowtie2
for this dataset and will try to avoid 25bp in future, but I think this issue is important enough to consider and address