mozack / vdjer

V'DJer - B Cell Receptor Repertoire Reconstruction from short read mRNA-Seq data
Other
28 stars 5 forks source link

Extracting useful reads #3

Open PoslavskySV opened 8 years ago

PoslavskySV commented 8 years ago

Hi,

Is there way to extract reads that were successfully assembled by vdjer from the original bam file?

I mean the following. I supposed that vdjer.sam contains reads that were assembled by vdjer. But, when I try to extract fastq reads from vdjer.sam and re-analyse them, vdjer produces empty result.

Steps to reproduce:

./STAR --readFilesIn demo_R1.fastq demo_R2.fastq ... > demo.sort.bam
./samtools index demo.sort.bam
./vdjer --in demo.sort.bam ... > vdjer.sam

At this point all ok, vdjer.sam contains a lot of alignments and vdj_contigs.fa a lot of records.

Then I do:

./SamToFastq I=vdjer.sam F=demo2_R1.fastq F2=demo2_R2.fastq
./STAR --readFilesIn demo2_R1.fastq demo2_R2.fastq ... > demo2.sort.bam
./samtools index demo2.sort.bam
./vdjer --in demo2.sort.bam ... > vdjer2.sam

and now vdjer2.sam is empty and vdj_contigs.fa is empty too. Why?

Thanks!

mozack commented 7 years ago

Hi,

We don't currently have a way to do that, but we may be able to add a debug option to write the extracted reads to disk in a future release.

The final output SAM file contains only those reads that map to the output 360 base contigs.

There is an internal representation of those contigs that is a fair bit longer than the 360 bases which allows us to measure coverage across the entirety of the final output contig (including partially overlapping reads). Those partially overlapping reads are not emitted in the final output. Using just the final set of reads will not result in sufficient contig length for the assembly.

PoslavskySV commented 7 years ago

Thanks for the answer! Such debug option for writing extracted reads may be really useful!