mozack / vdjer

V'DJer - B Cell Receptor Repertoire Reconstruction from short read mRNA-Seq data
Other
28 stars 5 forks source link

Unexpected output from v'djer #13

Open NicolasHipp opened 6 years ago

NicolasHipp commented 6 years ago

Hi everyone,

I'm trying to use vdjer on my data. I perform the tutorial, with the data provided in the package and It's working. Problems appears when I tried with my data. I have a bam file (median ins 175, 2X75PE, aligned with star with --outSAMtype BAM SortedByCoordinate option).

capture d ecran 2018-07-27 a 18 00 33

I tried:

vdjer --in /bam_test/sample1_Aligned.sortedByCoord.out.bam --t 8 --ins 175 --chain IGH --ref-dir /home/igh > //bam_test/sample1_Aligned.sortedByCoord.out.bam/vdjer_STAR.sam 2> /bam_test/sample1_Aligned.sortedByCoord.out.bam/vdjer_samtools.log

Problems occurs when I looked at the output, I get an empty vdi_contig.fa, and when I looked to vdjer_STAR.sam It returns me (my_root) -bash-4.2$ more vdjer_STAR.sam @HD VN:1.4 SO:unsorted

Does anyone get this and have a solution ?

Thanks a lot ! nicolas

mozack commented 6 years ago

Hi,

You will receive output like this if V'DJer fails to assemble any high confidence contigs.

When running STAR, please confirm that you are also using the following param to include unmapped reads in the BAM file: --outSAMunmapped Within

If so, you may wish to try the more sensitive settings described in the README.

Please note that V'DJer does not always assemble contigs when there is low coverage of the BCR transcripts.

NicolasHipp commented 6 years ago

Hi,

Thanks for this answer, so I tried a lot of things but it still don't work. I read on another post that vdjer used hg38 analysis set without alt contig and ucsc name for the chromosome. So I aligned again with STAR a paired sample (plasmocyte).

I tried a simple version for this alignment as suggested in the other post :

STAR \ --runMode alignReads \ --runThreadN 8 \ --genomeDir /omaha-beach/nhipp/MICROGBW/fastq/fastq_test/Reference/ \ --readFilesIn /omaha-beach/nhipp/MICROGBW/fastq/fastq_test/PAIRED/"$individu"_1_PAIRED_TRIMMED_HLG2FBBXX.fastq /omaha-beach/nhipp/MICROGBW/fastq/fastq_test/PAIRED/"$individu"_2_PAIRED_TRIMMED_HLG2FBBXX.fastq \ --outSAMunmapped Within \ --outSAMtype BAM SortedByCoordinate \ --outFileNamePrefix ....

I ran Picard to evaluate the median of the contig : 237 (which seem important for a 2X75)

and I used these parameters on vdjer

vdjer --in /omaha-beach/nhipp/MICROGBW/fastq/fastq_test/PC/Output_STAR/M649_RD_B00J6DK_5_Aligned.sortedByCoord.out.bam --t 8 --ins 237 --chain IGH --ref-dir /home/genouest/inserm_u1236/nhipp/igh > /omaha-beach/nhipp/MICROGBW/fastq/fastq_test/PC/Output_STAR/Output_PC_vdjervdjer_mygenome.sam 2> /omaha-beach/nhipp/MICROGBW/fastq/fastq_test/PC/Output_STAR/vdjer_mygenome.log

Again, the vdj_contigs.fa is empty and the log file look like :

vdjer_mygenome.log

I explore this file, but cannot find error or something to unstuck me.

sorry for these naive questions nicolas

mozack commented 6 years ago

I don't believe Picard is taking introns into account when computing the insert lengths and the results are likely inaccurate. You'll want to do a quick alignment to a transcriptome to get a more accurate estimate. Feel free to downsample to a million or so reads if you'd like to save on compute time.

Also, in our testing on TCGA data, V'DJer did not assemble BCR sequences for every sample. It is a real possibility that there may be insufficient depth of any single BCR transcript for assembly. You might consider exploring results from a more sensitive tool like Mixcr which may report more sensitive results, but shorter sequence lengths.