mozack / vdjer

V'DJer - B Cell Receptor Repertoire Reconstruction from short read mRNA-Seq data

VDJer: empty output #7

Open mandricigor opened 7 years ago

mandricigor commented 7 years ago

Here is a simulated IGH sample (using simNGS):

igh.fa.txt - simulated IGH transcripts
r1.fastq.txt - simulated reads (left mates)
r2.fastq.txt - simulated reads (right mates)

The .txt suffix was appended to each filename so that the files could be uploaded to GitHub.

The read length is 100 bases and the inferred coverage is 16x.

I ran all the commands as described in the VDJer Usage instructions, and the resulting output is empty. The human GRCh38 assembly was used (with alt contigs removed).

Tools in the same category, TRUST and MiXCR (RNA-Seq workflow), produce a large number of contigs from this sample.

Please help identify the issue with VDJer.

mozack commented 7 years ago

Apologies for the slow response.

There are at least a couple of issues here that will affect V'DJer:

1) V'DJer expects full-length transcripts to be present and by default requires an untrimmed assembly length of at least 486 bases (with at least 162 bases extending beyond the conserved J-segment amino acid).

2) As indicated in the paper, standard mode performs well at a depth of 50x or greater. You may wish to experiment with sensitive mode (although a depth of 16x may affect assembly there as well).
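The two constraints above can be turned into a rough pre-flight check before running an assembly. This is a sketch, not V'DJer's code: the thresholds are the defaults quoted above, passes_length_filter is a hypothetical helper that assumes the end position of the conserved J-segment codon is already known, and depth is estimated with the standard Lander-Waterman style formula C = N x L / T (reads x read length / target length).

```python
import math

# Defaults quoted in this thread for V'DJer standard mode.
MIN_UNTRIMMED_LENGTH = 486     # bases
MIN_BASES_PAST_J_ANCHOR = 162  # bases beyond the conserved J-segment amino acid
STANDARD_MODE_DEPTH = 50       # x; depth at which standard mode is reported to do well


def passes_length_filter(contig_length, j_anchor_end):
    """Hypothetical helper: True if a contig meets both default length rules.

    j_anchor_end is ASSUMED to be the 0-based position just past the codon of
    the conserved J-segment amino acid; locating it is not shown here.
    """
    bases_past_anchor = contig_length - j_anchor_end
    return (contig_length >= MIN_UNTRIMMED_LENGTH
            and bases_past_anchor >= MIN_BASES_PAST_J_ANCHOR)


def mean_depth(num_reads, read_length, target_length):
    """Rough mean coverage C = N * L / T (each mate of a pair counts as a read)."""
    return num_reads * read_length / target_length


def reads_for_depth(depth, read_length, target_length):
    """Reads needed to reach a given mean depth over target_length bases."""
    return math.ceil(depth * target_length / read_length)
```

For a hypothetical 500-base transcript and 100-base reads, reads_for_depth gives 80 reads at 16x versus 250 at 50x. Because the ratio is 50/16 regardless of transcript length, a 16x sample would need roughly three times as many reads to reach the depth at which standard mode is reported to perform well.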

icTAIR commented 7 years ago

I ran into the same problem. I used sensitive mode but still got this result:

$ wc -l *
249157 SRRxxxx.log
     1 SRRxxxx.sam
     0 vdj_contigs.fa
496776 vdjer.dot

The input BAM files were generated with STAR. Please help identify the issue with VDJer.

mozack commented 7 years ago

Can you provide some additional details about your dataset?

icTAIR commented 7 years ago

Thank you for replying. Luckily, I got some results from V'DJer. The SAM file has 58 lines. For the vdj_contigs.fa, there was only one contig. Is this normal? Moreover, what do the numbers in the SAM file imply? Here is part of the SAM file. Could you please tell me what each column means?

@HD VN:1.4 SO:unsorted
@SQ SN:vjf_1_TGTGCGAGTGGGATGTATAGCAGTGGCTGGTACGACGGTATGGACGTCTGG LN:360
SRRxxx.52496807 99 vjf_1_TGTGCGAGTGGGATGTATAGCAGTGGCTGGTACGACGGTATGGACGTCTGG 47 255 76M = 183 212 GGATCCGCCATCCCCCAGGGAAGGGACTGGAGTGGGTTGGGAGTCTCTATTATACTGGGGGCACCTACTACAAACC CCCFFFFFHGHFHJJJJJJJIIJIJIIJJJHI9DGIGHJJJDH@DHIIGHHHFEHHHFEFDDBBDDDDDCDDDDDB

Also, for some of my V'DJer runs, the vdj_contigs.fa file was empty. Is this possible?

Thank you so much.

mozack commented 7 years ago

"For the vdj_contigs.fa, there was only one contig. Is this normal?"

This will vary from case to case. In the datasets we have explored, some samples wind up with 0 or 1 clones assembled. Others wind up with hundreds.
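When comparing runs, note that wc -l counts lines rather than records: a FASTA record can wrap over several sequence lines, and a SAM file also contains header lines starting with '@' (a SAM file whose lines all start with '@' holds no alignments). A minimal sketch for counting assembled contigs and aligned reads instead; the helper names are hypothetical and not part of V'DJer.

```python
def count_fasta_records(lines):
    """Count FASTA records by their '>' header lines (one per contig)."""
    return sum(1 for line in lines if line.startswith(">"))


def count_sam_alignments(lines):
    """Count SAM alignment lines, skipping '@' header lines and blank lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("@"))


# Example usage with the output file names from this thread:
#   with open("vdj_contigs.fa") as fasta:
#       print(count_fasta_records(fasta))
#   with open("SRRxxxx.sam") as sam:
#       print(count_sam_alignments(sam))
```

Both functions take any iterable of lines, so an open file handle or a list of strings works equally well.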

"Moreover, what do the numbers in the SAM file imply?"

The SAM spec is defined here: http://samtools.github.io/hts-specs/SAMv1.pdf
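After the @HD and @SQ header lines, each alignment line carries the 11 mandatory SAM columns in a fixed order: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL, optionally followed by TAG:TYPE:VALUE fields. A minimal Python sketch of reading them, assuming the usual tab-delimited layout; parse_sam_record and decode_flag are illustrative names, not part of V'DJer or any SAM library.

```python
# The 11 mandatory SAM alignment columns, in order, as defined in the SAM spec.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")

# Bits of the FLAG column; several can be set at once.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "not passing quality controls",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def parse_sam_record(line):
    """Split one tab-delimited SAM alignment line into named fields."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, values[:11]))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    record["TAGS"] = values[11:]  # optional TAG:TYPE:VALUE fields, if any
    return record


def decode_flag(flag):
    """List the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]
```

Applied to the record quoted above: FLAG 99 decodes to read paired, read mapped in proper pair, mate reverse strand, and first in pair; POS 47 is the 1-based leftmost position on the assembled contig; MAPQ 255 means the mapping quality is not available; CIGAR 76M means 76 aligned bases; RNEXT "=" means the mate maps to the same contig, starting at PNEXT 183; and TLEN 212 is the observed template length.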

The SAM file is provided to allow for processing by a downstream quantification tool such as RSEM.