Open mandricigor opened 7 years ago
Apologies for the slow response.
There are at least a couple of issues here that will impact V'DJer:
1) V'DJer expects full-length transcripts to be present and by default requires an untrimmed assembly of at least 486 bases (with at least 162 bases extending beyond the conserved J segment AA).
2) As indicated in the paper, standard mode performs well at a depth of 50x or greater. You may wish to experiment with sensitive mode (although a depth of 16x may impact assembly there as well).
I also ran into this problem. I used sensitive mode, but still got a result like this:
$ wc -l *
249157 SRRxxxx.log
1 SRRxxxx.sam
0 vdj_contigs.fa
496776 vdjer.dot
The input BAM files were generated using STAR. Please help identify what the issue with VDJer is.
Can you provide some additional details about your dataset?
Thank you for replying. Luckily, I got some results from vDjer. The SAM file has 58 lines, and vdj_contigs.fa contains only one contig. Is this normal? Moreover, what do the numbers in the SAM file mean? Here is part of the SAM file. Could you please tell me each column's meaning?
@HD VN:1.4 SO:unsorted
@SQ SN:vjf_1_TGTGCGAGTGGGATGTATAGCAGTGGCTGGTACGACGGTATGGACGTCTGG LN:360
SRRxxx.52496807 99 vjf_1_TGTGCGAGTGGGATGTATAGCAGTGGCTGGTACGACGGTATGGACGTCTGG 47 255 76M = 183 212 GGATCCGCCATCCCCCAGGGAAGGGACTGGAGTGGGTTGGGAGTCTCTATTATACTGGGGGCACCTACTACAAACC CCCFFFFFHGHFHJJJJJJJIIJIJIIJJJHI9DGIGHJJJDH@DHIIGHHHFEHHHFEFDDBBDDDDDCDDDDDB
BTW, for some of my vDjer results, the vdj_contigs.fa file was empty. Is this possible?
Thank you so much.
"For the vdj_contigs.fa, there was only one contig. Is these normal?"
This will vary from case to case. In the datasets we have explored, some samples wind up with 0 or 1 clones assembled. Others wind up with hundreds.
"Moreover, what do the numbers in the sam file imply?"
The SAM spec is defined here: http://samtools.github.io/hts-specs/SAMv1.pdf
The SAM file is provided to allow for processing by a downstream quantification tool such as RSEM.
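To make the column question concrete, here is a minimal Python sketch that splits one alignment line into the 11 mandatory fields named in the SAM spec and decodes the bitwise FLAG. It is an illustration only and not part of V'DJer; `SamRecord`, `parse_sam_line`, and `decode_flag` are hypothetical names, and the sketch assumes the line is tab-separated as the spec requires (the excerpt posted above shows spaces because of how it was pasted).

```python
from typing import List, NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory fields of a SAM alignment line, in column order."""
    qname: str   # read (query) name
    flag: int    # bitwise flag, see FLAG_BITS
    rname: str   # reference name (here, the assembled contig)
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality (255 means "not available")
    cigar: str   # alignment operations, e.g. 76M = 76 aligned positions
    rnext: str   # mate's reference name ("=" means same as rname)
    pnext: int   # mate's 1-based position
    tlen: int    # observed template (fragment) length
    seq: str     # read sequence
    qual: str    # per-base qualities, Phred+33 encoded


# Flag bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "first_in_pair",
    0x80: "second_in_pair", 0x100: "secondary", 0x200: "qc_fail",
    0x400: "duplicate", 0x800: "supplementary",
}


def parse_sam_line(line: str) -> SamRecord:
    """Parse one tab-separated alignment line (not an @ header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    q, f, r, p, m, c, rn, pn, t, s, ql = fields[:11]
    return SamRecord(q, int(f), r, int(p), int(m), c, rn, int(pn), int(t), s, ql)


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# The alignment line quoted in this thread, rebuilt with tab separators.
example_line = "\t".join([
    "SRRxxx.52496807", "99",
    "vjf_1_TGTGCGAGTGGGATGTATAGCAGTGGCTGGTACGACGGTATGGACGTCTGG",
    "47", "255", "76M", "=", "183", "212",
    "GGATCCGCCATCCCCCAGGGAAGGGACTGGAGTGGGTTGGGAGTCTCTATTATACTGGGGGCACCTACTACAAACC",
    "CCCFFFFFHGHFHJJJJJJJIIJIJIIJJJHI9DGIGHJJJDH@DHIIGHHHFEHHHFEFDDBBDDDDDCDDDDDB",
])
record = parse_sam_line(example_line)
print(record.pos, record.cigar, decode_flag(record.flag))
```

For the quoted read, FLAG 99 is 1 + 2 + 32 + 64: the read is paired, mapped in a proper pair, its mate is on the reverse strand, and it is the first read of the pair. It aligns to the contig starting at position 47, and its mate starts at position 183 of the same contig.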
Here is a simulated IGH sample (using simNGS):
igh.fa.txt - simulated IGH transcripts
r1.fastq.txt - simulated reads (left pairs)
r2.fastq.txt - simulated reads (right pairs)
The .txt suffix is appended to each filename to be able to upload it here to GitHub.
The read length is 100 bases and the inferred coverage is 16x.
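For anyone wanting to check a coverage figure like this, below is a rough Python sketch. It assumes coverage is simply total sequenced bases divided by total transcript bases, which may not be how the figure above was inferred. The function names are hypothetical, and the helpers take iterables of lines so they work on open file handles for files such as igh.fa.txt and r1.fastq.txt.

```python
from typing import Iterable


def fasta_total_bases(lines: Iterable[str]) -> int:
    """Sum sequence lengths in FASTA text, skipping '>' header lines."""
    return sum(len(s.strip()) for s in lines if s.strip() and not s.startswith(">"))


def fastq_read_count(lines: Iterable[str]) -> int:
    """Count reads in FASTQ text, assuming exactly 4 lines per record."""
    return sum(1 for s in lines if s.strip()) // 4


def estimate_coverage(n_reads: int, read_length: int, total_bases: int) -> float:
    """Mean depth = sequenced bases / reference bases (simplifying assumption)."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return n_reads * read_length / total_bases


# Toy numbers only: 160 reads of 100 bases over 1000 transcript bases.
toy_fasta = [">t1", "ACGTACGT", "ACGT", ">t2", "GGCC"]
print(fasta_total_bases(toy_fasta))
print(estimate_coverage(160, 100, 1000))
```

With paired-end data, `n_reads` should count both mates, i.e. the reads in r1 plus the reads in r2.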
I ran all the commands as described in VDJer Usage and the resulting output is empty. Build 38 of the human reference assembly was used (alt contigs were removed).
Other tools in the same category, TRUST and MiXCR (RNA-seq workflow), produce a large number of contigs.
Please help identify what the issue with VDJer is.