Open HStephane opened 2 years ago
Could you please show me the first few lines of 1_S1_L001_R1_001.fastq and 1_S1_L001_R2_001.fastq?
The assemblies with the same CDR3 could be due to there have differences in V and J genes.
For the Read 1 :
@FS10000339:45:BPG61614-1218:1:1101:1170:1000 1:N:0:1 AGACCTCTCTGTACTTCTGTGCCAGCAGTGGGGCCGGACAGACCAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTGCTAGAGGACCTGAAAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTC + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF @FS10000339:45:BPG61614-1218:1:1101:2100:1000 1:N:0:1 ACTCAGCCGCGTATCTCCGTGCCAGCAGCCGAACGATGTTAAACTCCTACAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTGCTAGAGGACCTGAAAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATTAGAAGCAGA + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF @FS10000339:45:BPG61614-1218:1:1101:4810:1000 1:N:0:1 ACTTGGCTGTGTATCTCTGTGCCAGCAGCTTGGATTTGGGGCGGGGGGCCAACGTCCTGACTTTCGGGGCCGGCAGCAGGCTGACCGTGCTGGAGGACCTGAAAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGA + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF @FS10000339:45:BPG61614-1218:1:1101:4830:1000 1:N:0:1 ACTCAGCTGTGTATTTTTGTGCCACCTATGGAGGGAGCATTGAGACCATATATTTTGGAGAGGGAAGTTGGCTCACTGTTGTAGAGGACCTGAACAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCA + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, @FS10000339:45:BPG61614-1218:1:1101:4920:1000 1:N:0:1 AGACATCTGTGTACTTCTGTGCCACCTATGGAGGGAGCATTGGGACCATATATTTTGGAGAGGGAAGTTGGCTCACTGTTGTAAAGGACCTGAACAAGGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCA + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,
For the Read 2 :
@FS10000339:45:BPG61614-1218:1:1101:1170:1000 2:N:0:1 CTTTTGGGTGTGGGAGATCTCTGCTTCTGATGGCTCAAACACAGCGACCTCGGGTGGGAACACGTTTTTCAGGTCCTCTAGCACGGTGAGCCGTGTCCCTGGCCCGAAGAACTGCTCATTGGTCTGTCCGGCCCCACTGCTGGCACAGAAG + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF, @FS10000339:45:BPG61614-1218:1:1101:2100:1000 2:N:0:1 CTTTTGGGTGTGGGAGATCTCTGCTTCTAATGGCTCAAACACAGCGACCTCGGGTGGGAACACGTTTTTCAGGTCCTCTAGCACGGTGAGCCGTGTCCCTGGCCCGAAGAACTGCTCATTGTAGGAGTTTAACATCGTTCGGCTGCTGGCT + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, @FS10000339:45:BPG61614-1218:1:1101:4810:1000 2:N:0:1 CTTTTGGGTGTGGGAGATCTCTGCTTCTGATGGCTCAAACACAGCGACCTCGGGTGGGAACACGTTTTTCAGGTCCTCCAGCACGGTCAGCCTGCTGCCGGCCCCGAAAGTCAGGACGTTGGCCCCCCGCCCCTAATCCTAGCTGCTGGCT + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF:FF:FFFF,,,F:F,,:,:,,::F,, @FS10000339:45:BPG61614-1218:1:1101:4830:1000 2:N:0:1 CTTTTGGGTGTGGGAGATCTCTGCTTCTGATGGCTCAAACACAGCGACCTCGGGTGGGAACACCTTGTTCAGGTCCTCTACAACAGTGAGCCAACTTCCCTCTCCAAAATATATGGTCTCAATGCTCCCTCCATAGGTGGCACAAAAATAC + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFF, @FS10000339:45:BPG61614-1218:1:1101:4920:1000 2:N:0:1 CTTTTGGGTGTGGGAGATCTCTGCTTCTGATGGCTCAAACACAGCGACCTCGGGTGGGAACACCTTGTTCAGGTCCTTTACAACAGTGAGCCAACTTCCCTCTCCAAAATATATGGTCCCAATGCTCCCTCCATAGGTGGCACAGAAGTAC + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Thank you
The output on the screen does not use mate-pair information, so the 40000 number is about right. But the abundances for clonotypes should already take the mate-pair into account. Could you show me a few lines of "analyzed_assembled_reads.fa" and how did you compute "sum of #counts of clones in the report"? Thank you.
So the first few lines of analyzed_assembled_reads.da is :
FS10000339:45:BPG61614-1218:1:1105:4600:2060 -1 2345 22078 CTTTTGGGTGTGGGAGATCTCTGCTTCTGATGGCTCAAACACAGCGACCTCGGGTGGGAACACGTTTTTCAGGTCCTCCAGTACGGTCAGCCTAGAGCCTTCTCCAAAAAACAGCT FS10000339:45:BPG61614-1218:1:1101:5610:1640 1 2124 10966 CTCAGCTGTGTATCTCTGTGCCAGCAGCCGAACGATGTTAAACTCCTACAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTGCTAGAGGACCTGAAAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCACACCCAAAAG FS10000339:45:BPG61614-1218:1:1101:5610:1640.1 1 2124 10966 CTCAGCTGTGTATCTCTGTGCCAGCAGCCGAACGATGTTAAACTCCTACAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTGCTAGAGGACCTGAAAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCACACCCAAAAG FS10000339:45:BPG61614-1218:1:1111:7240:1900 1 2124 10966 CTCAGCTGTGTATCTCTGTGCCAGCAGCCGAACGATGTTAAACTCCTACAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTGCTAGAGGACCTGAAAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCACACCCAAAAG FS10000339:45:BPG61614-1218:1:1111:7240:1900.1 1 2124 10966 CTCAGCTGTGTATCTCTGTGCCAGCAGCCGAACGATGTTAAACTCCTACAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTGCTAGAGGACCTGAAAAACGTGTTCCCACCCGAGGTCGCTGTGTTTGAGCCATCAGAAGCAGAGATCTCCCACACCCAAAAG
And I just sum the #counts column in the report with the following code :
cat analyzed__report.tsv | awk '{ sum+=$1} END {print sum}'
Thank you very much
Thanks for noticing this issue! This is indeed a bug in TRUST4. I just pushed the update to the Github repo. Could you please give it a try? Thank you.
Thank you very much, works perfectly now!!
Hi, I am quite new in the BCR/TCRsequencing field and I wanted to try to compare TRUST4 and MIXCR on a simple dataset (amplicons of the CDR3 regions).
I have a paired end fastq with about 20'000 reads each. When I run MIXCR, the reports stated that I have about 20'000 total reads with 93.62% successfully aligned and 87.09% used for clonotypes. However when I run TRUST4, there are about 40'000 reads found and assembled, and the sum of #counts of clones in the report is also about 40'000.
This is the code I used :
./run-trust4 -f hg38_bcrtcr.fa --ref human_IMGT+C.fa -1 input/1_S1_L001_R1_001.fastq -2 input/1_S1_L001_R2_001.fastq -o analyzed_ --od output/ -t 4
I don't know exactly if I am doing something wrong here. It seems that it does not take into account the paired end file.
One thing that I noticed is that in the report, there are several "assembles" with the same CDR3.
Thank you for your time.