milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com
Other
323 stars 78 forks source link

Missing pig T cell Receptor Beta receptors seq (TRBV, TRBD and TRBJ genes) CDR3 in RNAseq #545

Closed Dharmendra-G-1 closed 4 years ago

Dharmendra-G-1 commented 4 years ago

I am working on the non-human sample, i.e. pig RNASeq data to get its TCR repertoire characterized. For this, I am using a docker version of mixcr.

docker run -it --rm -v $PWD:/output -w /output mgibio/mixcr /bin/bash

mixcr/mixcr-3.0.3/mixcr
Usage: mixcr [-v] [COMMAND]
  -v, --version   print version information and exit
Commands:
  help                    Displays help information about the specified command
  analyze                 Run full MiXCR pipeline for specific input.
  align                   Builds alignments with V,D,J and C genes for input sequencing reads.
  assemble                Assemble clones.
  assembleContigs         Assemble full sequences.
  assemblePartial         Assembles partially aligned reads into longer sequences.
  extend                  Impute alignments or clones with germline sequences.
  exportAlignments        Export V/D/J/C alignments into tab delimited file.
  exportAlignmentsPretty  Export verbose information about alignments.
  exportClones            Export assembled clones into tab delimited file.
  exportClonesPretty      Export verbose information about clones.
  exportReadsForClones    Export reads for particular clones from "clones & alignments" (*.clna)
                            file. Output file name will be transformed into '_R1'/'_R2' pair in
                            case of paired end reads. Use cloneId = -1 to export alignments not
                            assigned to any clone (not assembled). If no clone ids are specified
                            (only input and output filenames are specified) all reads assigned to
                            clonotypes will be exported.
  exportReads             Export original reads from vdjca file.
  mergeAlignments         Merge several *.vdjca files with alignments into a single alignments file.
  filterAlignments        Filter alignments.
  sortAlignments          Sort alignments in vdjca file by read id.
  alignmentsDiff          Calculates the difference between two .vdjca files.
  clonesDiff              Calculates the difference between two .clns files.
  versionInfo             Output information about MiXCR version which generated the file.
  slice                   Slice ClnA file.
mixcr -v
MiXCR v3.0.3 (built Sun Nov 18 15:48:30 UTC 2018; rev=5281a0f; branch=release/v3.0.3; host=Dmitriys-MacBook-Pro-3341.local)
RepSeq.IO v1.3 (rev=3291c2b)
MiLib v1.9 (rev=6b321c4)
Built-in V/D/J/C library: repseqio.v1.5

I downloaded the imgt.201918-4.sv5.json.gz from https://github.com/repseqio/library-imgt/releases

unzip imgt.201918-4.sv5.json.gz

Once inside the mixcr docker image , I placed this imgt.201918-4.sv5.json in libraries folder of the mixcr docker image then run the following command to process the sample whose pair fastq files are stored in seqs/ directories.

mixcr align -r report_A10169_Kidney -p rna-seq -t 40 --species pig -b imgt.201918-4 seqs/A10169-Kidney-1_R1_001.fastq seqs/A10169-Kidney-1_R2_001.fastq alignments.vdjca

mixcr assemblePartial alignments.vdjca alignments_rescued_1.vdjca

mixcr assemblePartial alignments_rescued_1.vdjca alignments_rescued_2.vdjca

mixcr extend alignments_rescued_2.vdjca alignments_rescued_2_extended.vdjca

mixcr assemble alignments_rescued_2_extended.vdjca clones.clns

mixcr exportClones clones.clns clones.txt

Then inside

I was expecting to have a T cell Receptor Beta receptors seq (TRBV, TRBD and TRBJ genes) mapped but that result file missing all "allCHitsWithScore" column in the clones.txt

excerpt of clones.txt

head clones.txt

cloneId cloneCount  cloneFraction   targetSequences targetQualities allVHitsWithScore   allDHitsWithScore   allJHitsWithScore   allCHitsWithScore   allVAlignments  allDAlignments  allJAlignments  allCAlignments  nSeqFR1 minQualFR1  nSeqCDR1    minQualCDR1 nSeqFR2 minQualFR2  nSeqCDR2    minQualCDR2 nSeqFR3 minQualFR3  nSeqCDR3    minQualCDR3 nSeqFR4 minQualFR4  aaSeqFR1    aaSeqCDR1   aaSeqFR2    aaSeqCDR2   aaSeqFR3    aaSeqCDR3   aaSeqFR4    refPoints
0   2.0 0.2 TGTACTCTGTATAAAAGTGGTGTCAATGCAATTTTC    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    IGLV8-10*01(166),IGLV8-13*01(160),IGLV8-18*01(153)      IGLJ2*01(170),IGLJ3*01(155)     264|293|313|0|29|SA282GSC286TST287C|97.0;264|293|313|0|29|SG267ASA282GSC286TST287C|81.0;264|282|313|0|18|SG267A|74.0        24|30|58|30|36||30.0;27|30|58|33|36||15.0                           TGTACTCTGTATAAAAGTGGTGTCAATGCAATTTTC    37                              CTLYKSGVNAIF        :::::::::0:0:29:::::30:-4:36:::
1   2.0 0.2 TGTGTAAGAGGTCTGGAGTTGCTTTTGGTCTGG   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF   IGHV1-14*01(184),IGHV1S2*01(184),IGHV1S5*01(184),IGHV1S7*01(184),IGHV1S8*01(184)    IGHD1*01(35),IGHD1*02(35),IGHD2*01(30)  IGHJ5*01(180),IGHJ5*02(170)     285|296|316|0|11|SC289T|39.0;285|296|316|0|11|SC289T|39.0;291|302|322|0|11|SC295T|39.0;291|302|322|0|11|SC295T|39.0;291|302|322|0|11|SC295T|39.0    59|66|111|16|23||35.0;59|66|111|16|23||35.0;39|45|84|17|23||30.0    38|43|74|28|33||25.0;29|33|63|29|33||20.0                                               TGTGTAAGAGGTCTGGAGTTGCTTTTGGTCTGG   37                              CVRGLELLLVW     :::::::::0:0:11:16:-22:-8:23:28:-18:33:::
2   1.0 0.1 TGTGCAAGGGGATATATATATACCTATGGTGCTATTTACTGGGATGTGAATATGGATCTCTGG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF IGHV1-4*02(474),IGHV1S2*01(474),IGHV1S5*01(474),IGHV1S7*01(474),IGHV1S8*01(474) IGHD1*01(64),IGHD1*02(64)   IGHJ5*01(70)        285|293|316|0|8||40.0;285|293|316|0|8||40.0;291|299|322|0|8||40.0;291|299|322|0|8||40.0;291|299|322|0|8||40.0   44|60|111|18|34|SG48C|64.0;44|60|111|18|34|SG48C|64.0   30|43|74|50|63||65.0                                                TGTGCAAGGGGATATATATATACCTATGGTGCTATTTACTGGGATGTGAATATGGATCTCTGG 37                              CARGYIYTYGAIYWDVNMDLW       :::::::::0:-3:8:18:-7:-14:34:50:-10:63:::
3   1.0 0.1 TGTGCAAGAGGCCGTGGTTCAGTGATACTTATAGTGGTGACTGGGATTGGTATGGATCTCTGG FFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF IGHV1S2*01(328),IGHV1S5*01(328),IGHV1S6*01(328),IGHV1S7*01(328),IGHV1S8*01(328) IGHD1*01(35),IGHD2*01(33)   IGHJ5*01(205)       285|298|316|0|13||65.0;291|304|322|0|13||65.0;291|304|322|0|13||65.0;291|304|322|0|13||65.0;291|304|322|0|13||65.0  65|72|111|29|36||35.0;20|33|84|29|42|SC26GSA27G|33.0    30|43|74|50|63||65.0                                                TGTGCAAGAGGCCGTGGTTCAGTGATACTTATAGTGGTGACTGGGATTGGTATGGATCTCTGG 25                              CARGRGSVILIVVTGIGMDLW       :::::::::0:2:13:29:-28:-2:36:50:-10:63:::
4   1.0 0.1 TGTGCAAGAACTATACCTATAGCTATGATGCTAGTCGGCACTATGGATCTCTGG  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF  IGHV1S2*01(348),IGHV1S5*01(348),IGHV1S6*01(348),IGHV1S7*01(348),IGHV1S8*01(348) IGHD1*01(79),IGHD1*02(79)   IGHJ5*01(225)       285|294|316|0|9||45.0;291|300|322|0|9||45.0;291|300|322|0|9||45.0;291|300|322|0|9||45.0;291|300|322|0|9||45.0   43|62|111|16|35|SG54A|79.0;43|62|111|16|35|SG54A|79.0   29|43|74|40|54||70.0                                                TGTGCAAGAACTATACCTATAGCTATGATGCTAGTCGGCACTATGGATCTCTGG  25          CARTIPIAMMLVGTMDLW      :::::::::0:-2:9:16:-6:-12:35:40:-9:54:::
5   1.0 0.1 TGTCAGTCATACGATGATAGTTATACTCCTATTTTC    FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF    IGLV3-2*01(1008),IGLV3-2*02(1008)       IGLJ2*01(175),IGLJ3*01(155)     258|289|308|0|31||155.0;258|289|308|0|31||155.0     23|30|58|29|36||35.0;27|30|58|33|36||15.0                                               TGTCAGTCATACGATGATAGTTATACTCCTATTTTC37                              CQSYDDSYTPIF        :::::::::0:1:31:::::29:-3:36:::
6   1.0 0.1 TGTCAGCAGCATGCTAGTGCACCGTATGGTTTC   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF   IGKV1-11*01(357),IGKV1-11*02(357),IGKV1D-11*01(357)     IGKJ2*01(185),IGKJ2*02(180)     261|284|307|0|23|SC273GSA274C|83.0;261|284|307|0|23|SC273GSA274C|83.0;261|284|307|0|23|SC273GSA274C|83.0        22|31|59|24|33||45.0;23|31|59|25|33||40.0                               TGTCAGCAGCATGCTAGTGCACCGTATGGTTTC   37                              CQQHASAPYGF     :::::::::0:-3:23:::::24:-2:33:::
7   1.0 0.1 TGTGCTCTGTATAAAAGTAGTGGTCATGTTTTC   FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF   IGLV8-13*01(218),IGLV8-19*02(214),IGLV8-21*01(202),IGLV8-21*02(202),IGLV8-18*01(198)        IGLJ2*01(165),IGLJ3*01(155)     264|286|313|0|22||110.0;264|282|313|0|18||90.0;264|286|313|0|22|SG273TSG275T|78.0;264|286|313|0|22|SG273TSG275T|78.0;264|282|313|0|18||90.0     25|30|58|28|33||25.0;27|30|58|30|33||15.0                                               TGTGCTCTGTATAAAAGTAGTGGTCATGTTTTC   37                              CALYKSSGHVF :::::::::0:-7:22:::::28:-5:33:::

The Report file is as follow for reference:

cat report_M6618-Kidney-1
Analysis date: Thu Oct 03 22:07:37 UTC 2019
Input file(s): /output/M6618-Kidney-1_v1/M6618-Kidney-1_R1_001.fastq,/output/M6618-Kidney-1_v1/M6618-Kidney-1_R2_001.fastq
Output file(s): /output/M6618-Kidney-1_v1/alignments.vdjca
Version: 3.0.3; built=Sun Nov 18 15:48:30 UTC 2018; rev=5281a0f; lib=repseqio.v1.5
Command line arguments: align -r /output/M6618-Kidney-1_v1/report_M6618-Kidney-1 -p rna-seq -t 40 --species pig -OallowPartialAlignments=true -b imgt.201918-4 /output/M6618-Kidney-1_v1/M6618-Kidney-1_R1_001.fastq /output/M6618-Kidney-1_v1/M6618-Kidney-1_R2_001.fastq /output/M6618-Kidney-1_v1/alignments.vdjca
Analysis time: 2.71m
Total sequencing reads: 23424752
Successfully aligned reads: 29 (0%)
Alignment failed, no hits (not TCR/IG?): 23420221 (99.98%)
Alignment failed because of absence of CDR3 parts: 61 (0%)
Alignment failed because of low total score: 4441 (0.02%)
Overlapped: 18801852 (80.26%)
Overlapped and aligned: 23 (0%)
Alignment-aided overlaps: 1 (4.35%)
Overlapped and not aligned: 18801829 (80.26%)
IGH chains: 8 (27.59%)
IGK chains: 2 (6.9%)
IGL chains: 19 (65.52%)
======================================

Can you please help to find why I am missing alignments and column with the value for CDR3 (T cell Receptor Beta; TRBV, TRBD and TRBJ genes), is there a parameter I need to change?, any parameter in the command so that RNASeq reads aligned to pig T cell Receptor Beta and other missing TR:

More information: http://www.imgt.org/IMGTrepertoire/LocusGenes/repertoires/pig/TRB/Susscro_TRBrep.html

Gene table: Pig (Sus scrofa) TRBV Gene table: Pig (Sus scrofa) TRBD Gene table: Pig (Sus scrofa) TRBJ Gene table: Pig (Sus scrofa) TRBC Locus description: Pig (Sus scrofa) TRB

Thanks,

Dharm

dbolotin commented 4 years ago

Hi Dharm,

First of all, I recommend using official MiXCR docker image milaboratory/mixcr (link).

I also recommend using analyze command to process the data (link).

All the CDR3's are in the file, in nSeqCDR3 and aaSeqCDR3 columns.

Regarding C-genes: our IMGT converted library does not include their sequences, so they will not be mapped for pig data.