milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Imprecise identification of isotypes #645

Closed nebskinner closed 3 years ago

nebskinner commented 3 years ago

Hello:

I'm using MiXCR to analyze B cell receptor sequences from bulk BCR-seq, and I would like to know the isotype of the identified clonotypes. The output does list isotypes (in "allCHitsWithScore"), but for essentially all clonotypes multiple isotypes are listed (e.g. "IGHG1*00(122.4),IGHG2*00(122.4),IGHGP*00(122.1)"). I'm wondering whether this ambiguity arises because the exact isotype cannot be resolved from incomplete sequencing of the C region, or whether I've done something wrong.
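To make the ambiguity concrete, here is a minimal Python sketch that parses a value of the "allCHitsWithScore" field and reports which C genes share the best score. It assumes the field always has the comma-separated `GENE(score)` shape shown above. The helper names `parse_c_hits` and `top_hits` and the `tolerance` parameter are made up for illustration and are not part of MiXCR.

```python
import re

# Assumption: each hit looks like GENE*ALLELE(score), and hits are comma-separated.
HIT_RE = re.compile(r"([^,()]+)\(([-+]?\d+(?:\.\d+)?)\)")


def parse_c_hits(field):
    """Turn 'IGHG1*00(122.4),IGHG2*00(122.4)' into [(gene, score), ...]."""
    return [(name.strip(), float(score)) for name, score in HIT_RE.findall(field)]


def top_hits(field, tolerance=0.0):
    """Return the genes whose score is within `tolerance` of the best score."""
    hits = parse_c_hits(field)
    if not hits:
        return []
    best = max(score for _, score in hits)
    return [gene for gene, score in hits if best - score <= tolerance]


example = "IGHG1*00(122.4),IGHG2*00(122.4),IGHGP*00(122.1)"
print(top_hits(example))                 # genes tied for the best score
print(top_hits(example, tolerance=0.5))  # genes within 0.5 of the best score
```

For the example value, IGHG1 and IGHG2 are exactly tied and IGHGP trails by only 0.3, so the aligned part of the C region apparently does not distinguish these genes.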

Exact MiXCR commands

    mixcr analyze amplicon --species hsa \
        --starting-material rna \
        --5-end no-v-primers \
        --3-end c-primers \
        --adapters adapters-present \
        --contig-assembly \
        --only-productive \
        --assemble "-OseparateByC=true" \
        ${for_input[$COUNTER]} ${rev_input[$COUNTER]} ${output[$COUNTER]}

*This was run as a loop over 20 samples

MiXCR report files

Here is an example of one of the report files:

Analysis date: Tue Mar 23 20:14:05 EDT 2021
Input file(s): /home-3/nskinne3@jhu.edu/scratch/bulk_BCR_updated/10256p_R1.fastq.gz,/home-3/nskinne3@jhu.edu/scratch/bulk_BCR_updated/10256p_R2.fastq.gz
Output file(s): /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.vdjca
Version: 3.0.13; built=Wed Apr 15 11:28:14 EDT 2020; rev=f614226; lib=repseqio.v1.6
Command line arguments: --species hsa --library default --threads 24 --report /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.report -p rna-seq -OvParameters.geneFeatureToAlign=VTranscriptWithP -OvParameters.parameters.floatingLeftBound=true -OjParameters.parameters.floatingRightBound=false -OcParameters.parameters.floatingRightBound=true /home-3/nskinne3@jhu.edu/scratch/bulk_BCR_updated/10256p_R1.fastq.gz /home-3/nskinne3@jhu.edu/scratch/bulk_BCR_updated/10256p_R2.fastq.gz /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.vdjca
Analysis time: 7.37m
Total sequencing reads: 2958874
Successfully aligned reads: 2747175 (92.85%)
Paired-end alignment conflicts eliminated: 552590 (18.68%)
Alignment failed, no hits (not TCR/IG?): 172974 (5.85%)
Alignment failed because of absence of V hits: 17293 (0.58%)
Alignment failed because of absence of J hits: 16118 (0.54%)
No target with both V and J alignments: 5314 (0.18%)
Overlapped: 2114089 (71.45%)
Overlapped and aligned: 1980292 (66.93%)
Alignment-aided overlaps: 221770 (11.2%)
Overlapped and not aligned: 133797 (4.52%)
V gene chimeras: 33029 (1.12%)
J gene chimeras: 4 (0%)
IGH chains: 722313 (26.29%)
IGK chains: 992255 (36.12%)
IGL chains: 1032607 (37.59%)
Realigned with forced non-floating bound: 2133110 (72.09%)
Realigned with forced non-floating right bound in left read: 566758 (19.15%)
Realigned with forced non-floating left bound in right read: 566758 (19.15%)

Analysis date: Tue Mar 23 20:21:27 EDT 2021
Input file(s): /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.vdjca
Output file(s): /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.clna
Version: 3.0.13; built=Wed Apr 15 11:28:14 EDT 2020; rev=f614226; lib=repseqio.v1.6
Command line arguments: --report /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.report --threads 24 --write-alignments -OassemblingFeatures="[CDR3]" -OseparateByV=false -OseparateByJ=true -OseparateByC=false -OseparateByC=true /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.vdjca /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.clna
Analysis time: 6.42m
Final clonotype count: 2695
Average number of reads per clonotype: 832.94
Reads used in clonotypes, percent of total: 2244774 (75.87%)
Reads used in clonotypes before clustering, percent of total: 2370885 (80.13%)
Number of reads used as a core, percent of used: 2345110 (98.91%)
Mapped low quality reads, percent of used: 25775 (1.09%)
Reads clustered in PCR error correction, percent of used: 126111 (5.32%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 291702 (12.99%)
Reads dropped due to the lack of a clone sequence, percent of total: 355016 (12%)
Reads dropped due to low quality, percent of total: 116 (0%)
Reads dropped due to failed mapping, percent of total: 21138 (0.71%)
Reads dropped with low quality clones, percent of total: 20 (0%)
Clonotypes eliminated by PCR error correction: 21995
Clonotypes dropped as low quality: 4
Clonotypes pre-clustered due to the similar VJC-lists: 2276
IGH chains: 1060 (39.33%)
IGK chains: 791 (29.35%)
IGL chains: 844 (31.32%)

Analysis date: Tue Mar 23 20:33:33 EDT 2021
Input file(s): /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.clna
Output file(s): /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.contigs.clns
Version: 3.0.13; built=Wed Apr 15 11:28:14 EDT 2020; rev=f614226; lib=repseqio.v1.6
Command line arguments: --report /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.report --threads 24 /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.clna /home-3/nskinne3@jhu.edu/scratch/isotypes/10256p.contigs.clns
Analysis time: 5.68m
Initial clonotype count: 2695
Final clonotype count: 2695 (100%)
Canceled assemblies: 0 (0%)
Number of premature termination assembly events, percent of number of initial clonotypes: 3.0 (0.11%)
Longest contig length: 669
Clustered variants: 0 (0%)
Reads in clustered variants: 0.0 (0%)
Reads in divided (newly created) clones: 0.0 (0%)
======================================

Thank you!

PoslavskySV commented 3 years ago

Hi,

Your commands are correct, so I assume the C region is not covered well enough to reliably separate clonotypes by C gene. Still, I can't be 100% sure without the full details of the library preparation protocol. I suggest looking at the full alignments to investigate the situation manually.

Best, Stanislav
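Alongside manual inspection of the alignments, it may help to check how far the ambiguity goes across a whole exported clonotype table. The Python sketch below assumes a tab-separated export with an "allCHitsWithScore" column and assumes human heavy-chain C gene names start with IGHM, IGHD, IGHG, IGHA or IGHE. It tallies whether the top-scoring hits at least agree on the isotype class, even when the subclass is unresolved. The function names and the demo rows are hypothetical (only the first hit string comes from this issue).

```python
import csv
import io
import re
from collections import Counter

# Assumption: hits are formatted as GENE*ALLELE(score), separated by commas.
HIT_RE = re.compile(r"([^,()]+)\(([-+]?\d+(?:\.\d+)?)\)")
# Assumption: human heavy-chain C gene naming (IGHM, IGHD, IGHG1..4, IGHA1..2, IGHE).
CLASS_RE = re.compile(r"^IGH[MDGAE]")


def isotype_class(field):
    """Class shared by all top-scoring C hits, else 'ambiguous'; 'none' if no hits."""
    hits = [(gene, float(score)) for gene, score in HIT_RE.findall(field or "")]
    if not hits:
        return "none"
    best = max(score for _, score in hits)
    classes = set()
    for gene, score in hits:
        if score == best:
            match = CLASS_RE.match(gene)
            # Fall back to the full gene name if it does not fit the assumed pattern.
            classes.add(match.group(0) if match else gene)
    return classes.pop() if len(classes) == 1 else "ambiguous"


def summarize(tsv_text, column="allCHitsWithScore"):
    """Count class-level calls over every row of a tab-separated table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(isotype_class(row.get(column, "")) for row in reader)


# Illustrative rows only: row 0 reuses the hit string from this issue,
# rows 1 and 2 are invented to show the other outcomes.
demo = (
    "cloneId\tallCHitsWithScore\n"
    "0\tIGHG1*00(122.4),IGHG2*00(122.4),IGHGP*00(122.1)\n"
    "1\tIGHM*00(80.0)\n"
    "2\t\n"
)
print(summarize(demo))
```

If most clonotypes resolve to a single class this way, the data may still support class-level isotype calls (IgG vs. IgM, etc.), even though subclass calls such as IGHG1 vs. IGHG2 remain unreliable.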