milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Un-identified isotypes #1054

Open ligalaizik opened 1 year ago

ligalaizik commented 1 year ago

Hi, I used this command to run MiXCR on my BCR-seq data:

mixcr align \
    -OjParameters.parameters.floatingRightBound=false \
    -OcParameters.parameters.floatingLeftBound=true \
    -s mmu \
    -r report_232.txt \
    M2-D1-14-232_S3_L001_R1_001.fastq.gz M2-D1-14-232_S3_L001_R2_001.fastq.gz \
    output_align_232.vdjca

I added these parameters after I realized that the isotype was not identified for many sequences in my data. Even with them, 26% of the sequences still have an unidentified isotype, and these sequences are IgGs: when I take the raw sequences from the FASTQ files and translate them with ExPASy Translate, the C region starts with the amino acids KTT, which is the start of the IgG isotypes.

Is there any way we can fix this? This high proportion of unidentified isotypes is strongly affecting our analysis.

Thanks, Ligal.

mizraelson commented 1 year ago

Hi, can you please share an example of such a raw read sequence?

ligalaizik commented 1 year ago

Yes. R1: CAAGCCCTCCTTTAATTCCCGAGGTGCAGCTTCAGGAGTCGGGACCTGGCCTGGTGAAACCTTCTCAGTCTCTGTCCCTCACCTGCACTGTCACTGGCTACTCAATCACCAGTGATTATGCCTGGAACTGGATCCGGCAGTTTCCAGGAAACAAAATGGAGTGGATGGGCTACATAAGCTACAGTGGTAGCACTAGCTACAACCCATCTCTCAAAAGTCGAATCTCTATCACTCGAGACACATCCAAGAACCAGTTCTTCCTGCAGTTGAATTCTGTGACTACTGAGGACACAGCCACAT

R2: GGGCGAGGAGAGAGAGAGAGACATCGCCAGGGGATAGACCGATGGGGGTGTCGTTTTGGCTGCAGAGACAGTGACCAGAGTCCCTTGGCCCCAGTAAGCAAACCTTAATCCCCCCGTTCTTGCACAGTAATATGTGGCGGTGTCCTCAGTAGTCACAGAATTCAACTGCAGGAAGAACTGGTTCTTGGATGTGTCTCGAGTGATAGAGATTCGACTTTTGAGAGATGGGTTGTAGCTAGTGCTACCAATGTAGCGTATGTAGCCCATCCACTCCAGGGTGGTTCCAGGAAAAGGCAGGATG

Thanks.
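The KTT check described in the original report (translate the raw read and look for the start of the IgG constant region) can be sketched in Python as below. This is only an illustration, not part of MiXCR: the helper names are hypothetical, the standard genetic code is assumed, and R2 is assumed to be sequenced from the constant-region side, so it is reverse-complemented before translation. The 15-nt test fragment is copied from the R2 read above.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate one reading frame; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def frames_with_motif(read, motif="KTT", reverse=False):
    """List the reading frames (0, 1, 2) whose translation contains the motif."""
    seq = reverse_complement(read) if reverse else read
    return [frame for frame in range(3) if motif in translate(seq, frame)]


if __name__ == "__main__":
    r2_fragment = "GGGTGTCGTTTTGGC"  # short stretch of the R2 read in this thread
    print(translate(reverse_complement(r2_fragment)))
    print(frames_with_motif(r2_fragment, reverse=True))
```

A three-residue motif can also occur by chance in other frames or regions, so a hit here is a hint that the read reaches the constant region, not proof of the isotype.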

mizraelson commented 1 year ago

Hi, I just tried it, and MiXCR did align the C gene for this pair of reads and assigned the IGHG1 gene. Can you please share the report, the exact commands you used (including the export), and the MiXCR version you use, so I can reproduce it on our end?

PoslavskySV commented 1 year ago

Just in case, you can check it with VDJ.online:

https://vdj.online/align-result/DTRKMUWTANFHRDNXNOHVGTFPDYHLABJFULWHKUZG

ligalaizik commented 1 year ago

MiXCR version: v4.0.0b
Report: attached
Commands:

mixcr align \
    -OjParameters.parameters.floatingRightBound=false \
    -OcParameters.parameters.floatingLeftBound=true \
    -s mmu \
    -r report_232.txt \
    M2-D1-14-232_S3_L001_R1_001.fastq.gz M2-D1-14-232_S3_L001_R2_001.fastq.gz \
    output_align_232.vdjca

mixcr exportAlignments -f \
    -nFeature {FR1Begin:FR4End} \
    -targets \
    -vHit -dHit -jHit -cHit \
    -vAlignment -dAlignment -jAlignment -cAlignments \
    -nFeature FR1 -nFeature CDR1 -nFeature FR2 -nFeature CDR2 \
    -nFeature FR3 -nFeature CDR3 -nFeature FR4 \
    -aaFeature FR1 -aaFeature CDR1 -aaFeature FR2 -aaFeature CDR2 \
    -aaFeature FR3 -aaFeature CDR3 -aaFeature FR4 \
    output_align_232.vdjca \
    output_align_232.tsv

Analysis date: Sun Mar 26 11:35:53 IDT 2023
Input file(s): M2-D1-14-232_S3_L001_R1_001.fastq.gz,M2-D1-14-232_S3_L001_R2_001.fastq.gz
Output file(s): output_align_232.vdjca
Version: 4.0.0b; built=Thu Jul 07 01:14:53 IDT 2022; rev=8050699050; lib=repseqio.v2.0
Command line arguments: align -OjParameters.parameters.floatingRightBound=false -OcParameters.parameters.floatingLeftBound=true -s mmu -r report_232.txt M2-D1-14-232_S3_L001_R1_001.fastq.gz M2-D1-14-232_S3_L001_R2_001.fastq.gz output_align_232.vdjca
Analysis time: 4.95m
Total sequencing reads: 803812
Successfully aligned reads: 629141 (78.27%)
Paired-end alignment conflicts eliminated: 120627 (15.01%)
Alignment failed, no hits (not TCR/IG?): 19262 (2.4%)
Alignment failed because of absence of V hits: 351 (0.04%)
Alignment failed because of absence of J hits: 148334 (18.45%)
No target with both V and J alignments: 4449 (0.55%)
Alignment failed because of low total score: 2275 (0.28%)
Overlapped: 571514 (71.1%)
Overlapped and aligned: 422826 (52.6%)
Alignment-aided overlaps: 54457 (12.88%)
Overlapped and not aligned: 148688 (18.5%)
No CDR3 parts alignments, percent of successfully aligned: 730 (0.12%)
Partial aligned reads, percent of successfully aligned: 3277 (0.52%)
V gene chimeras: 3696 (0.46%)
TRA chains: 42 (0.01%)
TRB chains: 1 (0%)
IGH chains: 629098 (99.99%)
Realigned with forced non-floating bound: 573510 (71.35%)
Realigned with forced non-floating right bound in left read: 13998 (1.74%)
Realigned with forced non-floating left bound in right read: 13998 (1.74%)

mizraelson commented 1 year ago

Hi, unfortunately I can't reproduce the issue. The new MiXCR seems to work fine with this pair of reads. I would recommend updating to the latest version. If you need any help with using the new version in your old pipelines, let me know; we can help with column names, etc.

ligalaizik commented 1 year ago

OK, I'm attaching to this email the output that I get with the commands I sent in my previous email (it is a partial file because of its size). Can you help me create the same output using the new version? Thanks! Ligal.



Analysis date: Thu Jan 12 11:28:08 IST 2023
Input file(s): M2-D2-14-233_S4_L001_R1_001.fastq.gz,M2-D2-14-233_S4_L001_R2_001.fastq.gz
Output file(s): output_align_233.vdjca
Version: 4.0.0b; built=Thu Jul 07 01:14:53 IDT 2022; rev=8050699050; lib=repseqio.v2.0
Command line arguments: align -s mmu -r report_233.txt M2-D2-14-233_S4_L001_R1_001.fastq.gz M2-D2-14-233_S4_L001_R2_001.fastq.gz output_align_233.vdjca --verbose -f
Analysis time: 5.23m
Total sequencing reads: 1123947
Successfully aligned reads: 567624 (50.5%)
Paired-end alignment conflicts eliminated: 82524 (7.34%)
Alignment failed, no hits (not TCR/IG?): 61293 (5.45%)
Alignment failed because of absence of V hits: 605 (0.05%)
Alignment failed because of absence of J hits: 483102 (42.98%)
No target with both V and J alignments: 3391 (0.3%)
Alignment failed because of low total score: 7932 (0.71%)
Overlapped: 944579 (84.04%)
Overlapped and aligned: 421828 (37.53%)
Alignment-aided overlaps: 45060 (10.68%)
Overlapped and not aligned: 522751 (46.51%)
No CDR3 parts alignments, percent of successfully aligned: 2917 (0.51%)
Partial aligned reads, percent of successfully aligned: 2956 (0.52%)
V gene chimeras: 2654 (0.24%)
TRA chains: 201 (0.04%)
TRB chains: 3 (0%)
IGH chains: 567420 (99.96%)
Realigned with forced non-floating bound: 448856 (39.94%)
Realigned with forced non-floating right bound in left read: 13998 (1.74%)
Realigned with forced non-floating left bound in right read: 10093 (0.9%)
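To compare align reports such as the two posted in this thread, the numeric entries can be pulled into a dictionary. The sketch below is a hypothetical helper, not a MiXCR feature; it assumes the report text has one "name: count (percent%)" entry per line, as in the report file on disk, and it skips lines that do not fit that shape (dates, versions, timings).

```python
import re

# "name: count" optionally followed by "(percent%)"; anything else is ignored.
_ENTRY = re.compile(r"^(.+?):\s+(\d+)(?:\s+\(([\d.]+)%\))?\s*$")


def parse_report(text):
    """Map each numeric report entry to a (count, percent or None) tuple."""
    entries = {}
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if match:
            name, count, percent = match.groups()
            entries[name] = (int(count), float(percent) if percent else None)
    return entries


if __name__ == "__main__":
    # Two lines copied from the reports in this thread (samples 232 and 233).
    report_232 = "Alignment failed because of absence of J hits: 148334 (18.45%)"
    report_233 = "Alignment failed because of absence of J hits: 483102 (42.98%)"
    key = "Alignment failed because of absence of J hits"
    for label, text in (("232", report_232), ("233", report_233)):
        count, percent = parse_report(text)[key]
        print(label, count, percent)
```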

mizraelson commented 1 year ago

Hi, with the latest version of MiXCR (v4.3.2) I use the following command:

mixcr align \
    --preset generic-bcr-amplicon \
    --species mmu \
    --floating-left-alignment-boundary \
    --rigid-right-alignment-boundary C \
    --rna \
    input_R1.fastq.gz input_R2.fastq.gz  \
    result.vdjca

Note that here I use the generic-bcr-amplicon preset, which uses the kAligner2 aligner dedicated to B cells. Your original command used kAligner1 by default, which was designed for T cells.

Then,

mixcr exportAlignments -f \
    --drop-default-fields \
    -nFeature {FR1Begin:FR4End} \
    -targets \
    -vHit -dHit -jHit -cHit \
    -vAlignment -dAlignment -jAlignment -cAlignments \
    -nFeature FR1 -nFeature CDR1 -nFeature FR2 \
    -nFeature CDR2 -nFeature FR3 -nFeature CDR3 \
    -nFeature FR4 -aaFeature FR1 -aaFeature CDR1 \
    -aaFeature FR2 -aaFeature CDR2 -aaFeature FR3 \
    -aaFeature CDR3 -aaFeature FR4 \
    result.vdjca \
    alignments.tsv

The command above returns the same columns as the one you used. Practically all parameters are the same, except for --drop-default-fields, which overrides the default set of columns so that you only get the ones specified.

ligalaizik commented 1 year ago

Thanks! I used these commands, but 27% of the reads still have unidentified isotypes, so switching to the new version of MiXCR did not change the percentage of unidentified isotypes.
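A percentage like this can be recomputed directly from the exported table. The Python sketch below counts the rows with an empty C gene cell. It rests on two assumptions that should be checked against the actual file: that the -cHit column is named bestCHit in the TSV header, and that an empty cell means no C gene was assigned. The function name and the demo rows are made up for illustration.

```python
import csv
import io


def unidentified_fraction(tsv_lines, column="bestCHit"):
    """Return the fraction of rows whose C gene hit column is empty.

    tsv_lines is any iterable of lines, for example an open file handle.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    total = missing = 0
    for row in reader:
        total += 1
        # A missing column or a blank cell both count as "no C hit".
        if not (row.get(column) or "").strip():
            missing += 1
    return missing / total if total else 0.0


if __name__ == "__main__":
    # Toy rows, not real MiXCR output.
    demo = io.StringIO(
        "bestVHit\tbestCHit\n"
        "IGHV1-1*00(100)\tIGHG1*00(50)\n"
        "IGHV1-2*00(100)\t\n"
    )
    print(unidentified_fraction(demo))
```

For a real export, pass an open handle, for example `open("alignments.tsv", newline="")`, in place of the demo buffer.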

mizraelson commented 1 year ago

Hi, for the pair of reads you have provided earlier, I have used the commands listed above and the isotype was identified correctly.

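[screenshot of the alignment result: https://user-images.githubusercontent.com/18702359/231985443-f11d49c8-7014-49dd-b394-b560faf14106.png]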

If you can share a sample file (or a part of it, if it's too big) where you see the issue, I can investigate further.

ligalaizik commented 1 year ago

Attached here is the sample file (partial). 

mizraelson commented 1 year ago

Hi, I don't see any attached files. Maybe you can send it by email: support@milaboratories.com

ligalaizik commented 1 year ago

Hello, I'm using version 4, and these are the commands:

mixcr align -s mmu -r report_237.txt M1-D14-T1-237_S1_L001_R1_001.fastq.gz M1-D14-T1-237_S1_L001_R2_001.fastq.gz output_align_237.vdjca --verbose -f

mixcr exportAlignments -f -nFeature {FR1Begin:FR4End} -targets -vHit -dHit -jHit -cHit -vAlignment -dAlignment -jAlignment -cAlignments -nFeature FR1 -nFeature CDR1 -nFeature FR2 -nFeature CDR2 -nFeature FR3 -nFeature CDR3 -nFeature FR4 -aaFeature FR1 -aaFeature CDR1 -aaFeature FR2 -aaFeature CDR2 -aaFeature FR3 -aaFeature CDR3 -aaFeature FR4 output_align_237.vdjca output_align_237.txt

I want to export the constant region of the sequences. How can I do that?

Thanks ! Ligal.

mizraelson commented 1 year ago

Hi, to export the C gene you can use the CRegion gene feature, e.g.: -aaFeature CRegion -nFeature CRegion