milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

mixcr kaligner #471

Closed ibseq closed 5 years ago

ibseq commented 5 years ago

Hi, I ran mixcr with and without -p kaligner2 and got:

WITH:

    Analysis Date: Mon Dec 17 10:38:55 GMT 2018
    Input file(s): IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,IB-031218-1_S1_L001_R2_001.fastq.gz
    Output file(s): alignments.vdjca
    Version: 2.1.10; built=Fri Mar 30 15:59:50 BST 2018; rev=87f92f5; lib=repseqio.v1.5
    Command line arguments: align --species hs -p kaligner2 -c IGH --report report.txt IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz IB-031218-1_S1_L001_R2_001.fastq.gz alignments.vdjca
    Analysis time: 8.45s
    Total sequencing reads: 83857
    Successfully aligned reads: 4 (0%)
    Alignment failed, no hits (not TCR/IG?): 60213 (71.8%)
    Alignment failed because of absence of V hits: 8 (0.01%)
    Alignment failed because of absence of J hits: 23632 (28.18%)
    Overlapped: 4258 (5.08%)
    Overlapped and aligned: 0 (0%)
    Alignment-aided overlaps: 0 (NaN%)
    Overlapped and not aligned: 4258 (5.08%)
    IGH chains: 4 (100%)

WITHOUT:

    Analysis Date: Mon Dec 17 10:21:33 GMT 2018
    Input file(s): IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,IB-031218-1_S1_L001_R2_001.fastq.gz
    Output file(s): alignmentsH.vdjca
    Version: 2.1.10; built=Fri Mar 30 15:59:50 BST 2018; rev=87f92f5; lib=repseqio.v1.5
    Command line arguments: align --species hs -c IGH --report reportH.txt IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz IB-031218-1_S1_L001_R2_001.fastq.gz alignmentsH.vdjca
    Analysis time: 8.49s
    Total sequencing reads: 83857
    Successfully aligned reads: 9 (0.01%)
    Paired-end alignment conflicts eliminated: 1 (0%)
    Alignment failed, no hits (not TCR/IG?): 11996 (14.31%)
    Alignment failed because of absence of V hits: 126 (0.15%)
    Alignment failed because of absence of J hits: 71556 (85.33%)
    No target with both V and J alignments: 53 (0.06%)
    Alignment failed because of low total score: 117 (0.14%)
    Overlapped: 4258 (5.08%)
    Overlapped and aligned: 0 (0%)
    Alignment-aided overlaps: 0 (NaN%)
    Overlapped and not aligned: 4258 (5.08%)
    IGH chains: 9 (100%)

What is the difference, and which one is true?

Also, the instructions in https://mixcr.readthedocs.io/en/master/assembleContigs.html state:

    # align raw sequences
    mixcr align --species mmu -p kAligner2 --report report.txt input_R1.fq input_R2.fq alignments.vdjca

It should be kaligner2, not kAligner2.

Finally:

    # assemble default CDR3 clonotypes (note: --write-alignments is required for further contig assembly)
    mixcr assemble --write-alignments --report report.txt alignments.vdjca clones.clna

    # assemble full BCR receptors
    mixcr assembleContigs --report report.txt clones.clna full_clones.clns

None of these is working. Are there any typos? Thanks a lot, ibseq

PoslavskySV commented 5 years ago

Hi!

-p kAligner2 selects a different aligner, designed for the analysis of B-cell data; in contrast to the default kAligner, it handles big gaps better. So, to answer your question (what is the difference and which one is true): the difference is in the alignment algorithms, and both are true.
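To see where two runs differ, it can help to pull the same line out of both report files. The following is a minimal sketch, not part of MiXCR: `report_metric` and `compare_metric` are hypothetical helper names, and the only assumption about the report format is the `label: value` layout visible in the reports pasted above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MiXCR) for reading report files that
# use the "label: value" layout shown in the reports above.

# Print the value of one metric from a report file.
# $1 = report file, $2 = metric label (the text before the colon).
report_metric() {
    awk -v label="$2: " '
        { pos = index($0, label) }
        pos > 0 { print substr($0, pos + length(label)); exit }
    ' "$1"
}

# Print one metric from two reports as a tab-separated line.
# $1 = metric label, $2 = first report, $3 = second report.
compare_metric() {
    printf '%s\t%s\t%s\n' "$1" "$(report_metric "$2" "$1")" "$(report_metric "$3" "$1")"
}

# Example (file names taken from the --report arguments above):
#   compare_metric "Successfully aligned reads" report.txt reportH.txt
#   compare_metric "Alignment failed because of absence of J hits" report.txt reportH.txt
```

The label is matched as a fixed string rather than a regular expression, so labels containing characters such as `(`, `?` or `/` need no escaping.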

Regarding contig assembly: you say none of these is working --- could you please paste error messages that arise when you execute commands?

ibseq commented 5 years ago

can i reply to this email?

PoslavskySV commented 5 years ago

yes, you can

ibseq commented 5 years ago
If I try to run what is in https://mixcr.readthedocs.io/en/latest/quickstart.html it says “expecting command”. Does mixcr analyze need any additional installation?

    mixcr analyze amplicon --species hs \
        --starting-material dna \
        --5-end v-primers \
        --3-end j-primers \
        --adapters adapters-present \
        --receptor-type IGH \
        input_R1.fastq input_R2.fastq analysis

I also tried to run:

    # align raw sequences
    mixcr align --species mmu -p kAligner2 --report report.txt input_R1.fq input_R2.fq alignments.vdjca

    # assemble default CDR3 clonotypes (note: --write-alignments is required for further contig assembly)
    mixcr assemble --write-alignments --report report.txt alignments.vdjca clones.clna

    # assemble full BCR receptors
    mixcr assembleContigs --report report.txt clones.clna full_clones.clns

    # export full BCR receptors
    mixcr exportClones -c IG -p fullImputed full_clones.clns full_clones.txt

but it doesn't recognize “--write-alignment”, and consequently assembleContigs doesn't work either.

I have a very small data set that I am testing (one cell).

Thanks, Irene

PoslavskySV commented 5 years ago

Which version of MiXCR do you use (you can check it with mixcr -v)? Both analyze and assembleContigs are available starting from MiXCR 3.0. You can download the latest version from https://github.com/milaboratory/mixcr/releases
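The version check can be scripted. The sketch below uses hypothetical helper names (`mixcr_version_from`, `version_at_least`) and assumes only that the first line of the `mixcr -v` output begins with `MiXCR v<version>`, as in the output pasted later in this thread.

```shell
#!/bin/sh
# Hypothetical helpers; the only assumption about MiXCR is that the first
# line printed by "mixcr -v" starts with "MiXCR v<version>".

# Extract "2.1.10" from text such as "MiXCR v2.1.10 (built ...)".
mixcr_version_from() {
    printf '%s\n' "$1" | sed -n 's/^MiXCR v\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed when dotted version $1 is at least dotted version $2.
# Components are compared numerically, so 2.1.10 is newer than 2.1.9.
version_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        n = split(have, h, "."); m = split(need, r, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = (i <= n) ? h[i] + 0 : 0
            b = (i <= m) ? r[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# Example:
#   v=$(mixcr_version_from "$(mixcr -v)")
#   version_at_least "$v" 3.0 || echo "MiXCR $v is older than 3.0" >&2
```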

ibseq commented 5 years ago

    MiXCR v2.1.10 (built Fri Mar 30 15:59:50 BST 2018; rev=87f92f5; branch=hotfix/v2.1.10)
    RepSeq.IO v1.2.11 (rev=de00211)
    MiLib v1.8.3 (rev=1a225d5)
    Built-in V/D/J/C library: repseqio.v1.5

    Library search path:

PoslavskySV commented 5 years ago

So I see: you have an outdated version of MiXCR, which supports neither analyze nor contig assembly. You need to download the latest version from https://github.com/milaboratory/mixcr/releases

ibseq commented 5 years ago

I'm a bit confused about the different types of commands listed on the website, standard vs “analyze shotgun”.

I need to see what kind of Ig I have in my cell (I need to know what the sequences are). Do you have a pipeline I can follow?

I am running some other commands I found on the website; as soon as they finish I will send them to you, just to make sure it's correct.

Thanks, Irene

PoslavskySV commented 5 years ago

Do you have targeted TCR/IG data, or just non-enriched or random fragments?

ibseq commented 5 years ago

it’s target enrichment. it’s only for Ig (paired heavy and light chains).

PoslavskySV commented 5 years ago

In the latter case you can simply run:

mixcr analyze shotgun \
    -s <species> \
    --starting-material <startingMaterial> \
    --receptor-type bcr \
    --contig-assembly \
    input_file_R1.fastq.gz input_file_R2.fastq.gz analysis_name

where you need to replace <species> with your species, and <startingMaterial> with the starting material of your data (rna or dna). Then you will find the complete info about assembled clonotypes in analysis_name.clonotypes.txt

ibseq commented 5 years ago

mixcr analyze shotgun will work only with the new version right?

ibseq commented 5 years ago

mixcr analyze shotgun gives me: Error: Expected a command, got analyze

PoslavskySV commented 5 years ago

So, in this case you need to

1. install the latest (3.0.3) version of MiXCR
2. run

    mixcr analyze amplicon \
        -s <species> \
        --starting-material <startingMaterial> \
        --5-end <5End> \
        --3-end <3End> \
        --adapters <adapters> \
        --receptor-type bcr \
        --contig-assembly \
        input_file1 [input_file2] my_analysis

Then you will find the complete info about assembled clonotypes in my_analysis.clonotypes.txt

Please, let me know if it helps.

ibseq commented 5 years ago

Hi, this one has worked and I have a directory called my_analysis that contains: my_analysis.report my_analysis.vdjca

I'm a bit confused about the results though: I am not expecting TCR, I only want the results relative to Ig, since the primers were Ig-specific. And how come there are no J hits?

    Analysis date: Wed Dec 19 09:30:21 GMT 2018
    Input file(s): /home/ibassano/WORK/raw_data/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,/home/ibassano/WORK/raw_data/IB-031218-1_S1_L001_R2_001.fastq.gz
    Output file(s): my_analysis.vdjca
    Version: 3.0.3; built=Sun Nov 18 15:48:30 GMT 2018; rev=5281a0f; lib=repseqio.v1.5
    Command line arguments: --species hsa --report my_analysis.report -p kAligner2 -OvParameters.geneFeatureToAlign=VGeneWithP -OvParameters.parameters.floatingLeftBound=true -OjParameters.parameters.floatingRightBound=false -OcParameters.parameters.floatingRightBound=false /home/ibassano/WORK/raw_data/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz /home/ibassano/WORK/raw_data/IB-031218-1_S1_L001_R2_001.fastq.gz my_analysis.vdjca
    Analysis time: 20.23s
    Total sequencing reads: 83857
    Successfully aligned reads: 4 (0%)
    Alignment failed, no hits (not TCR/IG?): 50224 (59.89%)
    Alignment failed because of absence of V hits: 6 (0.01%)
    Alignment failed because of absence of J hits: 33622 (40.09%)
    No target with both V and J alignments: 1 (0%)
    Overlapped: 4258 (5.08%) ... what does it mean?
    Overlapped and aligned: 0 (0%)
    Alignment-aided overlaps: 0 (NaN%) ... what does it mean?
    Overlapped and not aligned: 4258 (5.08%) ... what does it mean?
    V gene chimeras: 3 (0%) ... what does it mean?
    IGH chains: 4 (100%) ... do I need to do this separately for light chains?

PoslavskySV commented 5 years ago

Hi,

you can find the results in my_analysis.clonotypes.IGH.txt

ibseq commented 5 years ago

I don't have this output. I only have: my_analysis.report my_analysis.vdjca

PoslavskySV commented 5 years ago

So which command did you use? The correct command is

mixcr analyze amplicon \
    -s <species> \
    --starting-material <startingMaterial> \
    --5-end <5End> \
    --3-end <3End> \
    --adapters <adapters> \
    --receptor-type bcr \
    --contig-assembly \
    --impute-germline-on-export \
    input_file1 [input_file2] my_analysis

If there were errors (in stderr), please paste them here.

ibseq commented 5 years ago

Command used:

    mixcr analyze amplicon -s hsa --starting-material dna --5-end v-primers --3-end c-primers --adapters no-adapters --receptor-type bcr --contig-assembly /home/ibassano/WORK/raw_data/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz /home/ibassano/WORK/raw_data/IB-031218-1_S1_L001_R2_001.fastq.gz my_analysis

stderr:

    NOTE: report file is not specified, using my_analysis.report to write report.
    Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.milaboratory.mixcr.cli.CommandAnalyze$CommandAmplicon@636e8cc): java.lang.RuntimeException: java.nio.file.AccessDeniedException: /tmp/mixcr_0d3499d68149326b4f95a55e67e2c4e30abcfcac4526563150686445182.tmp
        at picocli.CommandLine.execute(CommandLine.java:1004)
        at picocli.CommandLine.access$900(CommandLine.java:142)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1199)
        at com.milaboratory.mixcr.cli.Main$1.handle(Main.java:48)
        at com.milaboratory.mixcr.cli.Main$1.handle(Main.java:31)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1075)
        at com.milaboratory.mixcr.cli.Main.handleParseResult(Main.java:53)
        at com.milaboratory.mixcr.cli.Main.main(Main.java:25)
    Caused by: java.lang.RuntimeException: java.nio.file.AccessDeniedException: /tmp/mixcr_0d3499d68149326b4f95a55e67e2c4e30abcfcac4526563150686445182.tmp
        at com.milaboratory.util.TempFileManager.getTempFile(TempFileManager.java:62)
        at com.milaboratory.util.TempFileManager.getTempFile(TempFileManager.java:40)
        at com.milaboratory.mixcr.assembler.AssemblerEventLogger.<init>(AssemblerEventLogger.java:56)
        at com.milaboratory.mixcr.assembler.CloneAssembler.<init>(CloneAssembler.java:104)
        at com.milaboratory.mixcr.cli.CommandAssemble.run1(CommandAssemble.java:147)
        at com.milaboratory.mixcr.cli.ACommandWithSmartOverwrite.run0(ACommandWithSmartOverwrite.java:99)
        at com.milaboratory.mixcr.cli.ACommand.run(ACommand.java:73)
        at com.milaboratory.mixcr.cli.CommandAnalyze.run0(CommandAnalyze.java:608)
        at com.milaboratory.mixcr.cli.ACommand.run(ACommand.java:73)
        at picocli.CommandLine.execute(CommandLine.java:996)
        ... 7 more
    Caused by: java.nio.file.AccessDeniedException: /tmp/mixcr_0d3499d68149326b4f95a55e67e2c4e30abcfcac4526563150686445182.tmp
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
        at java.nio.file.Files.newByteChannel(Files.java:361)
        at java.nio.file.Files.createFile(Files.java:632)
        at java.nio.file.TempFileHelper.create(TempFileHelper.java:138)
        at java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:161)
        at java.nio.file.Files.createTempFile(Files.java:897)
        at com.milaboratory.util.TempFileManager.getTempFile(TempFileManager.java:54)
        ... 16 more

PoslavskySV commented 5 years ago

I see. This is an issue with permissions on the server where you run MiXCR: it seems that you don't have permission to use the /tmp folder. It is better to consult your system administrator to fix this. As a workaround you can try the following command:

mixcr -Djava.io.tmpdir=/home/ibassano/some_folder/ analyze amplicon -s hsa --starting-material dna --5-end v-primers --3-end c-primers --adapters no-adapters --receptor-type bcr --contig-assembly /home/ibassano/WORK/raw_data/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz /home/ibassano/WORK/raw_data/IB-031218-1_S1_L001_R2_001.fastq.gz my_analysis

Let me know if it helps.

ibseq commented 5 years ago

Got the same results:

    NOTE: report file is not specified, using my_analysis.report to write report.
    Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (com.milaboratory.mixcr.cli.CommandAnalyze$CommandAmplicon@636e8cc): java.lang.RuntimeException: java.nio.file.NoSuchFileException: /home/ibassano/some_folder/mixcr_7c47903d37101c728a4d494d02c08760e953f6e73634070732660442173.tmp
        at picocli.CommandLine.execute(CommandLine.java:1004)
        at picocli.CommandLine.access$900(CommandLine.java:142)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1199)
        at com.milaboratory.mixcr.cli.Main$1.handle(Main.java:48)
        at com.milaboratory.mixcr.cli.Main$1.handle(Main.java:31)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1075)
        at com.milaboratory.mixcr.cli.Main.handleParseResult(Main.java:53)
        at com.milaboratory.mixcr.cli.Main.main(Main.java:25)
    Caused by: java.lang.RuntimeException: java.nio.file.NoSuchFileException: /home/ibassano/some_folder/mixcr_7c47903d37101c728a4d494d02c08760e953f6e73634070732660442173.tmp
        at com.milaboratory.util.TempFileManager.getTempFile(TempFileManager.java:62)
        at com.milaboratory.util.TempFileManager.getTempFile(TempFileManager.java:40)
        at com.milaboratory.mixcr.assembler.AssemblerEventLogger.<init>(AssemblerEventLogger.java:56)
        at com.milaboratory.mixcr.assembler.CloneAssembler.<init>(CloneAssembler.java:104)
        at com.milaboratory.mixcr.cli.CommandAssemble.run1(CommandAssemble.java:147)
        at com.milaboratory.mixcr.cli.ACommandWithSmartOverwrite.run0(ACommandWithSmartOverwrite.java:99)
        at com.milaboratory.mixcr.cli.ACommand.run(ACommand.java:73)
        at com.milaboratory.mixcr.cli.CommandAnalyze.run0(CommandAnalyze.java:608)
        at com.milaboratory.mixcr.cli.ACommand.run(ACommand.java:73)
        at picocli.CommandLine.execute(CommandLine.java:996)
        ... 7 more
    Caused by: java.nio.file.NoSuchFileException: /home/ibassano/some_folder/mixcr_7c47903d37101c728a4d494d02c08760e953f6e73634070732660442173.tmp
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
        at java.nio.file.Files.newByteChannel(Files.java:361)
        at java.nio.file.Files.createFile(Files.java:632)
        at java.nio.file.TempFileHelper.create(TempFileHelper.java:138)
        at java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:161)
        at java.nio.file.Files.createTempFile(Files.java:897)
        at com.milaboratory.util.TempFileManager.getTempFile(TempFileManager.java:54)
        ... 16 more

dbolotin commented 5 years ago

Hi, please create the folder before using it as a temporary storage. E.g.:

mkdir -p /home/ibassano/some_folder/

ibseq commented 5 years ago

sorry silly me.

does it matter where “some_folder” is created? all results will be in /work/ibassano in case it makes a difference

thanks - running now irene

dbolotin commented 5 years ago

No, there is no difference; just place the directory somewhere you have access to. It seems that you have a non-standard configuration on the server and no access to the conventional /tmp directory. MiXCR will use this folder to store intermediate information during the analysis; after execution all temp files will be deleted.
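Both errors seen in this thread (AccessDeniedException on /tmp, then NoSuchFileException on a folder that had not been created) can be ruled out before starting a run. The sketch below is only an illustration: `prepare_tmpdir` is a hypothetical helper, and `$HOME/mixcr_tmp` is an arbitrary example location.

```shell
#!/bin/sh
# Hypothetical helper: make sure a directory exists and is writable before
# pointing a tool's temporary files at it.
# $1 = directory to use for temporary files; prints the directory on success.
prepare_tmpdir() {
    dir=$1
    # Create the directory (and parents) if it does not exist yet.
    mkdir -p "$dir" || return 1
    # Create and remove a probe file to confirm that we can write there.
    probe="$dir/.write_probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        printf '%s\n' "$dir"
    else
        echo "cannot write to $dir" >&2
        return 1
    fi
}

# Example (the directory name is an arbitrary choice):
#   tmp=$(prepare_tmpdir "$HOME/mixcr_tmp") || exit 1
#   mixcr -Djava.io.tmpdir="$tmp" analyze amplicon <options> input_R1 input_R2 my_analysis
```

The `-Djava.io.tmpdir=` option is the one given in the workaround earlier in this thread; everything else in the sketch is plain shell.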

Can you please also describe in more detail the data you are processing? Is it targeted by means of DNA baits or something similar, or is it enriched with targeted PCR? What do you mean by paired light/heavy data? Does it have both IGH and IGL on the same sequencing read pair (e.g. originating from the same cell), or is it a multiplex-PCR protocol, so that the IGH and IGL reads are distributed across different reads? Have you done any pre-processing of the data?

ibseq commented 5 years ago

It seems it has run properly, right? In my directory my_analysis I have:

    my_analysis.clna
    my_analysis.clonotypes.IGH.txt
    my_analysis.clonotypes.IGK.txt
    my_analysis.clonotypes.IGL.txt
    my_analysis.contigs.clns
    my_analysis.report
    my_analysis.vdjca

stderr:

    NOTE: report file is not specified, using my_analysis.report to write report.
    Exporting clones: 0%
    Exporting clones: 0%
    Exporting clones: 0%

stdout:

    Alignment: 4%
    Alignment: 14.7% ETA: 00:00:39
    Alignment: 30.8% ETA: 00:00:12
    Alignment: 41.5% ETA: 00:00:10
    Alignment: 57.6% ETA: 00:00:05
    Alignment: 69.6% ETA: 00:00:05
    Alignment: 84.4% ETA: 00:00:02
    Alignment: 96.4% ETA: 00:00:00
    ============= Report ==============
    Analysis time: 20.43s
    Total sequencing reads: 83857
    Successfully aligned reads: 4 (0%)
    Alignment failed, no hits (not TCR/IG?): 50224 (59.89%)
    Alignment failed because of absence of V hits: 6 (0.01%)
    Alignment failed because of absence of J hits: 33622 (40.09%)
    No target with both V and J alignments: 1 (0%)
    Overlapped: 4258 (5.08%)
    Overlapped and aligned: 0 (0%)
    Alignment-aided overlaps: 0 (NaN%)
    Overlapped and not aligned: 4258 (5.08%)
    V gene chimeras: 3 (0%)
    IGH chains: 4 (100%)
    Initialization: progress unknown
    Preparing for sorting: progress unknown
    ============= Report ==============
    Analysis time: 154.00ms
    Final clonotype count: 1
    Average number of reads per clonotype: 1
    Reads used in clonotypes, percent of total: 1 (0%)
    Reads used in clonotypes before clustering, percent of total: 1 (0%)
    Number of reads used as a core, percent of used: 1 (100%)
    Mapped low quality reads, percent of used: 0 (0%)
    Reads clustered in PCR error correction, percent of used: 0 (0%)
    Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
    Reads dropped due to the lack of a clone sequence: 0 (0%)
    Reads dropped due to low quality: 0 (0%)
    Reads dropped due to failed mapping: 3 (0%)
    Reads dropped with low quality clones: 0 (0%)
    Clonotypes eliminated by PCR error correction: 0
    Clonotypes dropped as low quality: 0
    Clonotypes pre-clustered due to the similar VJC-lists: 0
    Assembling: 0%
    Writing clones: 0%
    ============= Report ==============
    Initial clonotype count: 1
    Final clonotype count: 1 (100%)
    Longest contig length: 57
    Clustered variants: 0 (0%)
    Reads in clustered variants: 0.0 (0%)
    Reads in divided (newly created) clones: 0.0 (0%)

    ============================================
        Job resource usage summary

                     Memory (GB)    NCPUs
        Requested :  5              3
        Used      :  3 (peak)       4.67 (ave)
    ============================================

PoslavskySV commented 5 years ago

BTW, I saw that in some commands above you used mmu and in some hsa species: was it on purpose? (just to make sure that you specified the correct species :) )

ibseq commented 5 years ago

Hi Dmitry, this one in particular is just the results from one cell: it is a test for a protocol I am improving.

I have used the Smart-seq2 protocol, but instead of the whole transcriptome I have used gene-specific primers to enrich for the Ig (both heavy and light chain). The idea is to get paired heavy and light chains from each cell, like 10x Genomics do in their protocols.

I need to check that i am able to retrieve the sequences before proceeding with all my samples.

let me know if you need more info, I thought mixcr was the perfect tool to look into my data

ibseq commented 5 years ago

My final command uses only hsa. Maybe it was because I was copying the general example from the web, where mmu is used.

dbolotin commented 5 years ago

If possible, can you please share the data? It would be the fastest way to understand what is going on. Let me know if you have restrictions on this; then we can guide you through the debugging process to understand the root of the problem and whether the data contain target sequences.

ibseq commented 5 years ago

It's too large. How can I send it to you? If you have an account on https://usegalaxy.org/ I can upload the data there.

dbolotin commented 5 years ago

Just registered with bolotin.dmitriy (at) gmail.com (nickname dbolotin).

ibseq commented 5 years ago

You should have access now.

This is confidential; I trust you will not pass it on to anyone.

At the end of the day, I want to know whether my PCRs have worked and whether I can get a final sequence of the immunoglobulins.

From my last command run: how do I open files with extensions ending in .clns and .vdjca? Am I supposed to do something else?

irene

dbolotin commented 5 years ago

Of course, I will delete the datasets after debugging. Unfortunately, I still can't find the data on the Galaxy site. Can you send me a link or some instructions on how to find it?

The clna / clns / vdjca files are all in MiXCR's internal binary format; you can export the data using exportAlignments, exportClones, exportAlignmentsPretty, etc.
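As a rough illustration of the export step described above, the following Python sketch assembles MiXCR export commands and only runs them if a `mixcr` executable is found on the PATH. The action names are the ones listed in the comment above; the helper names (`build_export_command`, `run_export`) and the file names `clones.clns` and `clones.txt` are hypothetical, and the `input [output]` argument order is an assumption based on the commands shown elsewhere in this thread.

```python
import shutil
import subprocess
from typing import List, Optional

# Export actions named in the thread; other actions exist but are not covered here.
EXPORT_ACTIONS = ("exportAlignments", "exportClones", "exportAlignmentsPretty")


def build_export_command(action: str, input_file: str,
                         output_file: Optional[str] = None) -> List[str]:
    """Return the argument list for a MiXCR export call (assumed order: input, then output)."""
    if action not in EXPORT_ACTIONS:
        raise ValueError(f"unsupported export action: {action}")
    command = ["mixcr", action, input_file]
    if output_file is not None:
        command.append(output_file)
    return command


def run_export(action: str, input_file: str,
               output_file: Optional[str] = None) -> List[str]:
    """Run the export if mixcr is installed; otherwise just report the command."""
    command = build_export_command(action, input_file, output_file)
    if shutil.which("mixcr") is None:
        print("mixcr not found on PATH; would run:", " ".join(command))
        return command
    subprocess.run(command, check=True)
    return command
```

For example, `run_export("exportClones", "clones.clns", "clones.txt")` would write a tab-separated clone table that can be opened in a spreadsheet, assuming the binary `clones.clns` file exists.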

ibseq commented 5 years ago

Try this: https://usegalaxy.org/u/ibz/h/irene

PoslavskySV commented 5 years ago

BTW, I just looked at your command again and noticed that the R1 file has the TRIM suffix while R2 does not: is that OK?

IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz
IB-031218-1_S1_L001_R2_001.fastq.gz

ibseq commented 5 years ago

Yes, it's because only R1 needed trimming; R2 was OK. The suffix was added to differentiate it from the original R1.

ibseq commented 5 years ago

I'm not really sure how to read these data: am I getting the right results? See attached.

irene

targetSequences targetQualities allVHitsWithScore allDHitsWithScore allJHitsWithScore allCHitsWithScore allVAlignments allDAlignments allJAlignments allCAlignments nSeqFR1 minQualFR1 nSeqCDR1 minQualCDR1 nSeqFR2 minQualFR2 nSeqCDR2 minQualCDR2 nSeqFR3 minQualFR3 nSeqCDR3 minQualCDR3 nSeqFR4 minQualFR4 aaSeqFR1 aaSeqCDR1 aaSeqFR2 aaSeqCDR2 aaSeqFR3 aaSeqCDR3 aaSeqFR4 refPoints

AGTCTCAGAGAGGGGCCTTAAACATGGACTCCAAGGCATTTTCAATTGGTGCCAAGCACTCTCTACTGCTCACTCACCATGCTCATTGGGCTGAGCTGGGTTCTCATTGTTTCTATTTTAG,TTACTGTGCGAGAGGCCGCCTCCTAAAGCCGTCCCCTTTAGACGGGTGGGGCGAGGGAGCCCTGGTCACCGTCTCGTCAGGGAGTGCAGGCGCCACAGCCGTTTTCCCCAGCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTA 2FAFFFHH3322B2A20133553533B333ABG3F1B13B55D55B533@?3B53112B333334B43343BF4F3BG333333444B??G2??G3332?3/02B3BD3B32?F33?3B32,;0;9---A./;..----../:000/----CA>.>1<1<0/?</C/<<//>/0B//?0//EFFFB//?/?///?>0010/0C1F0>/?//>/000/>///0>///ACE?EEFA/A0/111A2HGFB//00000A131111A@11111>>1 IGHV1-8*00(170),IGHV4-34*00(170) IGHD6-6*00(28),IGHD3-3*00(26),IGHD2-15*00(25) IGHJ4*00(253),IGHJ5*00(253) IGHM*00(243) ,524|541|559|0|17||170.0;,518|535|553|0|17||170.0 ,27|36|54|26|34|DT30|28.0;,13|21|93|19|27|SA17T|26.0;,27|32|93|20|25||25.0 ,34|68|68|46|80|SC40GSA46GSC63G|253.0;,37|71|71|46|80|SC43GSA49GSC66G|253.0 ,0|71|312|80|151|ST8GSC9GSC14ASA17GSC20GSC29AST30GSG70A|243.0 TGTGCGAGAGGCCGCCTCCTAAAGCCGTCCCCTTTAGACGGGTGG 12 GGCGAGGGAGCCCTGGTCACCGTCTCGTCAG 14 CARGRLLKPSPLDGW GEGALVTVSS_ :::::::::::::::::::::,:::::::::4:2:17:26:-9:0:34:46:-14:49:80:80:

AGTCTCAGATGTGGGCCGTGAAGCTGGACTCCAGGACATTTTCCCGTAGGGCCAGCCACTCTGTCCACCTCACTCACCTTGGTCATTGTGCAGGGCTGTGGTTTCCTTGTTTCTATTTTAG,TTACTGTGCGAGAGGCCGCCTCCTAATGCAGTCACCTCTTGACGGTTGGCGCCAGGGAACCCTGGACACCGTATACTCAGGGAGTGCATCCGCCACAACCCTCTTCCCCAGCGTCTACTGTGAGAATTCCCAGTCGGATACGAGCAGCGTG 2F2BFGHH355AD3E22200115333B1133BF3F1B23D55D3311011B2B11111B333444B43111BG4F3FH333333444B4BG4F0E/0/0B22/0B3BG3F222@2222@22,00=/---C.C:..---..0?111<<1<11F?/1F0@1?//?/F//>//B/1FB1B</GHGFB//B/B221B10B10/0CGGF0/>///B////AB//00///EA/EE/E1GDAA11122HGFBA00AA00A131B1>1@111111>1 IGHV1-8*00(170),IGHV4-34*00(170) IGHD2-21*00(48) IGHJ4*00(224),IGHJ5*00(224) IGHM*00(271) ,524|541|559|0|17||170.0;,518|535|553|0|17||170.0 ,2|15|84|24|36|DA5|48.0 ,34|68|68|46|80|SG37CST53ASC60ASC62A|224.0;,37|71|71|46|80|SG40CST56ASC63ASC65A|224.0 ,0|71|312|80|151|SC14AST22CSC29AST30GSC36ASC51A|271.0 TGTGCGAGAGGCCGCCTCCTAATGCAGTCACCTCTTGACGGTTGG 12 CGCCAGGGAACCCTGGACACCGTATACTCAG 14 CARGRLLMQSPLDGW RQGTLDTVYS_ :::::::::::::::::::::,:::::::::4:2:17:24:26:-41:36:46:-14:49:80:80:

GGTCTCAGACTGGGCCCTTATCCCTTGACTCCAACGCCTTTCCACTTGGTTACCATCACTGAGCACAGAGTACTCACCATGGAATTGGGGCTGAGCTGGGTTTTCCTTGTTGCTCTTTTAG,TTACTGTGCGAGAGGCCTCCTCCTAATGCCGCCGCCCTTTGACTGATGGGGACAGGGAACCCTGGTCACCGTCTCATCAGGGAGTGCACCCGCCGCAACCCATTTCCCCCTGGAGTGATATGAGAAATCCCGGGCGCATACGAACAGCGTG EG22B555333B211101A3555335BF355B33310>11B5@4BE3BG314B344@333BBFE31BFF344?BFE3?B333?333BBG///?002?2?AGA/22@2F2FC222@21?<C,//;--99--/B;/-----;/00909;--9-:--.01111G=>1F0HFG?11F1HGC0F//0111FCCC/FB1>B1010/BGDF1/></>/>/>0/FHF1B1/A///?F110B1222B1222AGB00000A00013111>1DB1>111111 IGHV1-8*00(190),IGHV4-34*00(190),IGHV3-52*00(161) IGHD2-2*00(30),IGHD2-8*00(30),IGHD6-25*00(26) IGHJ4*00(324),IGHJ5*00(282) IGHM*00(159) ,524|543|559|0|19||190.0;,518|537|553|0|19||190.0;,536|555|571|0|19|ST544C|161.0 ,57|63|93|25|31||30.0;,42|48|93|22|28||30.0;,3|11|54|27|35|ST8C|26.0 ,24|68|68|36|80|SA32GSC33ASC39ASC63A|324.0;,37|71|71|46|80|SC42ASC66A|282.0 ,0|71|312|80|151|ST8CSC14GST21ASC31GST33ASC34GSC36GSC37ASG39AST46ASC51GST53GSG56CSG63A|159.0 TGTGCGAGAGGCCTCCTCCTAATGCCGCCGCCCTTTGACTGATGG 12 GGACAGGGAACCCTGGTCACCGTCTCATCAG 14 CARGLLLMPPPFD*W GQGTLVTVSS_ :::::::::::::::::::::,:::::::::4:4:19:25:-26:1:31:36:-4:49:80:80:

GGGTCTGTGCCGAAGTGCAGCTGGCGCAGTCCGGAGCAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCGGCTTCTGGATACAAGTTTACCACGCAGCGCATCAGCTGGGT,CCTCGGACACCGCCATCTACTTTTGTGCGAGACATAGACTTGGACGAATGTTTGACCTCGATGGACACTTCGATCTCTGGGGCCCTGGCACCCTGGTCATTGTCTCCCCAGCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTA FGGGGHHHHHHGGGGGHHHHHGHHGGGGGGHHGGGGGHHHHGHHHFGHHDHHHHHGGGGGHHHHHGHGGHHHHHHHHHHGGGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFF,GGGGHHGGFGGHGHHHGHHHHHGGGCGHHGHHGHHHHHHHGGGHHHHHHFFFGHGGGEGHHGHHHHGGHGGHHGGFEGEFEEGGBHHGGHGFGGFHFFHGHFGFEGGECGE2GEBBFHHHHHHHHHHGHGGGGGGGFCE4A4CFFFBBBAB IGHV5-10-1*00(1047) IGHD3-9*00(36) IGHJ2*00(363) 318|437|643|2|121|ST340CSC368TSA393GSA394CSG397CSG409ASC410GSG418CSC419GST420CSC422GST423CSG425C|813.0,589|624|643|0|35|SG605CST608CSA610TSC611T|234.0 ,36|49|93|43|56|DT40I43G|36.0 ,28|73|73|66|111|SG46CSC61TST69C|363.0 GAAGTGCAGCTGGCGCAGTCCGGAGCAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCGGCTTCT 35 GGATACAAGTTTACCACGCAGCGC 37 TGTGCGAGACATAGACTTGGACGAATGTTTGACCTCGATGGACACTTCGATCTCTGG 34 GGCCCTGGCACCCTGGTCATTGTCTCCCCAG 33 EVQLAQSGAEVKKPGESLRISCAAS GYKFTTQR CARHRLGRMFDLDGHFDLW GPGTLVIVSP_ ::::11:86:110:::::::::::::::,:::::::::23:1:35:43:-5:-13:56:66:-8:80:111::

cloneId cloneCount cloneFraction targetSequences targetQualities allVHitsWithScore allDHitsWithScore allJHitsWithScore allCHitsWithScore allVAlignments allDAlignments allJAlignments allCAlignments nSeqFR1 minQualFR1 nSeqCDR1 minQualCDR1 nSeqFR2 minQualFR2 nSeqCDR2 minQualCDR2 nSeqFR3 minQualFR3 nSeqCDR3 minQualCDR3 nSeqFR4 minQualFR4 aaSeqFR1 aaSeqCDR1 aaSeqFR2 aaSeqCDR2 aaSeqFR3 aaSeqCDR3 aaSeqFR4 refPoints

0 1.0 1.0 TGTGCGAGACATAGACTTGGACGAATGTTTGACCTCGATGGACACTTCGATCTCTGG CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC IGHV5-10-1*00(1047) IGHD3-9*00(36) IGHJ2*00(363) 612|624|643|0|12||120.0 36|49|93|20|33|DT40I43G|36.0 28|42|73|43|57||140.0 TGTGCGAGACATAGACTTGGACGAATGTTTGACCTCGATGGACACTTCGATCTCTGG 34 CARHRLGRMFDLDGHFDLW :::::::::0:1:12:20:-5:-13:33:43:-8:57:::
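To see how the nSeqCDR3 and aaSeqCDR3 columns of this clone relate to each other, here is a minimal Python sketch that translates a nucleotide sequence with the standard genetic code. It is a plain in-frame translation written for illustration, not MiXCR's own routine; the names `CODON_TABLE` and `translate` are hypothetical, and dropping a trailing incomplete codon is a simplifying assumption.

```python
from itertools import product
from typing import Dict

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate in frame from the first base; a trailing partial codon is dropped."""
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3
    # Codons with unexpected characters (e.g. N) are reported as 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))
```

Applied to the nSeqCDR3 value of the clone above, `translate` gives the 19 amino acids shown in its aaSeqCDR3 column; a `*` in the result marks a stop codon.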

targetSequences targetQualities allVHitsWithScore allDHitsWithScore allJHitsWithScore allCHitsWithScore allVAlignments allDAlignments allJAlignments allCAlignments nSeqFR1 minQualFR1 nSeqCDR1 minQualCDR1 nSeqFR2 minQualFR2 nSeqCDR2 minQualCDR2 nSeqFR3 minQualFR3 nSeqCDR3 minQualCDR3 nSeqFR4 minQualFR4 aaSeqFR1 aaSeqCDR1 aaSeqFR2 aaSeqCDR2 aaSeqFR3 aaSeqCDR3 aaSeqFR4 refPoints

GGGTCTGTGCCGAAGTGCAGCTGGCGCAGTCCGGAGCAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCGGCTTCTGGATACAAGTTTACCACGCAGCGCATCAGCTGGGT,CCTCGGACACCGCCATCTACTTTTGTGCGAGACATAGACTTGGACGAATGTTTGACCTCGATGGACACTTCGATCTCTGGGGCCCTGGCACCCTGGTCATTGTCTCCCCAGCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTA FGGGGHHHHHHGGGGGHHHHHGHHGGGGGGHHGGGGGHHHHGHHHFGHHDHHHHHGGGGGHHHHHGHGGHHHHHHHHHHGGGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFF,GGGGHHGGFGGHGHHHGHHHHHGGGCGHHGHHGHHHHHHHGGGHHHHHHFFFGHGGGEGHHGHHHHGGHGGHHGGFEGEFEEGGBHHGGHGFGGFHFFHGHFGFEGGECGE2GEBBFHHHHHHHHHHGHGGGGGGGFCE4A4CFFFBBBAB IGHV5-10-1*00(1047) IGHD3-9*00(36) IGHJ2*00(363) 318|437|643|2|121|ST340CSC368TSA393GSA394CSG397CSG409ASC410GSG418CSC419GST420CSC422GST423CSG425C|813.0,589|624|643|0|35|SG605CST608CSA610TSC611T|234.0 ,36|49|93|43|56|DT40I43G|36.0 ,28|73|73|66|111|SG46CSC61TST69C|363.0 GAAGTGCAGCTGGCGCAGTCCGGAGCAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCGGCTTCT 35 GGATACAAGTTTACCACGCAGCGC 37 TGTGCGAGACATAGACTTGGACGAATGTTTGACCTCGATGGACACTTCGATCTCTGG 34 GGCCCTGGCACCCTGGTCATTGTCTCCCCAG 33 EVQLAQSGAEVKKPGESLRISCAAS GYKFTTQR CARHRLGRMFDLDGHFDLW GPGTLVIVSP_ ::::11:86:110:::::::::::::::,:::::::::23:1:35:43:-5:-13:56:66:-8:80:111::

AGTCTCAGAGAGGGGCCTTAAACATGGACTCCAAGGCATTTTCAATTGGTGCCAAGCACTCTCTACTGCTCACTCACCATGCTCATTGGGCTGAGCTGGGTTCTCATTGTTTCTATTTTAG,TTACTGTGCGAGAGGCCGCCTCCTAAAGCCGTCCCCTTTAGACGGGTGGGGCGAGGGAGCCCTGGTCACCGTCTCGTCAGGGAGTGCAGGCGCCACAGCCGTTTTCCCCAGCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTA 2FAFFFHH3322B2A20133553533B333ABG3F1B13B55D55B533@?3B53112B333334B43343BF4F3BG333333444B??G2??G3332?3/02B3BD3B32?F33?3B32,;0;9---A./;..----../:000/----CA>.>1<1<0/?</C/<<//>/0B//?0//EFFFB//?/?///?>0010/0C1F0>/?//>/000/>///0>///ACE?EEFA/A0/111A2HGFB//00000A131111A@11111>>1 IGHV1-8*00(170),IGHV4-34*00(170) IGHD6-6*00(28),IGHD3-3*00(26),IGHD2-15*00(25) IGHJ4*00(253),IGHJ5*00(253) IGHM*00(243) ,524|541|559|0|17||170.0;,518|535|553|0|17||170.0 ,27|36|54|26|34|DT30|28.0;,13|21|93|19|27|SA17T|26.0;,27|32|93|20|25||25.0 ,34|68|68|46|80|SC40GSA46GSC63G|253.0;,37|71|71|46|80|SC43GSA49GSC66G|253.0 ,0|71|312|80|151|ST8GSC9GSC14ASA17GSC20GSC29AST30GSG70A|243.0 TGTGCGAGAGGCCGCCTCCTAAAGCCGTCCCCTTTAGACGGGTGG 12 GGCGAGGGAGCCCTGGTCACCGTCTCGTCAG 14 CARGRLLKPSPLDGW GEGALVTVSS_ :::::::::::::::::::::,:::::::::4:2:17:26:-9:0:34:46:-14:49:80:80:

AGTCTCAGATGTGGGCCGTGAAGCTGGACTCCAGGACATTTTCCCGTAGGGCCAGCCACTCTGTCCACCTCACTCACCTTGGTCATTGTGCAGGGCTGTGGTTTCCTTGTTTCTATTTTAG,TTACTGTGCGAGAGGCCGCCTCCTAATGCAGTCACCTCTTGACGGTTGGCGCCAGGGAACCCTGGACACCGTATACTCAGGGAGTGCATCCGCCACAACCCTCTTCCCCAGCGTCTACTGTGAGAATTCCCAGTCGGATACGAGCAGCGTG 2F2BFGHH355AD3E22200115333B1133BF3F1B23D55D3311011B2B11111B333444B43111BG4F3FH333333444B4BG4F0E/0/0B22/0B3BG3F222@2222@22,00=/---C.C:..---..0?111<<1<11F?/1F0@1?//?/F//>//B/1FB1B</GHGFB//B/B221B10B10/0CGGF0/>///B////AB//00///EA/EE/E1GDAA11122HGFBA00AA00A131B1>1@111111>1 IGHV1-8*00(170),IGHV4-34*00(170) IGHD2-21*00(48) IGHJ4*00(224),IGHJ5*00(224) IGHM*00(271) ,524|541|559|0|17||170.0;,518|535|553|0|17||170.0 ,2|15|84|24|36|DA5|48.0 ,34|68|68|46|80|SG37CST53ASC60ASC62A|224.0;,37|71|71|46|80|SG40CST56ASC63ASC65A|224.0 ,0|71|312|80|151|SC14AST22CSC29AST30GSC36ASC51A|271.0 TGTGCGAGAGGCCGCCTCCTAATGCAGTCACCTCTTGACGGTTGG 12 CGCCAGGGAACCCTGGACACCGTATACTCAG 14 CARGRLLMQSPLDGW RQGTLDTVYS_ :::::::::::::::::::::,:::::::::4:2:17:24:26:-41:36:46:-14:49:80:80:

GGTCTCAGACTGGGCCCTTATCCCTTGACTCCAACGCCTTTCCACTTGGTTACCATCACTGAGCACAGAGTACTCACCATGGAATTGGGGCTGAGCTGGGTTTTCCTTGTTGCTCTTTTAG,TTACTGTGCGAGAGGCCTCCTCCTAATGCCGCCGCCCTTTGACTGATGGGGACAGGGAACCCTGGTCACCGTCTCATCAGGGAGTGCACCCGCCGCAACCCATTTCCCCCTGGAGTGATATGAGAAATCCCGGGCGCATACGAACAGCGTG EG22B555333B211101A3555335BF355B33310>11B5@4BE3BG314B344@333BBFE31BFF344?BFE3?B333?333BBG///?002?2?AGA/22@2F2FC222@21?<C,//;--99--/B;/-----;/00909;--9-:--.01111G=>1F0HFG?11F1HGC0F//0111FCCC/FB1>B1010/BGDF1/></>/>/>0/FHF1B1/A///?F110B1222B1222AGB00000A00013111>1DB1>111111 IGHV1-8*00(190),IGHV4-34*00(190),IGHV3-52*00(161) IGHD2-2*00(30),IGHD2-8*00(30),IGHD6-25*00(26) IGHJ4*00(324),IGHJ5*00(282) IGHM*00(159) ,524|543|559|0|19||190.0;,518|537|553|0|19||190.0;,536|555|571|0|19|ST544C|161.0 ,57|63|93|25|31||30.0;,42|48|93|22|28||30.0;,3|11|54|27|35|ST8C|26.0 ,24|68|68|36|80|SA32GSC33ASC39ASC63A|324.0;,37|71|71|46|80|SC42ASC66A|282.0 ,0|71|312|80|151|ST8CSC14GST21ASC31GST33ASC34GSC36GSC37ASG39AST46ASC51GST53GSG56CSG63A|159.0 TGTGCGAGAGGCCTCCTCCTAATGCCGCCGCCCTTTGACTGATGG 12 GGACAGGGAACCCTGGTCACCGTCTCATCAG 14 CARGLLLMPPPFD*W GQGTLVTVSS_ :::::::::::::::::::::,:::::::::4:4:19:25:-26:1:31:36:-4:49:80:80:
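For reading the hit and alignment columns of an export like the one above, a small Python parser can help. This is only a sketch under stated assumptions: hits look like `GENE*00(score)` separated by commas; within an alignment column, hits are separated by `;` and the two reads (R1, R2) by `,`, with an empty entry meaning "no alignment in that read"; and the seven `|`-separated fields are read as gene start, gene end, gene length, read start, read end, mutations and score. That field order is inferred from the pretty alignments later in this thread (for example the gene position 318 for IGHV5-10-1) and should be checked against the MiXCR documentation. All class and function names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    gene: str
    score: float


class Alignment(NamedTuple):
    gene_from: int    # assumed: start of the aligned range in the reference gene
    gene_to: int      # assumed: end of the aligned range in the reference gene
    gene_length: int  # assumed: length of the reference gene sequence
    read_from: int    # assumed: start of the aligned range in the read
    read_to: int      # assumed: end of the aligned range in the read
    mutations: str    # encoded differences, kept as raw text
    score: float


_HIT_PATTERN = re.compile(r"^(?P<gene>.+)\((?P<score>[0-9.]+)\)$")


def parse_hits(field: str) -> List[Hit]:
    """Parse a column such as 'IGHJ4*00(324),IGHJ5*00(282)'."""
    hits = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        match = _HIT_PATTERN.match(item)
        if match is None:
            raise ValueError(f"unrecognised hit: {item!r}")
        hits.append(Hit(match.group("gene"), float(match.group("score"))))
    return hits


def parse_alignment(text: str) -> Optional[Alignment]:
    """Parse one 'a|b|c|d|e|mutations|score' entry; empty text means no alignment."""
    if not text:
        return None
    parts = text.split("|")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {text!r}")
    return Alignment(int(parts[0]), int(parts[1]), int(parts[2]),
                     int(parts[3]), int(parts[4]), parts[5], float(parts[6]))


def parse_alignments(field: str) -> List[List[Optional[Alignment]]]:
    """Return one list per hit, each holding one entry per read (None if absent)."""
    return [[parse_alignment(part) for part in hit.split(",")]
            for hit in field.split(";")]
```

With these assumptions, a J alignment column beginning with a comma, as in the rows above, parses to `None` for R1 and an `Alignment` for R2, which would mean the J gene was found only in the second read.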

dbolotin commented 5 years ago

After some investigation, we found that nearly all reads in your dataset lack the CDR3 region.

See typical alignments:

>>> Read ids: 2

                                                  <5'UTR5'UTR><L1
                                                               I  A  R  F  P  L  L  L  T  L  L
    Quality    33337333373733262233333333333373333333373736333373333322562367677773563353577277
    Target0  0 TTCCTCAGCTGTTGGTTGATATTTCTTGTCTCTGTACATTCTCCATCATTGCCCGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score
IGLV1-44*00 88                                    acaAtctccaGcatGgccAgcttccctctcctcctcaccctcctc 132  231
IGLV1-47*00 88                                    acaAtctccaGcatGgccGgcttccctctcctcctcaccctcctc 132  231

                    L1>
                 T  H _
    Quality     36773673633333266673326652775636367637773
    Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGACTCA 120  Score
IGLV1-44*00 133 actcactgtgcagg                            146  231
IGLV1-47*00 133 actcactgtgcagg                            146  231

 Quality    22553573727666333333337533777633563333662526335333337677756636333763337673766633
 Target1  0 AAAAAAAGGCCACAAAGGAGAGTCTCATAAGAGAAAAATACCCGGGAGAAGAAACAGTGGCCAGGAAGGAAGATAGCAGA 79   Score
IGLC2*00 60 CCaaCaaggccacaCTggTgTgtctcataagTgaCTTCtacccgggagCCgTGacagtggccTggaaggCagatagcagC 139  198
IGLC3*00 60 CCaaCaaggccacaCTggTgTgtctcataagTgaCTTCtacccgggagCCgTGacagtggccTggaaggCagatagcagC 139  198

 Quality     33333377226666633336367677773736222366777776677676376377633663363353335
 Target1  80 AAAGAAAAGGCGGGAGTGGAGACCACCACAGAAAAAAAACAAAGCAAAAAAAAGTACGCGGCAAGAAGAGA 150  Score
IGLC2*00 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggc          201  198
IGLC3*00 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggc          201  198

>>> Read ids: 3

                   <5'UTR                               5'UTR><L1
                                                               M  A  S  F  P  L  L  L  T  L  L
    Quality    33367633373733263233333363323667333363366777333673336322777777777776675366777777
    Target0  0 ATCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score
IGLV1-44*00 57     tcagctgtgggtagagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132  450
IGLV1-47*00 58     tcagctgtgg-tagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132  392

                    L1>
                 T  H _
    Quality     67773673737333267573327762777736367667777
    Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score
IGLV1-44*00 133 actcactgtgcagg                            146  450
IGLV1-47*00 133 actcactgtgcagg                            146  392

 Quality    22553777666766333333335665777633363333662277766533533677775636335763337776776633
 Target1  0 AAAAAAAGGCCACACTGGAGAGTCTCATAAGAGAAAAATACCCGGGAGAAGAGACAGTGGCCAGGAAGGAAGATAGCAGA 79   Score
IGLC2*00 60 CCaaCaaggccacactggTgTgtctcataagTgaCTTCtacccgggagCCgTgacagtggccTggaaggCagatagcagC 139  246
IGLC3*00 60 CCaaCaaggccacactggTgTgtctcataagTgaCTTCtacccgggagCCgTgacagtggccTggaaggCagatagcagC 139  246

 Quality     33333377777766633337662773673732222366767777666376376377767663376353335
 Target1  80 AAAGAAAAGGCGGGAGTGGAGACCACCACAGAAAAAAAACAAAGCAAAAAAAAGTACGCGGCAAGAAGATA 150  Score
IGLC2*00 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggc          201  246
IGLC3*00 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggc          201  246

>>> Read ids: 5

                   <5'UTR                               5'UTR><L1
                                                               M  A  S  F  P  L  L  L  T  L  L
    Quality    33337633363733363233233333323263333323363766336373336322677777777777576363577777
    Target0  0 ATCCTCAGCTGTGTGTTGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score
IGLV1-44*00 57     tcagctgtgGgtAgagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132  418
IGLV1-47*00 58     tcagctgtg-gtAgagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132  376

                    L1>
                 T  H _
    Quality     36776676755333277273356676777736577777777
    Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score
IGLV1-44*00 133 actcactgtgcagg                            146  418
IGLV1-47*00 133 actcactgtgcagg                            146  376

 Quality    33553577763366332235236376777633363333676653566523333677776656333763367777776633
 Target1  0 AAAACAAGGCCACACTGGAGGGTCTCATAAGAGAAAAATACCCGGGAGACGAGACAGTGGCCAGGAAGGAAGATAGCAGA 79   Score
IGLC2*00 60 CCaacaaggccacactggTgTgtctcataagTgaCTTCtacccgggagCcgTgacagtggccTggaaggCagatagcagC 139  309
IGLC3*00 60 CCaacaaggccacactggTgTgtctcataagTgaCTTCtacccgggagCcgTgacagtggccTggaaggCagatagcagC 139  309

 Quality     33333377655666733337362676373733326366777776677677376377776663363353335
 Target1  80 CAAGAAAAGGCGGGAGTGGAGACCACCACAGAGAAAAAACAAAGCAAAAAAAAGTACGCGGCCAGAAGATA 150  Score
IGLC2*00 140 cCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggccag       204  309
IGLC3*00 140 cCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggccag       204  309

>>> Read ids: 6

                   <5'UTR                               5'UTR><L1
                                                               M  A  S  F  P  L  L  L  T  L  L
    Quality    33637633673763273633333363323676333363366777336377637322677777777773776365677777
    Target0  0 ATCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score
IGLV1-44*00 57     tcagctgtgggtagagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132  450
IGLV1-47*00 58     tcagctgt-ggtagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132  392

                    L1>
                 T  H _
    Quality     36773673767333277277677776777737377777776
    Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGACTCA 120  Score
IGLV1-44*00 133 actcactgtgcagg                            146  450
IGLV1-47*00 133 actcactgtgcagg                            146  392

 Quality    33663777227636332236237767777633563333777777666532337777777736337766337776773622
 Target1  0 AAAACAAGGCCACACAGGAGGGTCTCATAAGAGAAGAATACCCGGGAGACGTAACAGTGGCCAGGAAGGAAGATAGCAGC 79   Score
IGLC2*00 60 CCaacaaggccacacTggTgTgtctcataagTgaCTTCtacccgggagCcgtGacagtggccTggaaggCagatagcagc 139  341
IGLC3*00 60 CCaacaaggccacacTggTgTgtctcataagTgaCTTCtacccgggagCcgtGacagtggccTggaaggCagatagcagc 139  341

 Quality     33233377776767763637366777776733222366777777666376377377633663363353335
 Target1  80 AACGTAAAGGCGGGAGTGGAGACCACCACAGACAAAAAACAAAGCAACAAAAAGTACGCGGACAGAAGATA 150  Score
IGLC2*00 140 CCcgtCaaggcgggagtggagaccaccacaCCcTCCaaacaaagcaacaaCaagtacgcggCcag       204  341
IGLC3*00 140 CCcgtCaaggcgggagtggagaccaccacaCCcTCCaaacaaagcaacaaCaagtacgcggCcag       204  341

>>> Read ids: 7

                          L2><FR1
                  _  V  C  A  E  V  Q  L  A  Q  S  G  V  E  V  K  K  P  G  E  S  L  R  I  S  C  A
      Quality     33366736367222263733673362622766222263333377736377667622222777776372366777777677
      Target0   0 TTGTCTGTGCCGAAGTGCAGCTGGCGCAGTCCGGAGTAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCG 79   Score
IGHV5-10-1*00 318   gtctgtgccgaagtgcagctggTgcagtccggagCagaggtgaaaaagccCggggagtctctgaggatctcctgtAAg 395  372

                    FR1><CDR1
                   A  S  G  Y  K  F  T  T  Q  R  I  S  W _
      Quality     27677733363733777737777666662556553777777
      Target0  80 GCTTCTGGATACAAGTTTACCACGCAGCGCATCAGCTGGGT 120  Score
IGHV5-10-1*00 396 gGttctggatacaGCtttacca                    417  372

 Quality    33636662676677763252277777775362522227777776636333333777777777333333377777763363
 Target1  0 GCACACTCCTCCAAGAGCACCTCTGGGGGCACAGCGGCCCTGGGCTGACTGGTCAAGGACTACTAACAAGAACCGGTGAA 79   Score
IGHG1*00 32 gcacCctcctccaagagcacctctgggggcacagcAgccctgggctgCctggtcaaggactactTCcCCgaaccggtgaC 111  430
IGHG3*00 32 gcGcCctGctccaGgagcacctctgggggcacagcggccctgggctgCctggtcaaggactactTCcCagaaccggtgaC 111  414

 Quality     3222277777776336663377777636633333333677776763333333376777676333335355
 Target1  80 GGTGACGTGGAACACAGGAGACCTGACCAGAGGCGTGCACACCTTAAAGGAAGTCCTACAGTACTAAGGA 149  Score
IGHG1*00 112 ggtgTcgtggaacTcaggCgCcctgaccagCggcgtgcacaccttCCCggCTgtcctacagtCctCagga 181  430
IGHG3*00 112 ggtgTcgtggaacTcaggCgCcctgaccagCggcgtgcacaccttCCCggCTgtcctacagtCctCagga 181  414

>>> Read ids: 8

                         L2><FR1
                  _ V  C  A  E  V  H  L  A  Q  S  G  A  E  V  K  K  P  G  E  S  L  R  I  S  C  A
      Quality     33667373662222633636733636337762222523352623336673773222226777773523657777373766
      Target0   0 GGTCTGTGCCGAAGTGCATCTGGCTCAGTCCGGAGCAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCGG 79   Score
IGHV5-10-1*00 318  gtctgtgccgaagtgcaGctggTGcagtccggagcagaggtgaaaaagccCggggagtctctgaggatctcctgtAAgg 396  361

                   FR1><CDR1               CDR1><FR2
                  A  S  G  Y  K  F  T  T   Q  R  I  S  W  V
      Quality     777777337673337773776766 66766266335777776
      Target0  80 CTTCTGGATACAAGTTTACCACGC-AGCGCATCAGCTGGGTG 120  Score
IGHV5-10-1*00 397 GttctggatacaGCtttacca-gcTaCTgGatcagctgggtg 437  361

 Quality    33533767533677663333377777765363622526677776522333333777777777323333366762766363
 Target1  0 GAAAACTCCTCCAAGAGAAAATCTGGGGGAAAAGGGGCCCTGGGCTGCCTGGACAAGGACTAATAAAAAGAACCGGTGAA 79   Score
IGHG1*00 32 gCaCCctcctccaagagCaCCtctgggggCaCagCAgccctgggctgcctggTcaaggactaCtTCCCCgaaccggtgaC 111  209

 Quality     3332267777676336663667377633736333333776767633336333376777776333335355
 Target1  80 GGAGACGTGGAACACAGGAGACCTGACCAGAGGAGAGCACACCTTAAAGGAAGTCCTACAGACAAAAGGA 149  Score
IGHG1*00 112 ggTgTcgtggaacTcaggCgCcctgaccagCggCgTgcacacctt                          156  209

>>> Read ids: 9

                   <5'UTR                               5'UTR><L1
                                                               M  A  R  F  P  L  L  L  T  L  L
    Quality    44667733373773374644433363333676333363367777436677737332777777777776776363677777
    Target0  0 ATCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCCGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score
IGLV1-44*00 57     tcagctgtgggtagagaagacaggactcaggacaatctccagcatggccAgcttccctctcctcctcaccctcctc 132  434
IGLV1-47*00 58     tcagctgtgg-tagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132  392

                    L1>
                 T  H _
    Quality     67773773737366777777327767777757377677676
    Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score
IGLV1-44*00 133 actcactgtgcagg                            146  434
IGLV1-47*00 133 actcactgtgcagg                            146  392

 Quality    22553677777755332225253636637633663333666222665222333777737763337765367777773623
 Target1  0 AAAAAAAGGCCACACTGGAGGGGCTCATAAGAGAAGGATACCCGGGAGCCGTGACAGTGGCCAGGAAGGAAGATAGCAGC 79   Score
IGLC2*00 60 CCaaCaaggccacactggTgTgTctcataagTgaCTTCtacccgggagccgtgacagtggccTggaaggCagatagcagc 139  371
IGLC3*00 60 CCaaCaaggccacactggTgTgTctcataagTgaCTTCtacccgggagccgtgacagtggccTggaaggCagatagcagc 139  371

 Quality     33263377772766753637662673773732622366777777666366376376777663363353335
 Target1  80 AACGAAAAGGCGGGAGTGGAGACCACCACAGAAAAAAAACAAAGCAACAACAAGTACGCGGCCAGAAGCTA 150  Score
IGLC2*00 140 CCcgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaacaacaagtacgcggccagCagcta 210  371
IGLC3*00 140 CCcgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaacaacaagtacgcggccagCagcta 210  371

>>> Read ids: 10

               <5'UTR                                   5'UTR><L1
                                                               M  A  S  F  P  L  L  L  T  L  L
    Quality    33366633673733263233333363333666333323366766336773335322677767676776777663677677
    Target0  0 AGCCTCAGCTGTGGGTAGAGAATACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score
IGLV1-44*00 53 agcTtcagctgtgggtagagaaGacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132  438

                    L1>
                 T  H _
    Quality     72773575635333266675357762777636367777777
    Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score
IGLV1-44*00 133 actcactgtgcagg                            146  438

 Quality    22552677767775333335357777777633363333676666666223333777777756333763337776776633
 Target1  0 AAAACAAGGCCACACTGGAGTGTCTCATAAGTGACAAATACCCGGGAGCCGAGACAGTGGCCTGGAAGGAAGATAGCAGA 79   Score
IGLC2*00 60 CCaacaaggccacactggTgtgtctcataagtgacTTCtacccgggagccgTgacagtggcctggaaggCagatagcagC 139  373
IGLC3*00 60 CCaacaaggccacactggTgtgtctcataagtgacTTCtacccgggagccgTgacagtggcctggaaggCagatagcagC 139  373

 Quality     33333377777666756337367776777732222366777777666376377677777766363353335
 Target1  80 AAAGAAAAGGCGGGAGTGGAGACCACCACAGAAAAAAAACAAAGCAAAAAAAAGTACGCGGCCAGAAGATA 150  Score
IGLC2*00 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggccag       204  373
IGLC3*00 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggccag       204  373

>>> Read ids: 11

               <5'UTR                                   5'UTR><L1
                                                               M  A  S  F  P  L  L  L  T  L  L
    Quality    33667773676766273736333363626777663363677777766777677625777777777777777776777777
    Target0  0 AGCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score
IGLV1-44*00 53 agcTtcagctgtgggtagagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132  454
IGLV1-47*00 54 agcTtcagctgtg-gtagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132  396

                    L1>
                 T  H _
    Quality     77777776777376677777677777777737577777777
    Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score
IGLV1-44*00 133 actcactgtgcagg                            146  454
IGLV1-47*00 133 actcactgtgcagg                            146  396

 Quality    33763777767766325236357777767753563332776767776523637777777776333773337777776663
 Target1  0 AAAACAAGGCCACACTGGTGTGTCTCATAAGTGAATACTACCCGGGAGCCGAGACAGTGGCCTGGAAGGCAGATAGCAGA 79   Score
IGLC2*00 60 CCaacaaggccacactggtgtgtctcataagtgaCtTctacccgggagccgTgacagtggcctggaaggcagatagcagC 139  517
IGLC3*00 60 CCaacaaggccacactggtgtgtctcataagtgaCtTctacccgggagccgTgacagtggcctggaaggcagatagcagC 139  517

 Quality     22233377776666767337667777773733333376777777666376377677777763376355335
 Target1  80 AACGTCAAGGCGGGAGTGGAGACCACCACAGACACAAAACAAAGCAACAAAAAGTACGCGGCCAGAAGATA 150  Score
IGLC2*00 140 CCcgtcaaggcgggagtggagaccaccacaCCcTcCaaacaaagcaacaaCaagtacgcggccag       204  517
IGLC3*00 140 CCcgtcaaggcgggagtggagaccaccacaCCcTcCaaacaaagcaacaaCaagtacgcggccag       204  517

>>> Read ids: 12

               <5'UTR                                   5'UTR><L1
                                                               M  A  S  F  P  L  L  L  T  L  L
    Quality    33366733673733374646433363363673333363363777637773666333676677777773777666777777
    Target0  0 AGCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score
IGLV1-44*00 53 agcTtcagctgtgggtagagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132  454
IGLV1-47*00 54 agcTtcagctgt-ggtagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132  396

                    L1>
                 T  H _
    Quality     66776777736363277677677777777736367726776
    Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score
IGLV1-44*00 133 actcactgtgcagg                            146  454
IGLV1-47*00 133 actcactgtgcagg                            146  396

 Quality    33753577767756333336353777777633663332262667666223533777777756337765337777773652
 Target1  0 ACAACAAGGCCACACTGGAGTGTCTCATAAGTGAATTCTACCCGGGAGCCGAGACAGTGGCCTGGAAGGAAGATAGCAGA 79   Score
IGLC2*00 60 CcaacaaggccacactggTgtgtctcataagtgaCttctacccgggagccgTgacagtggcctggaaggCagatagcagC 139  531
IGLC3*00 60 CcaacaaggccacactggTgtgtctcataagtgaCttctacccgggagccgTgacagtggcctggaaggCagatagcagC 139  531

 Quality     22233377777777733637667777777733333376777777666376377367736763363353335
 Target1  80 ACCGTCAAGGCGGGAGTGGAGACCACCACAGAAACAAAACAAAGCAAAAAAAAGTACGCGGCCAGCAGCTA 150  Score
IGLC2*00 140 CccgtcaaggcgggagtggagaccaccacaCCCTcCaaacaaagcaaCaaCaagtacgcggccagcagcta 210  531
IGLC3*00 140 CccgtcaaggcgggagtggagaccaccacaCCCTcCaaacaaagcaaCaaCaagtacgcggccagcagcta 210  531

You can reproduce this with the following commands:

mixcr align -f -OallowPartialAlignments=true -OallowNoCDR3PartAlignments=true --species hs --report an2dna.report -p rna-seq -OvParameters.geneFeatureToAlign=VGeneWithP -OvParameters.parameters.floatingLeftBound=true -OjParameters.parameters.floatingRightBound=false -OcParameters.parameters.floatingRightBound=true IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz IB-031218-1_S1_L001_R2_001.fastq.gz an2dna.vdjca

mixcr exportAlignmentsPretty -n 10 an2dna.vdjca
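
The pretty output is easier to compare across reads once it is condensed. Below is a minimal sketch, not part of MiXCR: `summarize_hits` is a hypothetical helper that prints each gene hit row with its score and its number of uppercase bases. It assumes that uppercase bases in a hit row mark positions that differ from the read, which is how the rows above appear to behave. Gaps shown as `-` are not counted.

```shell
# Hypothetical helper (not part of MiXCR): reads exportAlignmentsPretty text on
# stdin and prints "gene<TAB>score<TAB>uppercase-count" for every hit row.
summarize_hits() {
  awk '
    # Hit rows have five fields: gene name ending in *NN, start, bases, end, score.
    NF == 5 && $1 ~ /\*[0-9]+$/ {
      bases = $3
      # gsub returns the number of substitutions; "&" puts each match back unchanged.
      mismatches = gsub(/[ACGT]/, "&", bases)
      printf "%s\t%s\t%d\n", $1, $5, mismatches
    }
  '
}
```

Assuming the export command prints to standard output when no output file is given, as it is used above, it could be piped straight in: `mixcr exportAlignmentsPretty -n 10 an2dna.vdjca | summarize_hits`. A long alignment is wrapped over several rows, so the same gene and score can appear more than once per read.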

To extract at least something, this command seems to be the best option:

mixcr analyze amplicon -s hs --starting-material dna --5-end no-v-primers --3-end c-primers --adapters adapters-present IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz IB-031218-1_S1_L001_R2_001.fastq.gz an1dna

If you trim all adapters and primers from the data, then specifying

--adapters no-adapters

instead will give you slightly better selectivity (not sure if it will really change anything).
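
The two variants differ only in the value passed to `--adapters`. As a rough sketch, the hypothetical helper `analyze_cmd` below (not part of MiXCR) prints the command from this thread with either value so it can be checked before it is run. All file names and flags are copied from the command above.

```shell
# Hypothetical helper (not part of MiXCR): prints the analyze command from this
# thread with the requested --adapters value so it can be reviewed before running.
analyze_cmd() {
  adapters="$1"   # adapters-present or no-adapters
  echo "mixcr analyze amplicon -s hs --starting-material dna" \
       "--5-end no-v-primers --3-end c-primers --adapters $adapters" \
       "IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz" \
       "IB-031218-1_S1_L001_R2_001.fastq.gz an1dna"
}

analyze_cmd adapters-present   # raw reads, as in the command above
analyze_cmd no-adapters        # reads with adapters and primers already trimmed
```

Only the `--adapters` value changes between the two printed lines. The value replaces `adapters-present` rather than being appended as a second `--adapters` option.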

ibseq commented 5 years ago

OK, thanks, first of all, as I can finally see some real data!

Is "Read ids:" the list of all my reads in that cell?

It’s the first time I’m actually seeing some final data, so am I right in assuming that in this cell I had:

light chain: lambda, V1-44 and V1-47, IGLC2 and IGLC3
heavy chain: IGHV5-10-1

I highlighted it in red: is this the amino acid sequence? And why does it appear only in some reads?

What are L1 and L2?

Are IGHG1 and IGHG3 my IgG isotypes?

Finally:

mixcr analyze amplicon -s hs --starting-material dna --5-end no-v-primers --3-end c-primers --adapters adapters-present IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz IB-031218-1_S1_L001_R2_001.fastq.gz an1dna

This is not the one you ran, right? Should I add --adapters no-adapters to this command?

Thanks a lot, Irene

ibseq commented 5 years ago

Oh, I forgot: why can’t I see the D and J segments for the heavy chain, or the J segment for the light chain?

On 19 Dec 2018, at 13:15, Dmitry Bolotin notifications@github.com wrote:

After some research, we found that nearly all reads in your dataset lack CDR3

See typical alignments:

Read ids: 2

                                              <5'UTR5'UTR><L1
                                                           I  A  R  F  P  L  L  L  T  L  L
Quality    33337333373733262233333333333373333333373736333373333322562367677773563353577277
Target0  0 TTCCTCAGCTGTTGGTTGATATTTCTTGTCTCTGTACATTCTCCATCATTGCCCGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score

IGLV1-4400 88 acaAtctccaGcatGgccAgcttccctctcctcctcaccctcctc 132 231 IGLV1-4700 88 acaAtctccaGcatGgccGgcttccctctcctcctcaccctcctc 132 231

                L1>
             T  H _
Quality     36773673633333266673326652775636367637773
Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGACTCA 120  Score

IGLV1-4400 133 actcactgtgcagg 146 231 IGLV1-4700 133 actcactgtgcagg 146 231

Quality 22553573727666333333337533777633563333662526335333337677756636333763337673766633 Target1 0 AAAAAAAGGCCACAAAGGAGAGTCTCATAAGAGAAAAATACCCGGGAGAAGAAACAGTGGCCAGGAAGGAAGATAGCAGA 79 Score IGLC200 60 CCaaCaaggccacaCTggTgTgtctcataagTgaCTTCtacccgggagCCgTGacagtggccTggaaggCagatagcagC 139 198 IGLC300 60 CCaaCaaggccacaCTggTgTgtctcataagTgaCTTCtacccgggagCCgTGacagtggccTggaaggCagatagcagC 139 198

Quality 33333377226666633336367677773736222366777776677676376377633663363353335 Target1 80 AAAGAAAAGGCGGGAGTGGAGACCACCACAGAAAAAAAACAAAGCAAAAAAAAGTACGCGGCAAGAAGAGA 150 Score IGLC200 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggc 201 198 IGLC300 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggc 201 198

Read ids: 3

               <5'UTR                               5'UTR><L1
                                                           M  A  S  F  P  L  L  L  T  L  L
Quality    33367633373733263233333363323667333363366777333673336322777777777776675366777777
Target0  0 ATCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score

IGLV1-4400 57 tcagctgtgggtagagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132 450 IGLV1-4700 58 tcagctgtgg-tagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132 392

                L1>
             T  H _
Quality     67773673737333267573327762777736367667777
Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score

IGLV1-4400 133 actcactgtgcagg 146 450 IGLV1-4700 133 actcactgtgcagg 146 392

Quality 22553777666766333333335665777633363333662277766533533677775636335763337776776633 Target1 0 AAAAAAAGGCCACACTGGAGAGTCTCATAAGAGAAAAATACCCGGGAGAAGAGACAGTGGCCAGGAAGGAAGATAGCAGA 79 Score IGLC200 60 CCaaCaaggccacactggTgTgtctcataagTgaCTTCtacccgggagCCgTgacagtggccTggaaggCagatagcagC 139 246 IGLC300 60 CCaaCaaggccacactggTgTgtctcataagTgaCTTCtacccgggagCCgTgacagtggccTggaaggCagatagcagC 139 246

Quality 33333377777766633337662773673732222366767777666376376377767663376353335 Target1 80 AAAGAAAAGGCGGGAGTGGAGACCACCACAGAAAAAAAACAAAGCAAAAAAAAGTACGCGGCAAGAAGATA 150 Score IGLC200 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggc 201 246 IGLC300 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggc 201 246

Read ids: 5

               <5'UTR                               5'UTR><L1
                                                           M  A  S  F  P  L  L  L  T  L  L
Quality    33337633363733363233233333323263333323363766336373336322677777777777576363577777
Target0  0 ATCCTCAGCTGTGTGTTGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score

IGLV1-4400 57 tcagctgtgGgtAgagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132 418 IGLV1-4700 58 tcagctgtg-gtAgagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132 376

                L1>
             T  H _
Quality     36776676755333277273356676777736577777777
Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score

IGLV1-4400 133 actcactgtgcagg 146 418 IGLV1-4700 133 actcactgtgcagg 146 376

Quality 33553577763366332235236376777633363333676653566523333677776656333763367777776633 Target1 0 AAAACAAGGCCACACTGGAGGGTCTCATAAGAGAAAAATACCCGGGAGACGAGACAGTGGCCAGGAAGGAAGATAGCAGA 79 Score IGLC200 60 CCaacaaggccacactggTgTgtctcataagTgaCTTCtacccgggagCcgTgacagtggccTggaaggCagatagcagC 139 309 IGLC300 60 CCaacaaggccacactggTgTgtctcataagTgaCTTCtacccgggagCcgTgacagtggccTggaaggCagatagcagC 139 309

Quality 33333377655666733337362676373733326366777776677677376377776663363353335 Target1 80 CAAGAAAAGGCGGGAGTGGAGACCACCACAGAGAAAAAACAAAGCAAAAAAAAGTACGCGGCCAGAAGATA 150 Score IGLC200 140 cCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggccag 204 309 IGLC300 140 cCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggccag 204 309

Read ids: 6

               <5'UTR                               5'UTR><L1
                                                           M  A  S  F  P  L  L  L  T  L  L
Quality    33637633673763273633333363323676333363366777336377637322677777777773776365677777
Target0  0 ATCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score

IGLV1-4400 57 tcagctgtgggtagagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132 450 IGLV1-4700 58 tcagctgt-ggtagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132 392

                L1>
             T  H _
Quality     36773673767333277277677776777737377777776
Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGACTCA 120  Score

IGLV1-4400 133 actcactgtgcagg 146 450 IGLV1-4700 133 actcactgtgcagg 146 392

Quality 33663777227636332236237767777633563333777777666532337777777736337766337776773622 Target1 0 AAAACAAGGCCACACAGGAGGGTCTCATAAGAGAAGAATACCCGGGAGACGTAACAGTGGCCAGGAAGGAAGATAGCAGC 79 Score IGLC200 60 CCaacaaggccacacTggTgTgtctcataagTgaCTTCtacccgggagCcgtGacagtggccTggaaggCagatagcagc 139 341 IGLC300 60 CCaacaaggccacacTggTgTgtctcataagTgaCTTCtacccgggagCcgtGacagtggccTggaaggCagatagcagc 139 341

Quality 33233377776767763637366777776733222366777777666376377377633663363353335 Target1 80 AACGTAAAGGCGGGAGTGGAGACCACCACAGACAAAAAACAAAGCAACAAAAAGTACGCGGACAGAAGATA 150 Score IGLC200 140 CCcgtCaaggcgggagtggagaccaccacaCCcTCCaaacaaagcaacaaCaagtacgcggCcag 204 341 IGLC300 140 CCcgtCaaggcgggagtggagaccaccacaCCcTCCaaacaaagcaacaaCaagtacgcggCcag 204 341

Read ids: 7

                      L2><FR1
              _  V  C  A  E  V  Q  L  A  Q  S  G  V  E  V  K  K  P  G  E  S  L  R  I  S  C  A
  Quality     33366736367222263733673362622766222263333377736377667622222777776372366777777677
  Target0   0 TTGTCTGTGCCGAAGTGCAGCTGGCGCAGTCCGGAGTAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCG 79   Score

IGHV5-10-1*00 318 gtctgtgccgaagtgcagctggTgcagtccggagCagaggtgaaaaagccCggggagtctctgaggatctcctgtAAg 395 372

                FR1><CDR1
               A  S  G  Y  K  F  T  T  Q  R  I  S  W _
  Quality     27677733363733777737777666662556553777777
  Target0  80 GCTTCTGGATACAAGTTTACCACGCAGCGCATCAGCTGGGT 120  Score

IGHV5-10-1*00 396 gGttctggatacaGCtttacca 417 372

Quality 33636662676677763252277777775362522227777776636333333777777777333333377777763363 Target1 0 GCACACTCCTCCAAGAGCACCTCTGGGGGCACAGCGGCCCTGGGCTGACTGGTCAAGGACTACTAACAAGAACCGGTGAA 79 Score IGHG100 32 gcacCctcctccaagagcacctctgggggcacagcAgccctgggctgCctggtcaaggactactTCcCCgaaccggtgaC 111 430 IGHG300 32 gcGcCctGctccaGgagcacctctgggggcacagcggccctgggctgCctggtcaaggactactTCcCagaaccggtgaC 111 414

Quality 3222277777776336663377777636633333333677776763333333376777676333335355 Target1 80 GGTGACGTGGAACACAGGAGACCTGACCAGAGGCGTGCACACCTTAAAGGAAGTCCTACAGTACTAAGGA 149 Score IGHG100 112 ggtgTcgtggaacTcaggCgCcctgaccagCggcgtgcacaccttCCCggCTgtcctacagtCctCagga 181 430 IGHG300 112 ggtgTcgtggaacTcaggCgCcctgaccagCggcgtgcacaccttCCCggCTgtcctacagtCctCagga 181 414

Read ids: 8

                     L2><FR1
              _ V  C  A  E  V  H  L  A  Q  S  G  A  E  V  K  K  P  G  E  S  L  R  I  S  C  A
  Quality     33667373662222633636733636337762222523352623336673773222226777773523657777373766
  Target0   0 GGTCTGTGCCGAAGTGCATCTGGCTCAGTCCGGAGCAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCGG 79   Score

IGHV5-10-1*00 318 gtctgtgccgaagtgcaGctggTGcagtccggagcagaggtgaaaaagccCggggagtctctgaggatctcctgtAAgg 396 361

               FR1><CDR1               CDR1><FR2
              A  S  G  Y  K  F  T  T   Q  R  I  S  W  V
  Quality     777777337673337773776766 66766266335777776
  Target0  80 CTTCTGGATACAAGTTTACCACGC-AGCGCATCAGCTGGGTG 120  Score

IGHV5-10-1*00 397 GttctggatacaGCtttacca-gcTaCTgGatcagctgggtg 437 361

Quality 33533767533677663333377777765363622526677776522333333777777777323333366762766363 Target1 0 GAAAACTCCTCCAAGAGAAAATCTGGGGGAAAAGGGGCCCTGGGCTGCCTGGACAAGGACTAATAAAAAGAACCGGTGAA 79 Score IGHG1*00 32 gCaCCctcctccaagagCaCCtctgggggCaCagCAgccctgggctgcctggTcaaggactaCtTCCCCgaaccggtgaC 111 209

Quality 3332267777676336663667377633736333333776767633336333376777776333335355 Target1 80 GGAGACGTGGAACACAGGAGACCTGACCAGAGGAGAGCACACCTTAAAGGAAGTCCTACAGACAAAAGGA 149 Score IGHG1*00 112 ggTgTcgtggaacTcaggCgCcctgaccagCggCgTgcacacctt 156 209

Read ids: 9

               <5'UTR                               5'UTR><L1
                                                           M  A  R  F  P  L  L  L  T  L  L
Quality    44667733373773374644433363333676333363367777436677737332777777777776776363677777
Target0  0 ATCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCCGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score

IGLV1-4400 57 tcagctgtgggtagagaagacaggactcaggacaatctccagcatggccAgcttccctctcctcctcaccctcctc 132 434 IGLV1-4700 58 tcagctgtgg-tagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132 392

                L1>
             T  H _
Quality     67773773737366777777327767777757377677676
Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score

IGLV1-4400 133 actcactgtgcagg 146 434 IGLV1-4700 133 actcactgtgcagg 146 392

Quality 22553677777755332225253636637633663333666222665222333777737763337765367777773623 Target1 0 AAAAAAAGGCCACACTGGAGGGGCTCATAAGAGAAGGATACCCGGGAGCCGTGACAGTGGCCAGGAAGGAAGATAGCAGC 79 Score IGLC200 60 CCaaCaaggccacactggTgTgTctcataagTgaCTTCtacccgggagccgtgacagtggccTggaaggCagatagcagc 139 371 IGLC300 60 CCaaCaaggccacactggTgTgTctcataagTgaCTTCtacccgggagccgtgacagtggccTggaaggCagatagcagc 139 371

Quality 33263377772766753637662673773732622366777777666366376376777663363353335 Target1 80 AACGAAAAGGCGGGAGTGGAGACCACCACAGAAAAAAAACAAAGCAACAACAAGTACGCGGCCAGAAGCTA 150 Score IGLC200 140 CCcgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaacaacaagtacgcggccagCagcta 210 371 IGLC300 140 CCcgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaacaacaagtacgcggccagCagcta 210 371

Read ids: 10

           <5'UTR                                   5'UTR><L1
                                                           M  A  S  F  P  L  L  L  T  L  L
Quality    33366633673733263233333363333666333323366766336773335322677767676776777663677677
Target0  0 AGCCTCAGCTGTGGGTAGAGAATACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score

IGLV1-44*00 53 agcTtcagctgtgggtagagaaGacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132 438

                L1>
             T  H _
Quality     72773575635333266675357762777636367777777
Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score

IGLV1-44*00 133 actcactgtgcagg 146 438

Quality 22552677767775333335357777777633363333676666666223333777777756333763337776776633 Target1 0 AAAACAAGGCCACACTGGAGTGTCTCATAAGTGACAAATACCCGGGAGCCGAGACAGTGGCCTGGAAGGAAGATAGCAGA 79 Score IGLC200 60 CCaacaaggccacactggTgtgtctcataagtgacTTCtacccgggagccgTgacagtggcctggaaggCagatagcagC 139 373 IGLC300 60 CCaacaaggccacactggTgtgtctcataagtgacTTCtacccgggagccgTgacagtggcctggaaggCagatagcagC 139 373

Quality 33333377777666756337367776777732222366777777666376377677777766363353335 Target1 80 AAAGAAAAGGCGGGAGTGGAGACCACCACAGAAAAAAAACAAAGCAAAAAAAAGTACGCGGCCAGAAGATA 150 Score IGLC200 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggccag 204 373 IGLC300 140 CCCgTCaaggcgggagtggagaccaccacaCCCTCCaaacaaagcaaCaaCaagtacgcggccag 204 373

Read ids: 11

           <5'UTR                                   5'UTR><L1
                                                           M  A  S  F  P  L  L  L  T  L  L
Quality    33667773676766273736333363626777663363677777766777677625777777777777777776777777
Target0  0 AGCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score

IGLV1-4400 53 agcTtcagctgtgggtagagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132 454 IGLV1-4700 54 agcTtcagctgtg-gtagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132 396

                L1>
             T  H _
Quality     77777776777376677777677777777737577777777
Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score

IGLV1-4400 133 actcactgtgcagg 146 454 IGLV1-4700 133 actcactgtgcagg 146 396

Quality 33763777767766325236357777767753563332776767776523637777777776333773337777776663 Target1 0 AAAACAAGGCCACACTGGTGTGTCTCATAAGTGAATACTACCCGGGAGCCGAGACAGTGGCCTGGAAGGCAGATAGCAGA 79 Score IGLC200 60 CCaacaaggccacactggtgtgtctcataagtgaCtTctacccgggagccgTgacagtggcctggaaggcagatagcagC 139 517 IGLC300 60 CCaacaaggccacactggtgtgtctcataagtgaCtTctacccgggagccgTgacagtggcctggaaggcagatagcagC 139 517

Quality 22233377776666767337667777773733333376777777666376377677777763376355335 Target1 80 AACGTCAAGGCGGGAGTGGAGACCACCACAGACACAAAACAAAGCAACAAAAAGTACGCGGCCAGAAGATA 150 Score IGLC200 140 CCcgtcaaggcgggagtggagaccaccacaCCcTcCaaacaaagcaacaaCaagtacgcggccag 204 517 IGLC300 140 CCcgtcaaggcgggagtggagaccaccacaCCcTcCaaacaaagcaacaaCaagtacgcggccag 204 517

Read ids: 12

           <5'UTR                                   5'UTR><L1
                                                           M  A  S  F  P  L  L  L  T  L  L
Quality    33366733673733374646433363363673333363363777637773666333676677777773777666777777
Target0  0 AGCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTC 79   Score

IGLV1-4400 53 agcTtcagctgtgggtagagaagacaggactcaggacaatctccagcatggccagcttccctctcctcctcaccctcctc 132 454
IGLV1-4700 54 agcTtcagctgt-ggtagagaagacaggaTtcaggacaatctccagcatggccGgcttccctctcctcctcaccctcctc 132 396

                L1>
             T  H _
Quality     66776777736363277677677777777736367726776
Target0  80 ACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTTGGCTCA 120  Score

IGLV1-4400 133 actcactgtgcagg 146 454
IGLV1-4700 133 actcactgtgcagg 146 396

Quality      33753577767756333336353777777633663332262667666223533777777756337765337777773652
Target1    0 ACAACAAGGCCACACTGGAGTGTCTCATAAGTGAATTCTACCCGGGAGCCGAGACAGTGGCCTGGAAGGAAGATAGCAGA 79   Score
IGLC200   60 CcaacaaggccacactggTgtgtctcataagtgaCttctacccgggagccgTgacagtggcctggaaggCagatagcagC 139  531
IGLC300   60 CcaacaaggccacactggTgtgtctcataagtgaCttctacccgggagccgTgacagtggcctggaaggCagatagcagC 139  531

Quality      22233377777777733637667777777733333376777777666376377367736763363353335
Target1   80 ACCGTCAAGGCGGGAGTGGAGACCACCACAGAAACAAAACAAAGCAAAAAAAAGTACGCGGCCAGCAGCTA 150  Score
IGLC200  140 CccgtcaaggcgggagtggagaccaccacaCCCTcCaaacaaagcaaCaaCaagtacgcggccagcagcta 210  531
IGLC300  140 CccgtcaaggcgggagtggagaccaccacaCCCTcCaaacaaagcaaCaaCaagtacgcggccagcagcta 210  531

You can reproduce this with the following commands:

mixcr align -f -OallowPartialAlignments=true -OallowNoCDR3PartAlignments=true --species hs --report an2dna.report -p rna-seq -OvParameters.geneFeatureToAlign=VGeneWithP -OvParameters.parameters.floatingLeftBound=true -OjParameters.parameters.floatingRightBound=false -OcParameters.parameters.floatingRightBound=true IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz IB-031218-1_S1_L001_R2_001.fastq.gz an2dna.vdjca

mixcr exportAlignmentsPretty -n 10 an2dna.vdjca

To extract at least something, this command seems to be the best option:

mixcr analyze amplicon -s hs --starting-material dna --5-end no-v-primers --3-end c-primers --adapters adapters-present IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz IB-031218-1_S1_L001_R2_001.fastq.gz an1dna

If you trim all adapters and primers from the data, --adapters no-adapters will give you slightly better selectivity (not sure if it will really change anything).


dbolotin commented 5 years ago

Hi Irene,

The data you sent does not contain the CDR3 (V-D-J junction) sequence in nearly all sequencing reads. I advise you to hand-analyse 5 sequences from your fastq files to reconstruct the actual library structure and compare it to what you expected.
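For the hand-analysis step, a minimal Python sketch for pulling the first few reads out of a FASTQ file could look like the following. The function name `first_reads` is made up for illustration, and the code assumes plain four-line FASTQ records (optionally gzipped); it is not part of MiXCR.

```python
import gzip
from itertools import islice


def first_reads(fastq_path, n=5):
    """Return the first n (read_id, sequence, quality) records of a FASTQ file.

    Assumes standard four-line records: @id, sequence, '+', quality.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    records = []
    with opener(fastq_path, "rt") as handle:
        while len(records) < n:
            block = list(islice(handle, 4))
            if len(block) < 4:  # end of file or truncated record
                break
            header, seq, _plus, qual = (line.rstrip("\n") for line in block)
            records.append((header[1:], seq, qual))  # drop the leading '@'
    return records
```

Calling `first_reads("IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz")` and the same for the R2 file gives five read pairs to compare by eye against the expected library structure.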

MiXCR is a tool for massive repertoire profiling, aimed at reconstructing TCR/BCR repertoires from datasets containing thousands of different clonotypes. To achieve this, MiXCR relies on the CDR3, as the most diverse part of the IG/TCR sequence, to distinguish between sequences coming from different clonotypes, to carefully assemble clonotypes, their counts, etc. Because nearly all of your reads (except 4-7) do not cover the CDR3, MiXCR can't assign them to a clonotype, and by default drops such reads.
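One way to check CDR3 coverage yourself is to count how many rows of an exported alignments table have a non-empty CDR3 column. The sketch below assumes a tab-separated file with a header row that includes an `nSeqCDR3` column, like the table attached later in this thread; `cdr3_coverage` is a hypothetical helper, not a MiXCR function.

```python
import csv


def cdr3_coverage(tsv_path, column="nSeqCDR3"):
    """Return (rows_with_cdr3, total_rows) for a tab-separated export."""
    total = 0
    with_cdr3 = 0
    with open(tsv_path, newline="") as handle:
        # QUOTE_NONE: quality strings may contain '"' characters, so quote
        # handling is switched off and fields are split on tabs only.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            total += 1
            if (row.get(column) or "").strip():
                with_cdr3 += 1
    return with_cdr3, total
```

If only a handful of rows out of tens of thousands come back with a CDR3, the reads are not covering the V-D-J junction.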

Still, MiXCR correctly identifies CDR3s in those few reads and correctly assembles clonotypes from them.

So, the problem seems to be in the data itself. In any case, you need the CDR3 as the main identifier of IG sequences, and the protocol must be changed to provide better coverage of that region. If you need the full IG sequence, increased sequencing length may help. All of this advice is from a purely analytical standpoint; I am not very familiar with the wet-lab limitations of the protocol you use, or with the sample preparation part as a whole, so I can't say anything more detailed than this here.

I am closing the issue as it has nothing to do with the software.

ibseq commented 5 years ago

that makes sense, thanks very much

this was only one cell: if I had analysed (as it should be) thousands of cells, I guess this would have changed?

thanks very much for the massive help. I will run mixcr on a published dataset to see what the output is supposed to look like.

BW Irene


ibseq commented 5 years ago

Hi Dmitry, I re-ran and analysed the full set and I can now see the whole of the results:

  1. how come under the column “target” there are two sequences separated by a comma?
  2. is “target” my actual sequence?
  3. how do I know which V is paired with J or D?
  4. why don't all the sequences have a J or D segment?
  5. is each “target sequence” a read?

thanks very much for the help irene
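On question 1, a plausible reading (an assumption here, not confirmed in this thread) is that each comma-separated entry in targetSequences is one target of a paired-end record, typically R1 and R2, with targetQualities split the same way. A small sketch for pairing them up follows; `split_targets` is a hypothetical helper. Because a comma is itself a valid quality character, the quality string is sliced by sequence length instead of being split on commas.

```python
def split_targets(target_sequences, target_qualities):
    """Pair each comma-separated target sequence with its quality string."""
    pairs = []
    offset = 0
    for seq in target_sequences.split(","):
        qual = target_qualities[offset:offset + len(seq)]
        if len(qual) != len(seq):
            raise ValueError("quality string is shorter than the sequences")
        pairs.append((seq, qual))
        offset += len(seq) + 1  # skip the comma separating two targets
    if offset - 1 != len(target_qualities):
        raise ValueError("quality string is longer than the sequences")
    return pairs
```

Applied to the first two columns of one exported row, this returns one (sequence, quality) pair per target.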


ibseq commented 5 years ago

Hi Dmitry, just looking back at the data, it seems to me that the CDR3 is there: I attach the my_analysis.vdjca file

what then is the list of CDR3s under the CDR3 column?

thanks irene


targetSequences targetQualities allVHitsWithScore allDHitsWithScore allJHitsWithScore allCHitsWithScore allVAlignments allDAlignments allJAlignments allCAlignments nSeqFR1 minQualFR1 nSeqCDR1 minQualCDR1 nSeqFR2 minQualFR2 nSeqCDR2 minQualCDR2 nSeqFR3 minQualFR3 nSeqCDR3 minQualCDR3 nSeqFR4 minQualFR4 aaSeqFR1 aaSeqCDR1 aaSeqFR2 aaSeqCDR2 aaSeqFR3 aaSeqCDR3 aaSeqFR4 refPoints
GGGTCTGTGCCGAAGTGCAGCTGGCGCAGTCCGGAGCAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCGGCTTCTGGATACAAGTTTACCACGCAGCGCATCAGCTGGGT,CCTCGGACACCGCCATCTACTTTTGTGCGAGACATAGACTTGGACGAATGTTTGACCTCGATGGACACTTCGATCTCTGGGGCCCTGGCACCCTGGTCATTGTCTCCCCAGCCTCCAAACAAAGCAACAACAAGTACGCGGCCAGCAGCTA FGGGGHHHHHHGGGGGHHHHHGHHGGGGGGHHGGGGGHHHHGHHHFGHHDHHHHHGGGGGHHHHHGHGGHHHHHHHHHHGGGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFF,GGGGHHGGFGGHGHHHGHHHHHGGGCGHHGHHGHHHHHHHGGGHHHHHHFFFGHGGGEGHHGHHHHGGHGGHHGGFEGEFEEGGBHHGGHGFGGFHFFHGHFGFEGGECGE2GEBBFHHHHHHHHHHGHGGGGGGGFCE4A4CFFFBBBAB IGHV5-10-100(1047) IGHD3-900(36) IGHJ200(363) 318|437|643|2|121|ST340CSC368TSA393GSA394CSG397CSG409ASC410GSG418CSC419GST420CSC422GST423CSG425C|813.0,589|624|643|0|35|SG605CST608CSA610TSC611T|234.0 ,36|49|93|43|56|DT40I43G|36.0 ,28|73|73|66|111|SG46CSC61TST69C|363.0 GAAGTGCAGCTGGCGCAGTCCGGAGCAGAGGTGAAAAAGCCTGGGGAGTCTCTGAGGATCTCCTGTGCGGCTTCT 35 GGATACAAGTTTACCACGCAGCGC 37 TGTGCGAGACATAGACTTGGACGAATGTTTGACCTCGATGGACACTTCGATCTCTGG 34 GGCCCTGGCACCCTGGTCATTGTCTCCCCAG 33 EVQLAQSGAEVKKPGESLRISCAAS GYKFTTQR CARHRLGRMFDLDGHFDLW GPGTLVIVSP_ ::::11:86:110:::::::::::::::,:::::::::23:1:35:43:-5:-13:56:66:-8:80:111::
AGTCTCAGAGAGGGGCCTTAAACATGGACTCCAAGGCATTTTCAATTGGTGCCAAGCACTCTCTACTGCTCACTCACCATGCTCATTGGGCTGAGCTGGGTTCTCATTGTTTCTATTTTAG,TTACTGTGCGAGAGGCCGCCTCCTAAAGCCGTCCCCTTTAGACGGGTGGGGCGAGGGAGCCCTGGTCACCGTCTCGTCAGGGAGTGCAGGCGCCACAGCCGTTTTCCCCAGCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTA 2FAFFFHH3322B2A20133553533B333ABG3F1B13B55D55B533@?3B53112B333334B43343BF4F3BG333333444B??G2??G3332?3/02B3BD3B32?F33?3B32,;0;9---A./;..----../:000/----CA>.>1<1<0/?</C/<<//>/0B//?0//EFFFB//?/?///?>0010/0C1F0>/?//>/000/>///0>///ACE?EEFA/A0/111A2HGFB//00000A131111A@11111>>1 IGHV1-800(170),IGHV4-3400(170) IGHD6-600(28),IGHD3-300(26),IGHD2-1500(25) IGHJ400(253),IGHJ500(253) IGHM00(243) ,524|541|559|0|17||170.0;,518|535|553|0|17||170.0 ,27|36|54|26|34|DT30|28.0;,13|21|93|19|27|SA17T|26.0;,27|32|93|20|25||25.0 ,34|68|68|46|80|SC40GSA46GSC63G|253.0;,37|71|71|46|80|SC43GSA49GSC66G|253.0 ,0|71|312|80|151|ST8GSC9GSC14ASA17GSC20GSC29AST30GSG70A|243.0 TGTGCGAGAGGCCGCCTCCTAAAGCCGTCCCCTTTAGACGGGTGG 12 GGCGAGGGAGCCCTGGTCACCGTCTCGTCAG 14 CARGRLLKPSPLDGW GEGALVTVSS_ :::::::::::::::::::::,:::::::::4:2:17:26:-9:0:34:46:-14:49:80:80:
AGTCTCAGATGTGGGCCGTGAAGCTGGACTCCAGGACATTTTCCCGTAGGGCCAGCCACTCTGTCCACCTCACTCACCTTGGTCATTGTGCAGGGCTGTGGTTTCCTTGTTTCTATTTTAG,TTACTGTGCGAGAGGCCGCCTCCTAATGCAGTCACCTCTTGACGGTTGGCGCCAGGGAACCCTGGACACCGTATACTCAGGGAGTGCATCCGCCACAACCCTCTTCCCCAGCGTCTACTGTGAGAATTCCCAGTCGGATACGAGCAGCGTG 2F2BFGHH355AD3E22200115333B1133BF3F1B23D55D3311011B2B11111B333444B43111BG4F3FH333333444B4BG4F0E/0/0B22/0B3BG3F222@2222@22,00=/---C.C:..---..0?111<<1<11F?/1F0@1?//?/F//>//B/1FB1B</GHGFB//B/B221B10B10/0CGGF0/>///B////AB//00///EA/EE/E1GDAA11122HGFBA00AA00A131B1>1@111111>1 IGHV1-800(170),IGHV4-3400(170) IGHD2-2100(48) IGHJ400(224),IGHJ500(224) IGHM00(271) ,524|541|559|0|17||170.0;,518|535|553|0|17||170.0 ,2|15|84|24|36|DA5|48.0 ,34|68|68|46|80|SG37CST53ASC60ASC62A|224.0;,37|71|71|46|80|SG40CST56ASC63ASC65A|224.0 ,0|71|312|80|151|SC14AST22CSC29AST30GSC36ASC51A|271.0 TGTGCGAGAGGCCGCCTCCTAATGCAGTCACCTCTTGACGGTTGG 12 CGCCAGGGAACCCTGGACACCGTATACTCAG 14 CARGRLLMQSPLDGW RQGTLDTVYS_ :::::::::::::::::::::,:::::::::4:2:17:24:26:-41:36:46:-14:49:80:80:
GGTCTCAGACTGGGCCCTTATCCCTTGACTCCAACGCCTTTCCACTTGGTTACCATCACTGAGCACAGAGTACTCACCATGGAATTGGGGCTGAGCTGGGTTTTCCTTGTTGCTCTTTTAG,TTACTGTGCGAGAGGCCTCCTCCTAATGCCGCCGCCCTTTGACTGATGGGGACAGGGAACCCTGGTCACCGTCTCATCAGGGAGTGCACCCGCCGCAACCCATTTCCCCCTGGAGTGATATGAGAAATCCCGGGCGCATACGAACAGCGTG EG22B555333B211101A3555335BF355B33310>11B5@4BE3BG314B344@333BBFE31BFF344?BFE3?B333?333BBG///?002?2?AGA/22@2F2FC222@21?<C,//;--99--/B;/-----;/00909;--9-:--.01111G=>1F0HFG?11F1HGC0F//0111FCCC/FB1>B1010/BGDF1/></>/>/>0/FHF1B1/A///?F110B1222B1222AGB00000A00013111>1DB1>111111 IGHV1-800(190),IGHV4-3400(190),IGHV3-5200(161) IGHD2-200(30),IGHD2-800(30),IGHD6-2500(26) IGHJ400(324),IGHJ500(282) IGHM00(159) ,524|543|559|0|19||190.0;,518|537|553|0|19||190.0;,536|555|571|0|19|ST544C|161.0 ,57|63|93|25|31||30.0;,42|48|93|22|28||30.0;,3|11|54|27|35|ST8C|26.0 ,24|68|68|36|80|SA32GSC33ASC39ASC63A|324.0;,37|71|71|46|80|SC42ASC66A|282.0 ,0|71|312|80|151|ST8CSC14GST21ASC31GST33ASC34GSC36GSC37ASG39AST46ASC51GST53GSG56CSG63A|159.0 TGTGCGAGAGGCCTCCTCCTAATGCCGCCGCCCTTTGACTGATGG 12 GGACAGGGAACCCTGGTCACCGTCTCATCAG 14 CARGLLLMPPPFD*W GQGTLVTVSS_ :::::::::::::::::::::,:::::::::4:4:19:25:-26:1:31:36:-4:49:80:80:
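The alignment columns in the table above (allVAlignments, allDAlignments and so on) pack several values into one string. Going by the values themselves, the seven `|`-separated fields look like: start, end and length on the reference gene, start and end on the read, a mutation string, and a score. Hits appear to be separated by `;` and the per-target alignments within a hit by `,`, with an empty entry where a target has no alignment. That reading, the mutation syntax (`S` substitution, `D` deletion, `I` insertion) and every name in the sketch below are assumptions made for illustration, so check them against the MiXCR export documentation before relying on them.

```python
import re
from typing import List, NamedTuple, Optional


class Mutation(NamedTuple):
    kind: str                 # "S" substitution, "D" deletion, "I" insertion
    position: int             # assumed to be a reference-gene coordinate
    ref_base: Optional[str]   # None for insertions
    new_base: Optional[str]   # None for deletions


class Alignment(NamedTuple):
    ref_from: int
    ref_to: int
    ref_length: int
    read_from: int
    read_to: int
    mutations: List[Mutation]
    score: float


# One alternative per mutation type, e.g. "SG605C", "DT40", "I43G".
_MUTATION = re.compile(
    r"S(?P<s_ref>[A-Z])(?P<s_pos>\d+)(?P<s_new>[A-Z])"
    r"|D(?P<d_ref>[A-Z])(?P<d_pos>\d+)"
    r"|I(?P<i_pos>\d+)(?P<i_new>[A-Z])"
)


def parse_mutations(text: str) -> List[Mutation]:
    """Split a packed mutation string such as 'DT40I43G' into Mutation records."""
    mutations = []
    for m in _MUTATION.finditer(text):
        if m.group("s_pos") is not None:
            mutations.append(Mutation("S", int(m.group("s_pos")), m.group("s_ref"), m.group("s_new")))
        elif m.group("d_pos") is not None:
            mutations.append(Mutation("D", int(m.group("d_pos")), m.group("d_ref"), None))
        else:
            mutations.append(Mutation("I", int(m.group("i_pos")), None, m.group("i_new")))
    return mutations


def parse_alignment(text: str) -> Alignment:
    """Parse one 'a|b|c|d|e|mutations|score' string."""
    parts = text.split("|")
    if len(parts) != 7:
        raise ValueError(f"expected 7 '|'-separated fields, got {len(parts)}: {text!r}")
    ref_from, ref_to, ref_length, read_from, read_to = (int(p) for p in parts[:5])
    return Alignment(ref_from, ref_to, ref_length, read_from, read_to,
                     parse_mutations(parts[5]), float(parts[6]))


def parse_alignments(field: str) -> List[List[Optional[Alignment]]]:
    """Return one list per hit; each holds one entry per target (None if absent)."""
    field = field.strip()
    if not field:
        return []
    return [
        [parse_alignment(part) if part else None for part in hit.split(",")]
        for hit in field.split(";")
    ]
```

For example, `parse_alignments(",36|49|93|43|56|DT40I43G|36.0")` yields a single hit with nothing on the first target and, on the second target, an alignment carrying one deletion and one insertion.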