milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

-nFeature, -nMutations on exportClones #188

Closed iansetliff closed 7 years ago

iansetliff commented 7 years ago

-nFeature, -nMutations exporting blanks or '-' during exportClones.

Unable to export nt sequence for any gene feature other than CDR3. Alignment was performed with new release. vIdentityPercent still calculates though.

Happy to provide more details if needed.

dbolotin commented 7 years ago

Please post several typical alignments from the .vdjca file. Try this:

mixcr exportAlignmentsPretty -n 10 alignments_file.vdjca

Did this problem appear after the latest release (2.0.3)?

iansetliff commented 7 years ago

Alignments from some public data:

alignmentsForGithub.txt

^^ produced via: mixcr exportAlignmentsPretty -n 10 SRR654171Alignments.vdjca > alignmentsForGithub.txt

But upon running, for example: mixcr exportClones -cloneId -nFeature FR3 -nMutations FR3 -aaFeature CDR3 -vBestIdentityPercent SRRClones.clns SRR654171exportClones.txt

I get: SRR654171exportClones.txt

I have had this problem with previous releases, but tried again on 2.0.3 hoping it would work, but no success. Am I doing something wrong?

Also, how is vBestIdentityPercent calculated when there's not, for example, a full FR1? Is there a way to export from the beginning of the available FR1 (even if the read doesn't span all of FR1) through, say, the end of FR3, for example?

Thanks!

dbolotin commented 7 years ago

Thanks for providing example files!

Clonotype objects (stored in the *.clns file) hold information only about their clonal sequence (CDR3 by default). All alignments, the partitioning into gene features, etc. are defined only inside this region, so it is impossible to extract sequences or mutations for regions outside the provided clonal sequence.

Since you have full-length data, you can assemble clonotypes using a wider clonal sequence (docs), e.g.:

mixcr assemble -OassemblingFeatures='VDJRegion' alignments.vdjca output.clns

or

mixcr assemble -OassemblingFeatures='{CDR1Begin:CDR3End}' alignments.vdjca output.clns

See here for a detailed description of the gene feature syntax.

After this you will be able to extract any sub-gene-feature of the provided clonal sequence (e.g. FR3).

vBestIdentityPercent is calculated as the number of matching letters divided by the length of the alignment with the best matching V gene. If this value is extracted for alignments (from a *.vdjca file), it represents the identity of the full alignment built against the sequencing read. If it is extracted for clonotypes (from a *.clns file), it represents the identity of the alignment only inside the clonal sequence (e.g. with default parameters, it is calculated using the small part of the V gene contained in CDR3).
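To make the difference between the two cases concrete, here is a minimal Python sketch of "matching letters divided by alignment length". It is not MiXCR's code: the function name `identity_fraction`, the gapped-string representation of an alignment, the treatment of gap columns as mismatches, and the toy sequences are all assumptions made for illustration. Whether MiXCR reports the value on a 0-1 or 0-100 scale is not stated here, so the sketch returns a plain fraction.

```python
def identity_fraction(aligned_read: str, aligned_ref: str) -> float:
    """Fraction of alignment columns where read and reference agree.

    Both strings are rows of one pairwise alignment, so they must have
    equal length; '-' marks a gap. Gap columns count as mismatches
    (an assumption, not necessarily what MiXCR does).
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned rows must have equal length")
    if not aligned_read:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for q, r in zip(aligned_read, aligned_ref) if q == r and q != "-"
    )
    return matches / len(aligned_read)


# Hypothetical V alignment: 12 columns, mismatches at columns 2 and 5.
read_row = "ACCTAGGTACGT"
ref_row = "ACGTACGTACGT"

# Alignment-level value: computed over the whole alignment.
full_identity = identity_fraction(read_row, ref_row)

# Clonotype-level value: only the part inside the clonal sequence,
# here assumed to be the last 4 columns.
clonal_identity = identity_fraction(read_row[-4:], ref_row[-4:])
```

In this toy case the full alignment gives 10/12 while the clonal part gives 4/4, which is why a clonotype assembled on CDR3 alone can report a high identity even when the rest of the V gene carries mutations.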

Please also see the example pipeline for full-length antibody profiling: link.

Concerning your last question: unfortunately it is currently not possible to extract sequences starting from the first available nucleotide (though there definitely should be such an option; I have just created issue #190 for this). As a workaround you can extract sequences for trimmed gene features, like:

mixcr exportClones -nFeature 'FR3(12,0)' ...

equivalent to

mixcr exportClones -nFeature '{FR3Begin(12):FR3End}' ...

which will create a column for FR3 without 12 nucleotides on the 5' side. If a read covers this trimmed region but does not cover the full FR3, the region will still be exported.
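The logic of this workaround can be sketched in a few lines of Python. This is an illustration only, not MiXCR's implementation: the helper names `trim_feature` and `export_feature`, the reference coordinates, and the read are hypothetical, and the rule that the second offset is simply added to the feature end is an assumption (the example above only uses an end offset of 0).

```python
from typing import Optional, Tuple


def trim_feature(begin: int, end: int,
                 begin_offset: int = 0, end_offset: int = 0) -> Tuple[int, int]:
    """Shift the boundaries of a feature given as a half-open range [begin, end).

    A positive begin_offset removes nucleotides from the 5' side,
    as in FR3(12,0).
    """
    new_begin, new_end = begin + begin_offset, end + end_offset
    if new_begin > new_end:
        raise ValueError("offsets leave an empty or inverted feature")
    return new_begin, new_end


def export_feature(read_seq: str, read_start: int,
                   feature: Tuple[int, int]) -> Optional[str]:
    """Return the feature sequence, or None if the read does not span it.

    read_seq is assumed to cover reference positions
    [read_start, read_start + len(read_seq)) without indels.
    """
    begin, end = feature
    read_end = read_start + len(read_seq)
    if read_start <= begin and end <= read_end:
        return read_seq[begin - read_start:end - read_start]
    return None


# Hypothetical coordinates: FR3 spans 100..120, the read starts at 105.
fr3 = (100, 120)
read = "ACGTACGTACGTACGTACGT"

full = export_feature(read, 105, fr3)                               # not covered
trimmed = export_feature(read, 105, trim_feature(*fr3, 12, 0))      # covered
```

Here the full FR3 cannot be exported because the read misses its first five positions, while the trimmed feature (112..120) lies entirely inside the read and is returned.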

iansetliff commented 7 years ago

Thanks so much for the helpful input. I'm a big fan of your software, and I certainly thank you and the rest of the lab for making such a useful, user-friendly, and efficient suite of tools. Being able to export whatever the first nucleotide available is in the V region would certainly be awesome, as would being able to fetch identity% for specific gene features of interest (say, for example, first available nt in V through FR3End). Thanks for making the issue and providing a work-around in the meantime, and I look forward to future releases.

dbolotin commented 7 years ago

Thank you for your kind words, and for your feedback about the features you need!