Currently there are two very similar classes in Isovar and Vaxrank: isovar.ProteinSequence and vaxrank.MutantProteinFragment. Since I'm currently refactoring Vaxrank to work with the new Isovar API, it would simplify that code to use isovar.ProteinSequence instead (and get rid of MutantProteinFragment).
This requires moving some additional functionality into Isovar: (1) distinguishing mutated from non-mutated sequences and (2) slicing a sequence to get subsequences.
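To make point (2) concrete, here is a minimal sketch of what slicing a mutant protein sequence has to do: take a contiguous subsequence and shift the mutated interval into the subsequence's coordinates. This is not Isovar's implementation. The class name `MutantSequence` is hypothetical, and the sketch assumes the mutated residues are stored as a half-open, base-0 interval `[mutation_start_idx, mutation_end_idx)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutantSequence:
    """Hypothetical stand-in for ProteinSequence: amino acids plus an
    assumed half-open, base-0 interval [mutation_start_idx, mutation_end_idx)
    marking the mutated residues."""

    amino_acids: str
    mutation_start_idx: int
    mutation_end_idx: int

    @property
    def contains_mutation(self) -> bool:
        # Assumption for this sketch: a non-empty interval means mutated
        # residues are present (deletions are not modeled here).
        return self.mutation_start_idx < self.mutation_end_idx

    def __len__(self) -> int:
        return len(self.amino_acids)

    def __getitem__(self, key: slice) -> "MutantSequence":
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        # slice.indices resolves None and negative bounds against the length.
        start, stop, step = key.indices(len(self))
        if step != 1:
            raise ValueError("only contiguous slices (step == 1) are supported")
        stop = max(start, stop)  # an inverted range becomes empty
        length = stop - start
        # Shift the mutation interval into subsequence coordinates and clip
        # it to [0, length]; an interval outside the slice becomes empty.
        new_start = min(max(self.mutation_start_idx - start, 0), length)
        new_end = min(max(self.mutation_end_idx - start, 0), length)
        return MutantSequence(self.amino_acids[start:stop], new_start, new_end)
```

For example, with `MutantSequence("SIINFEKL", 3, 5)` the mutated residues are `"NF"`; slicing `[2:6]` yields `"INFE"` with the interval `[1, 3)`, while slicing `[5:]` yields `"EKL"` with an empty interval, so `contains_mutation` becomes `False`.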
Major changes:
Gave ProteinSequence a more straightforward __init__ method and moved the logic for building a ProteinSequence from a list of Translation objects into the class method from_translations.
Added contains_mutation property to TranslationKey (which is inherited by Translation and ProteinSequence).
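The two major changes follow a common pattern: `__init__` only assigns fields, a `@classmethod` constructor holds the aggregation logic, and a property defined once on a shared base class is inherited by its subclasses. The sketch below illustrates that pattern only. The constructor arguments, the `read_names` field, the rule that all grouped translations must share identical key fields, and the definition of `contains_mutation` are all assumptions, not Isovar's actual API.

```python
class TranslationKey:
    """Hypothetical base class holding the fields shared by Translation
    and ProteinSequence; contains_mutation is defined once here."""

    def __init__(self, amino_acids, mutation_start_idx, mutation_end_idx):
        self.amino_acids = amino_acids
        self.mutation_start_idx = mutation_start_idx
        self.mutation_end_idx = mutation_end_idx

    @property
    def contains_mutation(self):
        # Assumption: a non-empty half-open interval marks mutated residues.
        return self.mutation_start_idx < self.mutation_end_idx


class Translation(TranslationKey):
    """Hypothetical: one translated protein plus the reads supporting it."""

    def __init__(self, amino_acids, mutation_start_idx, mutation_end_idx, read_names):
        super().__init__(amino_acids, mutation_start_idx, mutation_end_idx)
        self.read_names = frozenset(read_names)


class ProteinSequence(TranslationKey):
    def __init__(self, amino_acids, mutation_start_idx, mutation_end_idx, translations=()):
        # Plain field assignment: no aggregation logic lives in __init__.
        super().__init__(amino_acids, mutation_start_idx, mutation_end_idx)
        self.translations = list(translations)

    @classmethod
    def from_translations(cls, translations):
        """Build one ProteinSequence from Translations that share the
        same key fields (assumed grouping rule for this sketch)."""
        if not translations:
            raise ValueError("need at least one Translation")
        first = translations[0]
        key = (first.amino_acids, first.mutation_start_idx, first.mutation_end_idx)
        for t in translations[1:]:
            if (t.amino_acids, t.mutation_start_idx, t.mutation_end_idx) != key:
                raise ValueError("all translations must share the same key fields")
        return cls(*key, translations=translations)

    @property
    def read_names(self):
        # Union of supporting reads across all grouped translations.
        names = set()
        for t in self.translations:
            names |= t.read_names
        return names
```

With this split, a test or a downstream caller such as Vaxrank can construct `ProteinSequence("SIINFEKL", 3, 5)` directly, without first fabricating Translation objects.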
Minor changes:
Renamed Translation.variant_aa_interval_start to Translation.mutation_start_idx
Renamed Translation.variant_aa_interval_end to Translation.mutation_end_idx
Renamed ProteinSequence.mutation_start to ProteinSequence.mutation_start_idx
Renamed ProteinSequence.mutation_end to ProteinSequence.mutation_end_idx
Added contains_deletion property to ProteinSequence, to help distinguish deleted amino acids from other kinds of mutations
Added common.normalize_base0_range_indices to convert negative and None indices into a non-negative start/end pair, and also check that they're contiguous (stride == 1).
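A helper like common.normalize_base0_range_indices could look roughly like the sketch below, which resolves `None` and negative bounds the way Python slices do and rejects any stride other than 1. The parameter names, their order, and the clamping behavior are assumptions for illustration. The real signature is not given here.

```python
from typing import Optional, Tuple


def normalize_base0_range_indices(
    start: Optional[int],
    end: Optional[int],
    sequence_length: int,
    step: Optional[int] = None,
) -> Tuple[int, int]:
    """Sketch (assumed signature): turn slice-style bounds into a
    non-negative, half-open [start, end) pair within sequence_length."""
    if step not in (None, 1):
        raise ValueError("only contiguous ranges (stride == 1) are supported, got step=%r" % (step,))
    # None means "from the beginning" / "to the end".
    if start is None:
        start = 0
    elif start < 0:
        start += sequence_length  # negative indices count from the end
    if end is None:
        end = sequence_length
    elif end < 0:
        end += sequence_length
    # Clamp into [0, sequence_length]; an inverted range becomes empty.
    start = min(max(start, 0), sequence_length)
    end = min(max(end, start), sequence_length)
    return start, end
```

For example, `normalize_base0_range_indices(-3, None, 8)` gives `(5, 8)`, matching `slice(-3, None).indices(8)[:2]`.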
Along with the additional functionality on ProteinSequence, this PR also fixes many tests that were failing due to the more stringent contig name checking recently added to Varcode.
Coverage increased (+0.2%) to 92.96% when pulling 4ae511624e6a64cf2ba4164f51fbedc261afafd2 on add-slice-method-to-protein-sequence into 6456397fca2e1e2ef0b604202f4bba4de8c7a92a on master.