The use of sequence alignment to map topologies to consensus nomenclature is now carried out differently, with very similar results.
Only two methods call _mdcu.sequence.align_tops_or_seqs directly:
guess_nomenclature_fragments, to produce exploratory alignments, with no kwargs whatsoever
aligntop, to actually align the input top to the consensus nomenclature
The other alignment-needing methods, top2labels and top2frags, wrap around aligntop.
To re-use alignments, the ConsensusLabeller object now has a new attribute, most_recent_alignment, a pandas.DataFrame that can be passed to top2frags if the expert user trusts the last alignment to be a good source of fragment definitions.
Consequences of this are:
top2frags now returns consensus fragments for which no labels exist, but a subdomain name does: "Cterm" and "Nterm".
Some redundant calls to _mdcu.sequence.align_tops_or_seqs are eliminated, so things are noticeably faster.
The problems with flareplot fragment labeling when using consensus nomenclature are solved, since we're always mapping the whole topology regardless of the interface fragments. This also makes the "orphan_fragment" labeling easier and more flexible.
Other modifications in this PR are general style/doc improvements, like add_fragment_labels, speeding up some tests through setUpClass, and avoiding unnecessary web_lookups.
The commits b7b24f2, 207b545, 83974f7 are a mess because I only wanted to add a TODO but committed everything and couldn't revert properly, but the tests work.
TODO: eliminate the skipped tests and methods not needed any longer