Closed willdumm closed 2 years ago
The system for converting between HistoryDag subclasses is described in the HistoryDag and HistoryDag.from_history_dag docstrings. Provided sufficient label data is present, any HistoryDag subclass instance olddag can be converted to any other HistoryDag subclass type NewHDagType by calling NewHDagType.from_history_dag(olddag).
New HistoryDag subclasses need not override the static method from_history_dag, as long as they describe their required label fields and conversion functions in the static variable _required_label_fields. This variable contains a dictionary keyed by the names of the required label fields. Each value is a list of tuples (from_labels, conversion_func), where conversion_func is a function that takes a node and arbitrary keyword arguments and returns the value of the required label field for that node, and from_labels is a tuple containing the names of the label fields expected by conversion_func.
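The structure described above can be illustrated with a small self-contained sketch. It does not use the real HistoryDag class: ToyDag, Node, GCDag, from_dag, and the gc_count field are hypothetical names made up for illustration, and the toy assumes a non-empty DAG whose nodes all share one label type. Only the shape of _required_label_fields follows the description in this PR.

```python
from collections import namedtuple
from typing import NamedTuple


class Node:
    """Toy node that holds only a label (a named tuple)."""

    def __init__(self, label):
        self.label = label


def _copy_field(field):
    """Build a conversion function that reuses an already-present field."""
    def conversion_func(node, **kwargs):
        return getattr(node.label, field)
    return conversion_func


class ToyDag:
    """Hypothetical stand-in for HistoryDag: just a list of labeled nodes."""

    # required field name -> list of (from_labels, conversion_func)
    _required_label_fields = {}

    def __init__(self, nodes):
        self.nodes = list(nodes)

    @classmethod
    def from_dag(cls, dag, **kwargs):
        """Convert dag to cls, deriving any missing required label fields."""
        present = set(dag.nodes[0].label._fields)
        chosen = {}
        for field, options in cls._required_label_fields.items():
            if field in present:
                chosen[field] = _copy_field(field)
                continue
            # Use the first conversion whose input fields are all available.
            for from_labels, func in options:
                if set(from_labels) <= present:
                    chosen[field] = func
                    break
            else:
                raise TypeError(f"cannot derive required field {field!r}")
        # Simplification: the new label keeps only the required fields.
        Label = namedtuple("Label", list(chosen))
        return cls(
            Node(Label(**{f: func(n, **kwargs) for f, func in chosen.items()}))
            for n in dag.nodes
        )


class SeqLabel(NamedTuple):
    sequence: str


class GCDag(ToyDag):
    """Toy subclass requiring a gc_count field, derivable from sequence."""

    _required_label_fields = {
        "gc_count": [
            (
                ("sequence",),
                lambda node, **kwargs: sum(
                    node.label.sequence.count(base) for base in "GC"
                ),
            )
        ]
    }


old = ToyDag([Node(SeqLabel("ACGT")), Node(SeqLabel("GGCC"))])
new = GCDag.from_dag(old)  # no override of from_dag was needed in GCDag
```

Note that GCDag only declares its required field and one way to compute it; the conversion logic stays in the base class, which is the design this PR describes for HistoryDag subclasses.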
Conversion functions are required to accept keyword arguments so that label data conversions that need additional data are possible. These keyword arguments may be provided to the from_history_dag method, which passes them through to each conversion_func.
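As a rough sketch of why the keyword arguments matter, the example below derives a mutation-style label from a sequence, which is only possible when a reference sequence is supplied. Everything here is hypothetical: convert_labels stands in for the pass-through done by from_history_dag, mutations_from_sequence is an illustrative conversion function, and the dictionary of mutations is a simplification, not the actual CompactGenome class.

```python
from typing import NamedTuple


class SeqLabel(NamedTuple):
    sequence: str


class Node:
    """Toy node that holds only a label."""

    def __init__(self, label):
        self.label = label


def mutations_from_sequence(node, reference=None, **kwargs):
    """Toy conversion_func: describe a sequence by its differences from reference.

    Returns {1-based site: (reference base, node base)}.
    """
    if reference is None:
        raise ValueError("this conversion needs a 'reference' keyword argument")
    sequence = node.label.sequence
    if len(sequence) != len(reference):
        raise ValueError("sequence and reference must have equal length")
    return {
        site: (ref_base, base)
        for site, (ref_base, base) in enumerate(zip(reference, sequence), start=1)
        if ref_base != base
    }


# Same shape as an entry of _required_label_fields.
required = {"compact_genome": [(("sequence",), mutations_from_sequence)]}


def convert_labels(nodes, required_fields, **kwargs):
    """Hypothetical driver: forwards every keyword argument to conversion_func."""
    converted = []
    for node in nodes:
        new_label = {}
        for field, options in required_fields.items():
            _from_labels, conversion_func = options[0]
            new_label[field] = conversion_func(node, **kwargs)
        converted.append(new_label)
    return converted


nodes = [Node(SeqLabel("ACGT")), Node(SeqLabel("ACTT"))]
labels = convert_labels(nodes, required, reference="ACGT")
```

Because every conversion function accepts `**kwargs`, the driver can forward all keyword arguments without knowing which function uses which one; a function simply ignores the arguments it does not need.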
One unresolved issue is that it is difficult to document the keyword arguments required for conversion to HistoryDag subclasses. It would be inelegant to rewrite the entire from_history_dag docstring even though the method itself needs no overriding.
This PR introduces a system for subclassing HistoryDag according to expected node label data, and a standard way to convert between these subclasses. In particular, we add the following new subclasses:
- CGHistoryDag requires label data to include a compact_genome field, which contains instances of the new class CompactGenome. The new module containing this subclass, mutation_annotated_dag, includes interfaces to load and write to MAD protobuf, and to JSON. A CGHistoryDag can be created from a DAG that has node labels including a sequence attribute.
- SequenceHistoryDag requires label data to include a sequence field.