This gives an under-estimate for the diameter, but possibly not a very good one depending on our choice of ref_history.
I propose it may be pretty good if the reference history is a topological outlier, such as what we get from dag.trim_optimal_sum_rf_distance(dag, optimal_func=max).sample()
Over-estimate for diameter:
Since RF distance is a proper distance, the triangle inequality tells us that given any tree a in the DAG, and a tree b that's as far as possible from a in the DAG, the diameter of the set of trees in the DAG is no more than twice the distance from a to b. This is true regardless of the tree a that we start with, but which tree we choose for a will determine how close the over-estimate (twice the distance from a to b) is to the true value of diameter.
It seems that a good choice for a would be the topological median tree, because that should have generally smaller maximum distance to any other tree in the DAG. So, perhaps a good over-estimate of RF distance diameter for a dag would be
It would be interesting to implement these over/under-estimates, and compare them to the true diameters for a bunch of DAGs that are small enough to compute the true diameters of.
True RF distance diameter:
For a DAG which doesn't contain too many histories, it's possible to compute the true RF distance diameter with something like:
However, this is prohibitively slow for large DAGs. It's possible that there's a way to get a good estimate for diameter much more efficiently.
Under-estimate for diameter:
We can always get the maximum RF distance between a reference history and any other history in a DAG with
This gives an under-estimate for the diameter, but possibly not a very good one depending on our choice of
ref_history
. I propose it may be pretty good if the reference history is a topological outlier, such as what we get fromdag.trim_optimal_sum_rf_distance(dag, optimal_func=max).sample()
Over-estimate for diameter:
Since RF distance is a proper distance, the triangle inequality tells us that given any tree
a
in the DAG, and a treeb
that's as far as possible froma
in the DAG, the diameter of the set of trees in the DAG is no more than twice the distance froma
tob
. This is true regardless of the treea
that we start with, but which tree we choose fora
will determine how close the over-estimate (twice the distance froma
tob
) is to the true value of diameter.It seems that a good choice for
a
would be the topological median tree, because that should have generally smaller maximum distance to any other tree in the DAG. So, perhaps a good over-estimate of RF distance diameter for a dag would beEvaluating these estimates:
It would be interesting to implement these over/under-estimates, and compare them to the true diameters for a bunch of DAGs that are small enough to compute the true diameters of.