Builds on PR #70 to add metric dependencies, so that the same method is not run more than once.
The core addition is intermediate metrics: metrics that are not represented in tiledb but can serve as "base" methods for other metrics. An example is in `p_moments.py`, where one method depends on `mean` having been computed, and several others depend on the base `moments` method having been computed. The `moments` intermediate metric then passes all the necessary values as `args` to its dependent methods.
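As a rough illustration of the idea (not the code in `p_moments.py`), the sketch below models an intermediate metric in plain Python. The `Metric` dataclass, its `intermediate` flag, the `run_metrics` helper, and the choice to store `mean` as a regular metric are all hypothetical names and assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Metric:
    # Hypothetical container; field names are assumptions for this sketch.
    name: str
    method: Callable
    dependencies: list = field(default_factory=list)
    intermediate: bool = False  # True: computed, but never written to storage


def run_metrics(metrics, data):
    """Run every metric once, passing dependency results in as extra args."""
    cache = {}
    calls = {}

    def resolve(metric):
        if metric.name not in cache:
            dep_values = [resolve(dep) for dep in metric.dependencies]
            calls[metric.name] = calls.get(metric.name, 0) + 1
            cache[metric.name] = metric.method(data, *dep_values)
        return cache[metric.name]

    for metric in metrics:
        resolve(metric)
    # Only non-intermediate metrics would be stored.
    stored = {m.name: cache[m.name] for m in metrics if not m.intermediate}
    return stored, calls


def mean(data):
    return sum(data) / len(data)


def moment_base(data, mean_value):
    # Shared central moments, computed once for all dependents.
    n = len(data)
    m2 = sum((x - mean_value) ** 2 for x in data) / n
    m3 = sum((x - mean_value) ** 3 for x in data) / n
    return m2, m3


def variance(data, base):
    return base[0]


def skewness(data, base):
    m2, m3 = base
    return m3 / m2 ** 1.5


mean_metric = Metric("mean", mean)
base_metric = Metric("moment_base", moment_base, [mean_metric], intermediate=True)
variance_metric = Metric("variance", variance, [base_metric])
skewness_metric = Metric("skewness", skewness, [base_metric])

stored, calls = run_metrics(
    [mean_metric, base_metric, variance_metric, skewness_metric],
    [1.0, 2.0, 3.0, 4.0],
)
```

Here `moment_base` runs a single time even though two metrics depend on it, and it is left out of `stored` because it is flagged as intermediate.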
To accomplish this, a dependency graph was needed. Dask's own dependency tracking is generally good enough, but because these methods can end up far apart in the workflow, I found it easier to create `Delayed` objects whose keys include uuids tied to specific data runs. This way dask understands that `moment_base` requires `mean`, and it knows which of the potentially thousands of `mean` tasks being run is the correct one.
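A minimal sketch of the keying idea, assuming a key format of `<metric name>-<uuid>` (the format, the `build_run` helper, and the simplified `moment_base` body are assumptions for illustration, not the PR's actual code). It relies on the `dask_key_name` keyword that dask accepts when calling a delayed function:

```python
import uuid

import dask
from dask import delayed


def mean(data):
    return sum(data) / len(data)


def moment_base(data, mean_value):
    # Simplified stand-in: second central moment only.
    return sum((x - mean_value) ** 2 for x in data) / len(data)


def build_run(data):
    """Build the tasks for one data run, keyed by a per-run uuid."""
    run_id = uuid.uuid4()
    mean_task = delayed(mean)(data, dask_key_name=f"mean-{run_id}")
    # Passing mean_task as an argument records the dependency in the graph.
    base_task = delayed(moment_base)(
        data, mean_task, dask_key_name=f"moment_base-{run_id}"
    )
    return run_id, mean_task, base_task


run_a, mean_a, base_a = build_run([1.0, 2.0, 3.0, 4.0])
run_b, mean_b, base_b = build_run([10.0, 20.0, 30.0, 40.0])

# Each moment_base task's graph contains only the mean task from its own run.
graph_a_keys = set(base_a.__dask_graph__())

results = dask.compute(base_a, base_b, scheduler="synchronous")
```

Because each key carries its run's uuid, the two `mean` tasks never collide, and each `moment_base` task resolves against the `mean` from the same run.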