Closed SimonHeybrock closed 4 months ago
Ok for a prototype.
This is not meant to be a prototype.
And I would also prefer not using pandas if possible because users might not use pandas either.
We could use a plain dict
, but then we do not have index names (aka dims), so one would need to implement a custom data structure for this, including ultimately something like a multi-index (or, simpler something N-D like Scipp, but that won't work any more as soon as Cyclebane supports groupby
). I don't think users would prefer using a custom class from an unknown library over a very well known and package in widespread use. Do you have something else in mind?
But in the long run, I would prefer merging this with
compute
.
It is not yet clear to me that this is a good idea, so I refrained from expressing a preference at this point. Pipeline.compute
can compute multiple keys (returning a dict
, instead of a single value). So we would need to return a dict
containing one or more pandas.Series
(instead of a single Series
). Would it be too confusing to get a dict
, a Series
, or a dict
containing Series
, depending on the args? It is starting too look like a bad interface. Maybe the methods for one target (or a single mapped target) should be separate from the one computing multiple targets? Edit: Furthermore, I had to add an optional index_names
argument, in case there are multiple mapped nodes with the same name, which would be cumbersome to integrate into compute
with multiple targets.
This fixes a big UX hurdle when computing results that use mapped nodes.
We will later on consider if this should be integrated more tightly with
Pipeline.compute
, for now this should serve as a solution that is 80% there an can be used to gather more insights/experience.