Closed lvwerra closed 1 year ago
Thanks @lvwerra!!! I love the idea of giving the Disaggregator
a __call__
method, and an attribute to fetch the generated keys. The method name that I have for it right now is pretty awkward 😬https://github.com/huggingface/disaggregators/blob/e1154c880681c26653cc3ae5d9a5a47627440308/src/disaggregators/disaggregator.py#L28
The main reason that I was looking at making changes to the Evaluator was so that the disaggregated metrics calculations could happen after the inferences had run on the entire dataset, since otherwise the evaluator.compute
call will happen in a loop with high complexity (factorial or exponential, not quite sure 😅) since I'm taking products of the power set of every label group to generate the intersections. (From the example notebook)
# Generate all intersectional disaggregation combinations
disaggregation_sets = [v for v in disaggregate_by.values()]
disaggregator_powerset = chain.from_iterable(
combinations(disaggregation_sets, r) for r in range(len(disaggregation_sets)+1)
)
all_combinations = [product(*d) for d in disaggregator_powerset]
So on the evaluate
side I was thinking it could just receive a list of labels or groups of labels to calculate the metrics by, without a dependency on disaggregators
. Do you think something like that could work?
Hi @NimaBoscarino, this looks great! Following up on the internal call for feedback on the API. Here's a proposal to integrate it less into the
Evaluator
itself but enabling to easily use it with "vanilla"evaluate
or other libraries.The proposed
Disaggregator
has a__call__
method that returns a dictionary as well as afields
attribute containing all the generated keys. In terms of lines of code it's almost identical to the proposed solution but allows more flexibility to integrate with other libraries such aspandas
orSpacy
. This way you could also keep thedisaggregators
library free fromdatasets
andevaluate
dependencies which makes it easier to maintain.