huggingface / disaggregators

🤗 Disaggregators: Curated data labelers for in-depth analysis.
Apache License 2.0

Feedback on API design #8

Closed lvwerra closed 1 year ago

lvwerra commented 1 year ago

Hi @NimaBoscarino, this looks great! Following up on the internal call for feedback on the API. Here's a proposal that integrates it less tightly into the Evaluator itself while still making it easy to use with "vanilla" evaluate or other libraries:

from datasets import load_dataset
from disaggregators import Disaggregator
from evaluate import evaluator

dataset = load_dataset("imdb", split="train")
disaggregator = Disaggregator(["pronouns", "random"])

# Add one boolean column per disaggregation field to each row
dataset = dataset.map(lambda x: disaggregator(x["text"]))

task_evaluator = evaluator("text-classification")
results = {}
for dagg_field in disaggregator.fields:
    # Evaluate only on the rows belonging to this subgroup
    dataset_filtered = dataset.filter(lambda x: x[dagg_field])

    results[dagg_field] = task_evaluator.compute(
        model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",
        data=dataset_filtered,
        label_mapping={"POSITIVE": 1, "NEGATIVE": 0},
    )

The proposed Disaggregator has a __call__ method that returns a dictionary, as well as a fields attribute containing all the generated keys. In terms of lines of code it's almost identical to the proposed solution, but it allows more flexibility to integrate with other libraries such as pandas or spaCy. This way you could also keep the disaggregators library free of datasets and evaluate dependencies, which makes it easier to maintain.
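
To make the proposed interface concrete, here is a minimal sketch of what such a Disaggregator could look like. The class name, field names, and substring-matching logic are all illustrative assumptions, not the actual disaggregators implementation:

```python
# Hypothetical sketch of the proposed interface: __call__ returns a plain
# dict of boolean features, and .fields lists every key it can generate.
class PronounDisaggregator:
    fields = ["pronouns.he_him", "pronouns.she_her", "pronouns.they_them"]

    def __call__(self, text):
        # Naive substring matching, purely for illustration
        text = f" {text.lower()} "
        return {
            "pronouns.he_him": " he " in text or " him " in text,
            "pronouns.she_her": " she " in text or " her " in text,
            "pronouns.they_them": " they " in text or " them " in text,
        }

disagg = PronounDisaggregator()
row = disagg("When she arrived, he waved.")
```

Because the result is a plain dict keyed by the entries of .fields, it plugs directly into datasets.map, a pandas DataFrame.assign, or any other library, without disaggregators having to depend on them.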

NimaBoscarino commented 1 year ago

Thanks @lvwerra!!! I love the idea of giving the Disaggregator a __call__ method, and an attribute to fetch the generated keys. The method name that I have for it right now is pretty awkward 😬 https://github.com/huggingface/disaggregators/blob/e1154c880681c26653cc3ae5d9a5a47627440308/src/disaggregators/disaggregator.py#L28

The main reason I was looking at making changes to the Evaluator was so that the disaggregated metric calculations could happen after inference had run on the entire dataset. Otherwise the evaluator.compute call happens in a loop with high complexity (factorial or exponential, not quite sure 😅), since I'm taking products of the power set of every label group to generate the intersections. (From the example notebook:)

from itertools import chain, combinations, product

# Generate all intersectional disaggregation combinations
disaggregation_sets = [v for v in disaggregate_by.values()]
disaggregator_powerset = chain.from_iterable(
    combinations(disaggregation_sets, r) for r in range(len(disaggregation_sets) + 1)
)

all_combinations = [product(*d) for d in disaggregator_powerset]
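
To make the growth concrete, here is the same powerset-of-products construction run on two small, made-up label groups (the group and label names are illustrative only):

```python
from itertools import chain, combinations, product

# Two hypothetical disaggregation modules, each contributing a label set
disaggregate_by = {
    "pronouns": ["he_him", "she_her", "they_them"],
    "age": ["child", "adult"],
}

disaggregation_sets = list(disaggregate_by.values())
disaggregator_powerset = chain.from_iterable(
    combinations(disaggregation_sets, r)
    for r in range(len(disaggregation_sets) + 1)
)
all_combinations = [list(product(*d)) for d in disaggregator_powerset]

# Powerset members: {}, {pronouns}, {age}, {pronouns, age}
# -> 1 + 3 + 2 + 3*2 = 12 intersections to evaluate separately
total = sum(len(c) for c in all_combinations)
print(total)  # 12
```

Even with just two groups of three and two labels, a per-intersection evaluator.compute loop would run 12 times, and each additional label group multiplies the count further.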

So on the evaluate side I was thinking it could just receive a list of labels or groups of labels to calculate the metrics by, without a dependency on disaggregators. Do you think something like that could work?
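
One way that evaluate-side hook could look, sketched with plain Python lists instead of real datasets and with an invented helper name (compute_by_groups is not an existing evaluate API):

```python
# Hypothetical sketch: evaluate receives plain column names to group by,
# with no dependency on disaggregators. Inference would run once up front;
# only the per-group metric aggregation happens per subgroup.
def compute_by_groups(rows, group_columns, metric_fn):
    """Run metric_fn once per boolean group column, on the matching rows."""
    results = {}
    for col in group_columns:
        subset = [row for row in rows if row[col]]
        results[col] = metric_fn(subset)
    return results

# Toy rows with precomputed predictions and boolean group columns
data = [
    {"pred": 1, "label": 1, "pronouns.she_her": True,  "pronouns.he_him": False},
    {"pred": 0, "label": 1, "pronouns.she_her": True,  "pronouns.he_him": False},
    {"pred": 1, "label": 1, "pronouns.she_her": False, "pronouns.he_him": True},
]

accuracy = lambda rows: sum(r["pred"] == r["label"] for r in rows) / len(rows)
result = compute_by_groups(data, ["pronouns.she_her", "pronouns.he_him"], accuracy)
print(result)  # {'pronouns.she_her': 0.5, 'pronouns.he_him': 1.0}
```

Since the group columns are just names, the caller can pass single labels or precomputed intersection columns, which keeps the combinatorial bookkeeping on the disaggregators side.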