huggingface / disaggregators

🤗 Disaggregators: Curated data labelers for in-depth analysis.
Apache License 2.0

Feedback on API design #8

Closed lvwerra closed 1 year ago

lvwerra commented 1 year ago

Hi @NimaBoscarino, this looks great! Following up on the internal call for feedback on the API. Here's a proposal that integrates it less tightly into the Evaluator itself while still making it easy to use with "vanilla" evaluate or other libraries:

from datasets import load_dataset
from disaggregators import Disaggregator
from evaluate import evaluator

dataset = load_dataset("imdb", split="train")
disaggregator = Disaggregator(["pronouns", "random"])

# Add one boolean column per disaggregation field to each row
dataset = dataset.map(lambda x: disaggregator(x["text"]))

task_evaluator = evaluator("text-classification")
results = {}
for dagg_field in disaggregator.fields:
    # Evaluate only on the rows belonging to this subgroup
    dataset_filtered = dataset.filter(lambda x: x[dagg_field])

    results[dagg_field] = task_evaluator.compute(
        model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",
        data=dataset_filtered,
        label_mapping={"POSITIVE": 1, "NEGATIVE": 0},
    )

The proposed Disaggregator has a __call__ method that returns a dictionary, as well as a fields attribute containing all the generated keys. In terms of lines of code it's almost identical to the proposed solution, but it allows more flexibility to integrate with other libraries such as pandas or spaCy. This way you could also keep the disaggregators library free of datasets and evaluate dependencies, which makes it easier to maintain.
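
To make the proposed interface concrete, here is a minimal sketch of what such a Disaggregator could look like. The class name, field names, and substring-matching logic are all illustrative assumptions, not the actual disaggregators implementation:

```python
# Hypothetical sketch of the proposed interface: __call__ returns a plain
# dict of boolean features, and .fields lists every key it can generate.
class PronounDisaggregator:
    fields = ["pronouns.he_him", "pronouns.she_her", "pronouns.they_them"]

    def __call__(self, text):
        # Naive substring matching, purely for illustration
        text = f" {text.lower()} "
        return {
            "pronouns.he_him": " he " in text or " him " in text,
            "pronouns.she_her": " she " in text or " her " in text,
            "pronouns.they_them": " they " in text or " them " in text,
        }

disagg = PronounDisaggregator()
row = disagg("When she arrived, he waved.")
```

Because the result is a plain dict keyed by the entries of .fields, it plugs directly into datasets.map, a pandas DataFrame.assign, or any other library, without disaggregators having to depend on them.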

NimaBoscarino commented 1 year ago

Thanks @lvwerra!!! I love the idea of giving the Disaggregator a __call__ method, and an attribute to fetch the generated keys. The method name that I have for it right now is pretty awkward 😬 https://github.com/huggingface/disaggregators/blob/e1154c880681c26653cc3ae5d9a5a47627440308/src/disaggregators/disaggregator.py#L28

The main reason I was looking at making changes to the Evaluator was so that the disaggregated metric calculations could happen after inference had run on the entire dataset. Otherwise the evaluator.compute call happens in a loop with high complexity (factorial or exponential, not quite sure 😅), since I'm taking products of the power set of every label group to generate the intersections. (From the example notebook:)

from itertools import chain, combinations, product

# Generate all intersectional disaggregation combinations
disaggregation_sets = [v for v in disaggregate_by.values()]
disaggregator_powerset = chain.from_iterable(
    combinations(disaggregation_sets, r) for r in range(len(disaggregation_sets) + 1)
)

all_combinations = [product(*d) for d in disaggregator_powerset]
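
To make the growth concrete, here is the same powerset-of-products construction run on two small, made-up label groups (the group and label names are illustrative only):

```python
from itertools import chain, combinations, product

# Two hypothetical disaggregation modules, each contributing a label set
disaggregate_by = {
    "pronouns": ["he_him", "she_her", "they_them"],
    "age": ["child", "adult"],
}

disaggregation_sets = list(disaggregate_by.values())
disaggregator_powerset = chain.from_iterable(
    combinations(disaggregation_sets, r)
    for r in range(len(disaggregation_sets) + 1)
)
all_combinations = [list(product(*d)) for d in disaggregator_powerset]

# Powerset members: {}, {pronouns}, {age}, {pronouns, age}
# -> 1 + 3 + 2 + 3*2 = 12 intersections to evaluate separately
total = sum(len(c) for c in all_combinations)
print(total)  # 12
```

Even with just two groups of three and two labels, a per-intersection evaluator.compute loop would run 12 times, and each additional label group multiplies the count further.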

So on the evaluate side I was thinking it could just receive a list of labels or groups of labels to calculate the metrics by, without a dependency on disaggregators. Do you think something like that could work?
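
One way that evaluate-side hook could look, sketched with plain Python lists instead of real datasets and with an invented helper name (compute_by_groups is not an existing evaluate API):

```python
# Hypothetical sketch: evaluate receives plain column names to group by,
# with no dependency on disaggregators. Inference would run once up front;
# only the per-group metric aggregation happens per subgroup.
def compute_by_groups(rows, group_columns, metric_fn):
    """Run metric_fn once per boolean group column, on the matching rows."""
    results = {}
    for col in group_columns:
        subset = [row for row in rows if row[col]]
        results[col] = metric_fn(subset)
    return results

# Toy rows with precomputed predictions and boolean group columns
data = [
    {"pred": 1, "label": 1, "pronouns.she_her": True,  "pronouns.he_him": False},
    {"pred": 0, "label": 1, "pronouns.she_her": True,  "pronouns.he_him": False},
    {"pred": 1, "label": 1, "pronouns.she_her": False, "pronouns.he_him": True},
]

accuracy = lambda rows: sum(r["pred"] == r["label"] for r in rows) / len(rows)
result = compute_by_groups(data, ["pronouns.she_her", "pronouns.he_him"], accuracy)
print(result)  # {'pronouns.she_her': 0.5, 'pronouns.he_him': 1.0}
```

Since the group columns are just names, the caller can pass single labels or precomputed intersection columns, which keeps the combinatorial bookkeeping on the disaggregators side.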