huggingface / disaggregators

🤗 Disaggregators: Curated data labelers for in-depth analysis.
Apache License 2.0

How to analyze the disaggregated dataset? #29

Closed (chengyineng38 closed 1 year ago)

chengyineng38 commented 1 year ago

I ran the example code below. The README says, "The resulting dataset can now be used for data exploration and disaggregated model evaluation." How do I explore or evaluate the disaggregated dataset? Could you provide more documentation and sample code?

from disaggregators import Disaggregator
from datasets import load_dataset

dataset = load_dataset("imdb", split="train")
disaggregator = Disaggregator("pronoun", column="text")

ds = dataset.map(disaggregator)  # New boolean columns are added for she/her, he/him, and they/them

I see that the output of ds is:

[Screenshot: Screen Shot 2023-04-10 at 10 36 55 AM]

NimaBoscarino commented 1 year ago

Hey there @chengyineng38! How you explore and visualize the disaggregated dataset is very much up to you, since it depends on your use case. If you're familiar with pandas, the easiest way to get a data format that you can work with is to create a data frame with df = ds.to_pandas().
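As a rough illustration of what that exploration could look like once you have a data frame, here is a minimal sketch. It uses a tiny hand-built frame as a stand-in for ds.to_pandas() so that it runs without downloading anything. The column names (pronoun.she_her and so on) and the summarize helper are assumptions made for this example, not something confirmed in this thread, so check df.columns for the names your disaggregator actually produced.

```python
import pandas as pd

# Stand-in for `df = ds.to_pandas()`. The column names below are assumptions
# made for this sketch; inspect `df.columns` on your own data frame.
df = pd.DataFrame(
    {
        "text": [
            "She loved it.",
            "He hated it.",
            "They said she was great.",
            "A fine film.",
        ],
        "label": [1, 0, 1, 1],
        "pronoun.she_her": [True, False, True, False],
        "pronoun.he_him": [False, True, False, False],
        "pronoun.they_them": [False, False, True, False],
    }
)

# Pick out the boolean columns added by the disaggregator (assumed prefix).
group_cols = [c for c in df.columns if c.startswith("pronoun.")]


def summarize(frame, group_cols, value_col):
    """Hypothetical helper: per-group row count, share of rows, and mean of value_col."""
    rows = []
    for col in group_cols:
        subset = frame[frame[col]]  # rows where this group's flag is True
        rows.append(
            {
                "group": col,
                "n_rows": len(subset),
                "share_of_rows": len(subset) / len(frame),
                f"mean_{value_col}": subset[value_col].mean(),
            }
        )
    return pd.DataFrame(rows).set_index("group")


summary = summarize(df, group_cols, "label")
print(summary)
```

A row can belong to several groups at once, so the shares need not sum to 1. For disaggregated model evaluation, one option is to add a per-row correctness column (for example, prediction == label) and pass that column name as value_col, which gives accuracy per group.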

I have some examples of donut chart visualizations in this demo: https://huggingface.co/spaces/society-ethics/disaggregators, and you can see the code here: https://huggingface.co/spaces/society-ethics/disaggregators/tree/main

chengyineng38 commented 1 year ago

Got it, that's helpful. I did see https://huggingface.co/spaces/society-ethics/disaggregators but didn't see your code because it's not linked on the viz page.