huggingface / disaggregators

🤗 Disaggregators: Curated data labelers for in-depth analysis.
Apache License 2.0

Add a basic suite of text disaggregators #5

Closed: NimaBoscarino closed this issue 1 year ago

NimaBoscarino commented 1 year ago

Characteristics to consider include:

It's important for these to be implemented following best practices from the existing literature. Note that some of these characteristics may not lend themselves well to text data, in which case they might be implemented for image data instead (#6). In some cases it may also be useful to have both an image and a text version of a particular characteristic.

It would be neat to offer both rule-based disaggregators (e.g. word presence for pronouns) and ML-powered ones (e.g. using NLI to detect pronouns). Where the two overlap, maybe they could exist as configurations within the same disaggregator module? It's important, though, that any ML-powered disaggregators keep their heavier dependencies as optional extras.
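As a rough illustration of the "configurations within the same module" idea, here is a minimal sketch assuming a plain function-based interface. The names `pronoun_disaggregator`, `CONFIGS`, the `"rule"`/`"nli"` config names, and the short pronoun word lists are all hypothetical and are not the library's API; a real implementation should take its lexicon from the literature. The rule-based path does whole-word matching. The NLI path swaps in Hugging Face's `zero-shot-classification` pipeline (with an illustrative hypothesis template and threshold) and imports `transformers` lazily, so it only becomes a requirement when that configuration is actually used.

```python
import re
from functools import lru_cache
from typing import Callable, Dict, List

# Hypothetical, illustrative word lists (not drawn from the literature).
PRONOUN_LEXICON: Dict[str, List[str]] = {
    "she_her": ["she", "her", "hers", "herself"],
    "he_him": ["he", "him", "his", "himself"],
    "they_them": ["they", "them", "their", "theirs", "themself", "themselves"],
}

# Human-readable labels used to build NLI hypotheses (also hypothetical).
NLI_LABELS: Dict[str, str] = {"she_her": "she/her", "he_him": "he/him", "they_them": "they/them"}


def rule_based_pronouns(text: str) -> Dict[str, bool]:
    """Flag each pronoun group if any of its words appears as a whole word."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))  # "this" does not match "his"
    return {label: any(w in tokens for w in words) for label, words in PRONOUN_LEXICON.items()}


@lru_cache(maxsize=1)
def _load_nli():
    """Load the zero-shot NLI pipeline once; transformers is an optional extra."""
    try:
        from transformers import pipeline  # only imported when "nli" is used
    except ImportError as err:
        raise ImportError(
            "The 'nli' configuration needs optional ML dependencies "
            "(e.g. `pip install transformers torch`)."
        ) from err
    return pipeline("zero-shot-classification")


def nli_pronouns(text: str, threshold: float = 0.5) -> Dict[str, bool]:
    """Flag each pronoun group whose NLI entailment score meets the threshold."""
    classifier = _load_nli()
    result = classifier(
        text,
        candidate_labels=list(NLI_LABELS.values()),
        hypothesis_template="This text refers to someone who uses {} pronouns.",
        multi_label=True,  # groups are scored independently, not as a softmax
    )
    scores = dict(zip(result["labels"], result["scores"]))
    return {key: scores[label] >= threshold for key, label in NLI_LABELS.items()}


# Both approaches exposed as configurations of one disaggregator.
CONFIGS: Dict[str, Callable[[str], Dict[str, bool]]] = {
    "rule": rule_based_pronouns,
    "nli": nli_pronouns,
}


def pronoun_disaggregator(config: str = "rule") -> Callable[[str], Dict[str, bool]]:
    """Return the labeling function for the requested configuration."""
    if config not in CONFIGS:
        raise ValueError(f"Unknown config {config!r}; choose from {sorted(CONFIGS)}")
    return CONFIGS[config]


if __name__ == "__main__":
    disagg = pronoun_disaggregator("rule")
    print(disagg("They said she would bring her notes."))
    # {'she_her': True, 'he_him': False, 'they_them': True}
```

Because the `transformers` import sits inside the cached loader, importing this module, or using the `"rule"` configuration, never requires the ML dependencies. Only selecting `"nli"` does, which is the behavior an optional packaging extra is meant to support.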