huggingface / hub-docs

Docs of the Hugging Face Hub
http://hf.co/docs/hub

Update model-card-annotated.md #1267

Open meg-huggingface opened 5 months ago

meg-huggingface commented 5 months ago

Food for thought: adding a section that deals directly with redteaming, as that is becoming a big interest/concern. I've called it "Safety Testing" so that it's not limited to redteaming and can also cover adversarial testing (which some folks in the redteaming world treat as distinct). It hits specifically on CSAM, as we have been inspired by Thorn and Rebecca Portnoff to be more proactive about this. For more context, see: https://huggingface.slack.com/archives/C0317KZSB5H/p1712707749145689?thread_ts=1712614250.080999&cid=C0317KZSB5H

HuggingFaceDocBuilderDev commented 5 months ago

It looks like you've updated documentation related to model or dataset cards in this PR.

Some content is duplicated among the following files. Please make sure that everything stays consistent.

HuggingFaceDocBuilderDev commented 5 months ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

meg-huggingface commented 5 months ago

Thanks @julien-c, completely agree it's a bit heavy, and also agree we should keep model cards as light and as generally applicable as possible; appreciate that. I have been waffling on the best approach with respect to two things happening at the moment:

  1. An increased interest in "safety" and "redteaming" in AI, including in regulation, and yet no clearly corresponding place in model cards where one might put results from this work. In "redteaming", topics of concern include sex, violence (and things that might incite violence), racism/sexism, etc. So a place to report these types of things would be handy, although I don't necessarily want to be "redteam"-specific, since there is the related strategy of "adversarial testing", as well as approaches that calculate scores over outputs, such as tf-idf and nPMI (the usual definition of nPMI is sketched just after this list). So creating a section that captures what all these approaches are getting at, and recognizing it as part of the evaluation protocol -- calling it something like a "Societal Impact" section -- might be handy to have. But note there is a relationship to "Bias & Risks" that we would need to be clear about. (I detail this more below.)
  2. Thorn's advocacy for having companies start paying serious attention to the harms that Thorn focuses on, with a suggestion from them for updating the Model Card protocol in particular. After discussing in the E&S meeting, it's probably good for model cards not to be too specific about some of these issues, e.g., with searchable tags, in part because tags would make it easier for malicious actors to find potentially harmful models.
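
(For readers not familiar with it, nPMI above is just the usual normalized pointwise mutual information, e.g. between an identity term $x$ and a co-occurring token $y$ in generated outputs; that example pairing is illustrative rather than anything specified in this PR.)

$$
\mathrm{pmi}(x;y) = \log\frac{p(x,y)}{p(x)\,p(y)}, \qquad
\mathrm{npmi}(x;y) = \frac{\mathrm{pmi}(x;y)}{-\log p(x,y)} \in [-1,\,1]
$$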

To get to your point about applicability to different models: although the interest in this has grown with generative models, these kinds of issues apply across many kinds of models.

So, in light of all of the above: what about creating a section in the Testing/Evaluation part that is called "Societal Impact" or "Safety", is optional, and is free text? Then in the Annotated Model Card, we could explain some of the issues that people might check for (privacy, violence, sexualization), but explicitly not have tags for them, both to keep it less "heavy" and to circumvent the foreseeable issue where malicious actors search specifically for harmful models. The distinction with "Bias, Risks, and Limitations" is that people tend to put very general statements there ("may produce incorrect information", "may be biased wrt skin tone", etc.), whereas the "Societal Impact"/"Safety" section would speak directly to what people are now testing for.
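
For concreteness, a rough sketch of how this could look in the template -- the heading name, its placement under Evaluation, and the comment prompt are all illustrative, nothing here is settled:

```markdown
## Evaluation

<!-- existing evaluation subsections stay as they are -->

### Societal Impact

<!-- Optional, free text. Report any safety testing you carried out: redteaming,
adversarial testing, or score-based analyses of outputs (e.g. tf-idf, nPMI),
and what you found. Issues people often check for include privacy, violence,
sexualization, and CSAM. Deliberately no searchable tags for these topics. -->
```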

Also ping @luciekaffee in case this convo interests her.

julien-c commented 5 months ago

yep, sounds like a good trade-off to me!

pcuenca commented 5 months ago

"Societal Impact" or "Safety"

How about Safety Assessment, so the goal is to encourage the reporting of specific evaluations the model authors may have undertaken? "Societal Impact" might be more accurate, but for a casual reader (me) the distinction with respect to "Bias, Risks, and Limitations" might be a bit blurred. "Safety" may be more generic if we expect authors to also include information about guardrails or similar mechanisms.

meg-huggingface commented 5 months ago

"Societal Impact" or "Safety"

How about Safety Assessment, so the goal is to encourage the reporting of specific evaluations the model authors may have undertaken? "Societal Impact" might be more accurate, but for a casual reader (me) the distinction with respect to "Bias, Risks, and Limitations" might be a bit blurred. "Safety" may be more generic if we expect authors to also include information about guardrails or similar mechanisms.

That makes a lot of sense to me!