nateraw / modelcards

📝 Utility to create, edit, and publish model cards on the Hugging Face Hub. [**Now lives in huggingface_hub**]
MIT License

Add support for Carbon Emissions reporting in CardData #14

Open nateraw opened 2 years ago

nateraw commented 2 years ago

It's possible to include carbon emissions data in your model card metadata. We should make it easy to do so with this package.

Spec:

```yaml
co2_eq_emissions:
  emissions: "in grams of CO2"
  source: "source of the information, either directly from AutoTrain, code carbon or from a scientific article documenting the model"
  training_type: "pretraining or fine-tuning"
  geographical_location: "as granular as possible, for instance Quebec, Canada or Brooklyn, NY, USA"
  hardware_used: "how much compute and what kind, e.g. 8 v100 GPUs"
```

The spec also appears to allow the shorthand `co2_eq_emissions: <amount>`, where the amount is the value you would otherwise put in the nested `emissions` field (e.g. `1000.0`). This is the form I've seen used most often.
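Because both forms can appear in existing cards, code that reads this metadata back has to handle either one. A minimal sketch, assuming the card's YAML has already been parsed into plain Python values; `emissions_in_grams` is a hypothetical helper, not part of modelcards:

```python
from collections.abc import Mapping
from typing import Union

# Either the shorthand form (a bare number) or the full mapping from the spec.
EmissionsValue = Union[int, float, Mapping]


def emissions_in_grams(value: EmissionsValue) -> float:
    """Return CO2-equivalent emissions in grams from either metadata form."""
    # Shorthand form: `co2_eq_emissions: 1000.0`
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    # Full form: a mapping whose `emissions` key holds the amount.
    if isinstance(value, Mapping) and "emissions" in value:
        return float(value["emissions"])
    raise ValueError(f"unrecognized co2_eq_emissions value: {value!r}")


shorthand = 1000.0
full = {"emissions": 1000.0, "source": "codecarbon", "training_type": "fine-tuning"}
print(emissions_in_grams(shorthand))  # 1000.0
print(emissions_in_grams(full))       # 1000.0
```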

Once the feature lands, anyone who hasn't reported emissions data but wants to could add it as easily as this:

```python
from modelcards import ModelCard

card = ModelCard.load('nateraw/rare-puppers')
card.data.emissions = 1000.0  # proposed attribute; would be written as `co2_eq_emissions: 1000.0`
card.push_to_hub('nateraw/rare-puppers')
```

Or, if they want to include the extra fields from the spec, we could provide a dataclass, called something like EmissionsData, to hold them:

```python
from modelcards import ModelCard
from modelcards import EmissionsData  # proposed dataclass; does not exist yet

card = ModelCard.load('nateraw/rare-puppers')
card.data.emissions = EmissionsData(
    emissions=1000.0,
    source="codecarbon",
    training_type="fine-tuning",
    geographical_location="Rochester, NY",
    hardware_used="1 Nvidia GTX 1080"
)
card.push_to_hub('nateraw/rare-puppers')
```
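A minimal sketch of what such a dataclass might look like, assuming it only needs to mirror the five fields in the spec and serialize to the `co2_eq_emissions` mapping. Both `EmissionsData` and the `co2_metadata` helper below are hypothetical illustrations; neither exists in modelcards:

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class EmissionsData:
    """Hypothetical container mirroring the fields of the co2_eq_emissions spec."""

    emissions: float  # grams of CO2-equivalent
    source: Optional[str] = None  # e.g. "codecarbon" or a paper reference
    training_type: Optional[str] = None  # "pretraining" or "fine-tuning"
    geographical_location: Optional[str] = None
    hardware_used: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Drop fields that were not provided so the metadata stays minimal.
        return {k: v for k, v in asdict(self).items() if v is not None}


def co2_metadata(value: Union[float, EmissionsData]) -> Dict[str, Any]:
    """Build the co2_eq_emissions entry in the full form or the shorthand form."""
    if isinstance(value, EmissionsData):
        return {"co2_eq_emissions": value.to_dict()}
    return {"co2_eq_emissions": float(value)}


data = EmissionsData(
    emissions=1000.0,
    source="codecarbon",
    training_type="fine-tuning",
    geographical_location="Rochester, NY",
    hardware_used="1 Nvidia GTX 1080",
)
print(co2_metadata(data))
print(co2_metadata(1000.0))  # {'co2_eq_emissions': 1000.0}
```

Dropping unset fields means a card that only knows the emissions amount gets a small mapping rather than a row of nulls, and passing a bare number still produces the shorthand form.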

CC @sashavor

nateraw commented 2 years ago

This technically works already, since CardData accepts arbitrary data. It would still be nice to define it explicitly in the CardData object.
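An explicit field could coexist with arbitrary metadata. A minimal sketch, assuming a simplified, hypothetical `CardData` class (not the package's real implementation) that keeps a named `co2_eq_emissions` field, checks it against the spec's keys when serializing, and stores any other keyword arguments as plain attributes:

```python
from collections.abc import Mapping
from typing import Any, Dict, Optional

# Keys allowed in the full co2_eq_emissions form, per the spec in this issue.
SPEC_KEYS = {"emissions", "source", "training_type", "geographical_location", "hardware_used"}


class CardData:
    """Hypothetical, simplified stand-in for a model card's metadata object."""

    def __init__(self, co2_eq_emissions: Optional[Any] = None, **kwargs: Any) -> None:
        # Explicit, documented field for emissions data.
        self.co2_eq_emissions = co2_eq_emissions
        # Any other metadata is still accepted as arbitrary attributes.
        self.__dict__.update(kwargs)

    def to_dict(self) -> Dict[str, Any]:
        """Return the metadata as a dict, validating co2_eq_emissions if set."""
        value = self.co2_eq_emissions
        if isinstance(value, Mapping):
            unknown = set(value) - SPEC_KEYS
            if "emissions" not in value or unknown:
                raise ValueError(f"invalid co2_eq_emissions mapping: {dict(value)!r}")
        elif value is not None and (isinstance(value, bool) or not isinstance(value, (int, float))):
            raise ValueError(f"co2_eq_emissions must be a number or a mapping, got {value!r}")
        # Omit unset (None) entries so they are not written to the card.
        return {k: v for k, v in vars(self).items() if v is not None}


data = CardData(license="mit")
data.co2_eq_emissions = {"emissions": 1000.0, "source": "codecarbon"}
print(data.to_dict())
# {'co2_eq_emissions': {'emissions': 1000.0, 'source': 'codecarbon'}, 'license': 'mit'}
```

Validating at serialization time rather than on assignment keeps attribute-style updates like `card.data.co2_eq_emissions = 1000.0` working while still catching typos in the nested keys before the card is pushed.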