Extentions for integrations

adrinjalali commented 2 years ago

How should a library integration developer extend the modelcard here to fit their usecase?

It would be nice to start with a documentation to explain how the workflow works for us to set the API. I would suggest something like this file (https://github.com/skops-dev/skops/blob/main/examples/plot_hf_hub.py) instead of a notebook, which can be converted into a notebook.

nateraw commented 2 years ago

How should a library integration developer extend the modelcard here to fit their usecase?

They would write their own template and use the ModelCard.from_template(..., template_path='path/to/their/template'). (CC @merveenoyan if you wanna give this a shot at some point)

Totally agree this workflow should be included in the examples. I would love to have both a script and a notebook, because I want it to be easy for folks to open the example in colab, as its easier to convince them to check out the code.

There's tons of tools to do this, I am not sure which is best/easiest for this use case. Do you have any suggestions @adrinjalali ? (Also CC my friend @borda who I know has done a lot of this).

nateraw commented 2 years ago

Just an update - this workflow is now in both the README and the colab walkthrough 😄

merveenoyan commented 2 years ago

I was a bit busy with the Keras sprint, sorry 😅 I find the UX of passing metadata like this a bit too much (I can't explain what I don't like about it though 😂) link to line in context. It's just an opinion though. This looks good to me. (the template argument)

merveenoyan commented 2 years ago

Nevertheless, I'll implement this. Feel free to assign to me.

nateraw commented 2 years ago

@merveenoyan let's revisit this for Keras in huggingface_hub maybe a little later down the line once we're really happy with the default card. Then you can copy paste that + just update it to have anything Keras specific you might want (i.e. model plot at end)

adrinjalali commented 2 years ago

I'm not familiar with those tools @nateraw . But what I wanted as a resolution to this issue is not specific to any framework, so implementing the keras modelcard would be a separate issue to me.

This is about documenting what a third party developer would need to read in order to write a new template, and fill the template (and figure out if there are changes we need to implement on this library's side)

nateraw commented 2 years ago

@adrinjalali For sure - Did you read the updates in the README/Walkthrough? I think once default model card is in good place, we suggest people copy paste it and update as needed, then pass it to template_path when creating new card.

merveenoyan commented 2 years ago

We should have model examination part (or have a better naming for this part 😅 @mgerchick @EziOzoani) on the model card.

The common things we can extract from models across all frameworks are hyperparameters and nothing more.
If it's a decision tree or neural network based model, we can have plot of the graph.
If it's a neural network model with recorded history, we can have plot of history on model card. (also it would be good to enable users to push tensorboard traces at some point)
If it's a statistical model and if user is willing to provide predictions and test data (or maybe cross validation results?), we have different things to be visualized:
AUC/ROC, confusion matrix for classification, a classification report (which can also be applied for neural network models though)
R squared and MSE for regression
Visual clusters and homogenity score for clustering
Decision boundaries of models
Visualization of pipeline if it's a scikit-learn model
Feature importances 😏 @adrinjalali

I say we should initially support following libraries:

Tensorflow/Keras (which model card is provided at the moment already with Hub mixin)
PyTorch
Transformers (which is supported through Keras and PyTorch and currently exists inside transformers library)
Scikit-learn
fast.ai (which is supported through PyTorch)

I thought of writing a couple of functions that would extract the above information programmatically and add it to the jinja template. I will write the functions and populate them in a notebook that when template is approved, we can add these functionalities.

adrinjalali commented 2 years ago

I would strongly advise against feature importances, they have many biases which are addressed in other methods. We can instead make it easy for users to include PDP and ICE plots: https://scikit-learn.org/stable/modules/partial_dependence.html#partial-dependence and permutation feature importance: https://scikit-learn.org/stable/modules/permutation_importance.html#permutation-importance if it's feasible for the user.

We should also make it easy for users to include metrics sliced by features which are relevant in the dataset, like here: https://fairlearn.org/v0.7.0/user_guide/assessment.html#metrics

merveenoyan commented 2 years ago

@nateraw would you like me to open the PRs to existing ones in HF Hub client library after merger?

nateraw commented 2 years ago

That sounds like a good idea, yea!

nateraw / modelcards

Extentions for integrations #21