microsoft / responsible-ai-toolbox-mitigations

Python library for implementing Responsible AI mitigations.
https://responsible-ai-toolbox-mitigations.readthedocs.io/en/latest/
MIT License

Seed cannot be set #29

Closed by morrissharp 2 years ago

morrissharp commented 2 years ago

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/notebooks/dataprocessing/case_study/case2.ipynb?short_path=c0a0c33#L611

The case2.ipynb notebook references the ability to set a seed, but this is not available in split_data(), train_model_plot_results(), or train_model_fetch_results(). Additionally, I have noticed that there is no way to pass parameters to the model itself for instantiation or fitting (e.g. setting the number of neighbors for KNN).

I am not sure whether you expect these functions to be used outside of the example notebooks. If so, you should consider allowing the user to set a random seed and to pass in model parameters, possibly using something like *args and **kwargs.
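A minimal sketch of the kind of interface this suggests, using only the Python standard library. The names split_rows, make_model, and TinyKNN are hypothetical and are not part of the library; they only illustrate a seed parameter and keyword-argument forwarding.

```python
import random


class TinyKNN:
    """Hypothetical stand-in for a real estimator, for illustration only."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors


def split_rows(rows, test_size=0.2, seed=None):
    """Shuffle and split rows; the same seed always yields the same split."""
    rng = random.Random(seed)  # local generator, global random state untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def make_model(model_cls, **model_kwargs):
    """Forward arbitrary keyword arguments to the model constructor."""
    return model_cls(**model_kwargs)


rows = list(range(10))
train_a, test_a = split_rows(rows, test_size=0.3, seed=42)
train_b, test_b = split_rows(rows, test_size=0.3, seed=42)
model = make_model(TinyKNN, n_neighbors=3)
```

Calling split_rows twice with the same seed returns identical splits, and any constructor argument (here n_neighbors) reaches the model without the helper needing to know about it.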

mrfmendonca commented 2 years ago

I'm now using a fixed seed in case1.ipynb, case2.ipynb, and case3.ipynb.

Regarding the possibility of passing parameters to the model, the aim of the train_model_fetch_results() and train_model_plot_results() functions is to simplify the training and testing process. The goal is to use the same model architecture with the same parameters before and after a pre-processing step, in order to test the effectiveness of the pre-processing. Since the objective is not to find the best parameters for a given model, I only allow the user to choose among different model architectures (xgboost, knn, etc.), all with default parameters.
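The before/after comparison described above can be sketched roughly as follows. This is not the library's code: nearest_neighbor_accuracy and min_max_scale are hypothetical helpers, the data is synthetic, and the classifier is a plain 1-nearest-neighbor written with the standard library. The point is only that the model and the seed stay fixed, so the pre-processing step is the one thing that changes between the two runs.

```python
import random


def nearest_neighbor_accuracy(features, labels, seed=0, test_size=0.25):
    """Score a fixed 1-nearest-neighbor classifier on a seeded split."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def predict(x):
        # Label of the closest training point (squared Euclidean distance).
        nearest = min(
            train_idx,
            key=lambda i: sum((a - b) ** 2 for a, b in zip(features[i], x)),
        )
        return labels[nearest]

    hits = sum(predict(features[i]) == labels[i] for i in test_idx)
    return hits / len(test_idx), test_idx


def min_max_scale(features):
    """Hypothetical pre-processing step: rescale each column to [0, 1]."""
    columns = list(zip(*features))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        for row in features
    ]


# Synthetic data: the first column decides the label, the second is
# large-scale noise. Scaling uses all rows here, a simplification.
data_rng = random.Random(1)
features = [[data_rng.random(), data_rng.random() * 1000] for _ in range(40)]
labels = [int(row[0] > 0.5) for row in features]

acc_before, idx_before = nearest_neighbor_accuracy(features, labels, seed=7)
acc_after, idx_after = nearest_neighbor_accuracy(
    min_max_scale(features), labels, seed=7
)
```

Because both calls share the seed, idx_before and idx_after hold the same test rows, so any difference between acc_before and acc_after comes from the scaling step alone.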