microsoft / responsible-ai-toolbox-mitigations

Python library for implementing Responsible AI mitigations.
https://responsible-ai-toolbox-mitigations.readthedocs.io/en/latest/
MIT License

Spelling/grammar/formatting issues #22

Closed by morrissharp 2 years ago

morrissharp commented 2 years ago

I think this is missing a blank line before the bulleted list. The bulleted list is not displaying properly in the docs.

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/83c26a3fc61f15c609734126bd4e76e0d922fdbc/raimitigations/databalanceanalysis/aggregate_measures.py#L36
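For illustration, here is a minimal sketch of the blank-line rule in reStructuredText docstrings. The function names and docstring text are hypothetical and are not copied from `aggregate_measures.py`. In reStructuredText, a bullet list that directly follows a paragraph without a blank line is parsed as a continuation of that paragraph.

```python
import inspect


def broken():
    """Compute aggregate measures (hypothetical docstring).

    The following measures are computed:
    - measure A
    - measure B
    """


def fixed():
    """Compute aggregate measures (hypothetical docstring).

    The following measures are computed:

    - measure A
    - measure B
    """


def list_has_blank_line_before(func):
    """Return True if the first bullet in func's docstring follows a blank line."""
    lines = inspect.cleandoc(func.__doc__).splitlines()
    first_bullet = next(i for i, line in enumerate(lines) if line.startswith("- "))
    return lines[first_bullet - 1].strip() == ""
```

The helper is only a quick check for this one pattern; building the docs and inspecting the rendered page remains the real test.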

morrissharp commented 2 years ago

In file https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/83c26a3fc61f15c609734126bd4e76e0d922fdbc/notebooks/databalanceanalysis/data_balance_overall.ipynb?short_path=14098f6#L15-L19

Suggest change to: Data Balance Analysis is relevant for the overall understanding of datasets, but is essential to building Machine Learning models in a responsible way, especially in terms of fairness. It is all too easy to build an ML model that produces biased results for subsets of the population by training or testing the model on biased ground truth data. There are multiple case studies of biased models assisting in granting loans, healthcare, recruitment opportunities, and many other decision-making tasks. In most of these examples, the data on which these models are trained was the common issue. These findings emphasize how important it is for model creators and auditors to analyze data balance:

In summary, Data Balance Analysis has the following benefits when used for building ML models.

morrissharp commented 2 years ago

The docstring for FeatureBalanceMeasure.measures is not showing up: https://sturdy-barnacle-3b9f911d.pages.github.io/databalanceanalysis/databalanceanalysis.html#databalanceanalysis.feature_measures.FeatureBalanceMeasure.measures

Probably because it is not the first line in the function: https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/83c26a3fc61f15c609734126bd4e76e0d922fdbc/raimitigations/databalanceanalysis/feature_measures.py#L91-L94
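A minimal sketch of the behavior I suspect, using hypothetical function names and not the actual code at the linked lines: Python only treats a string literal as the docstring when it is the first statement in the function body, and documentation tools that read `__doc__` see nothing otherwise.

```python
def measures_hidden():
    columns = ["a", "b"]
    """This string is a plain expression statement, not a docstring."""
    return columns


def measures_documented():
    """This string is the first statement, so it becomes the docstring."""
    columns = ["a", "b"]
    return columns


# __doc__ is None when the string is not the first statement.
hidden_doc = measures_hidden.__doc__
visible_doc = measures_documented.__doc__
```

If this is the cause, moving the string to the top of the function body should be enough for the docs to pick it up.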

morrissharp commented 2 years ago

The encoding example could use a few more comments:

morrissharp commented 2 years ago

A number of single quotes in this docstring should instead be backticks, so that the quoted text is formatted as code, e.g. https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/raimitigations/dataprocessing/feat_selection/sequential_select.py#L30-L31

morrissharp commented 2 years ago

From: https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/raimitigations/dataprocessing/feat_selection/sequential_select.py#L15-L16

The specific module and library should be highlighted as code:

"Implements the sequential feature selection method using the mlxtend library" -> "Implements the `SequentialFeatureSelector` method using the `mlxtend` library"

morrissharp commented 2 years ago

A number of single quotes in this docstring should instead be backticks, so that the quoted text is formatted as code, e.g.

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/raimitigations/dataprocessing/feat_selection/sequential_select.py#L30-L31

Noting that this issue appears in much (if not all) of the documentation.
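Since this shows up in many places, a rough sketch of a helper that rewrites single-quoted identifiers as backticked ones might help with a first pass. The function name and the sample sentence are hypothetical. The pattern only matches a bare identifier between single quotes, so it would also convert quoted string values such as 'auto', and the result needs manual review. Note also that in plain reStructuredText inline code uses double backticks, and how single backticks render depends on the Sphinx `default_role` setting.

```python
import re

# An identifier wrapped in single quotes, e.g. 'cat_col'.
_QUOTED_IDENTIFIER = re.compile(r"'([A-Za-z_][A-Za-z0-9_]*)'")


def quotes_to_backticks(text):
    """Replace 'identifier' with `identifier`, leaving apostrophes untouched."""
    return _QUOTED_IDENTIFIER.sub(r"`\1`", text)


sample = "If 'cat_col' is None, the class doesn't encode anything."
converted = quotes_to_backticks(sample)
```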

morrissharp commented 2 years ago

The math equation here is not being formatted correctly. Maybe there is something missing in the conf file for equations? https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/raimitigations/dataprocessing/sampler/rebalance.py#L76

https://sturdy-barnacle-3b9f911d.pages.github.io/dataprocessing/sampler/rebalance.html
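Two things that may be worth checking, sketched below under assumptions: I have not seen the project's `conf.py`, and the docstring and equation here are hypothetical, not the one in `rebalance.py`. First, that a math renderer such as `sphinx.ext.mathjax` is available to the HTML build. Second, that docstrings containing LaTeX are raw strings, since otherwise sequences like `\f` in `\frac` are interpreted as escape characters before Sphinx ever sees them.

```python
# conf.py (sketch; the project's real extension list may differ)
extensions = [
    "sphinx.ext.autodoc",  # pulls docstrings into the generated docs
    "sphinx.ext.mathjax",  # renders math roles and directives in HTML output
]


def rebalance_example():
    r"""Hypothetical docstring containing an equation.

    The ratio is defined as:

    .. math::

        \frac{N_{minority}}{N_{majority}}
    """
```

The `r` prefix keeps the backslashes intact, and the blank lines around the `.. math::` directive are required by reStructuredText.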

morrissharp commented 2 years ago

Suggest adding backticks around `cat_col`, `over_sampler`, and `transform_pipe`, plus a few grammatical changes:

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/raimitigations/dataprocessing/sampler/rebalance.py#L60-L68

a list of names or indexes of categorical columns. If None, this parameter will be set automatically as a list of all the categorical variables in the dataset. These columns are used to determine the default SMOTE type that should be used: if `cat_col` is None, then use SMOTE; if `cat_col` represents all columns of the dataset, then use SMOTEN; if `cat_col` is a subset of columns of the dataset, then use SMOTENC. If a specific SMOTE object is provided in the constructor (using the `over_sampler` parameter), then the columns in `cat_col` will be automatically encoded using One-Hot encoding (`EncoderOHE`), unless another encoding transformer is provided in the `transform_pipe` parameter;
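To make the selection rule in this description concrete, here is a small sketch of the decision as I read it. The function name and arguments are hypothetical, it returns only the name of the SMOTE variant, and it is not the library's implementation. Treating an empty list the same as None is my assumption.

```python
def default_smote_name(cat_col, all_columns):
    """Return the name of the default SMOTE variant for the given columns."""
    if not cat_col:
        # None (or empty, by assumption): no categorical columns.
        return "SMOTE"
    if set(cat_col) == set(all_columns):
        # Every column in the dataset is categorical.
        return "SMOTEN"
    # Only some of the columns are categorical.
    return "SMOTENC"


columns = ["age", "color", "size"]
numeric_only = default_smote_name(None, columns)
all_categorical = default_smote_name(["age", "color", "size"], columns)
mixed = default_smote_name(["color"], columns)
```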

morrissharp commented 2 years ago

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/main/notebooks/dataprocessing/module_tests/feat_sel_catboost.ipynb "Notice that several logging information was generated by CatBoost. We can avoid this by setting the catboost_log parameter to False" -> "Notice that CatBoost logs information to the console during the run. We can suppress this output by setting the catboost_log parameter to False"

There's only one example with Regression: "First of all, let's create a dummy regression dataset so we can build a few examples" -> "First of all, let's create a dummy regression dataset for the next example."

morrissharp commented 2 years ago

It is somewhat confusing that the "new dataset" referred to here is not the same as new_df in the above cell. Maybe it could be referred to as "updated dataset"? https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/notebooks/dataprocessing/module_tests/feat_sel_corr_tutorial.ipynb?short_path=38cf628#L550

morrissharp commented 2 years ago

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/notebooks/dataprocessing/module_tests/feat_sel_corr_tutorial.ipynb?short_path=38cf628#L913

Section 4: "Differently to the previous scenarios" -> "In contrast to the previous scenarios"

morrissharp commented 2 years ago

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/notebooks/dataprocessing/module_tests/feat_sel_sequential.ipynb?short_path=6e5cbcf#L637

"5 features" -> "6 features"

morrissharp commented 2 years ago

There should be a link to the docs for scikit-learn's SimpleImputer here, since the parameter details are documented in the scikit-learn docs and not here.

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/raimitigations/dataprocessing/imputer/basic_imputer.py#L36-L37

morrissharp commented 2 years ago

Not sure if this is on purpose or not, but model_test.ipynb is in the module_tests directory, yet it is listed as a case study in the documentation.

morrissharp commented 2 years ago

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/notebooks/dataprocessing/module_tests/pipeline_test.ipynb?short_path=a2f468b#L766 "although these transformers also accepts" -> "although these transformers also accept"

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/notebooks/dataprocessing/module_tests/pipeline_test.ipynb?short_path=a2f468b#L1518 "Therefore, it is interesting that the user continues" -> "Therefore, it is of interest to the user to continue"

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/notebooks/dataprocessing/module_tests/pipeline_test.ipynb?short_path=a2f468b#L1522 "without loosing" -> "without losing"

morrissharp commented 2 years ago

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/notebooks/dataprocessing/module_tests/scaler.ipynb?short_path=fbf13aa#L2382

"mehtod" -> "method"

morrissharp commented 2 years ago

In this class and all of the other scaler classes. https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/raimitigations/dataprocessing/scaler/standard.py#L11

"but also makes it more simple to be applied to a dataset" -> "but also makes it simpler to apply to a dataset"

morrissharp commented 2 years ago

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/raimitigations/dataprocessing/toy_dataset_corr.py#L18

"resamble" -> "resemble"

morrissharp commented 2 years ago

https://github.com/microsoft/responsible-ai-toolbox-mitigations/blob/0d69bb6db4ddf92db1870a265147e7458be0cf5f/notebooks/dataprocessing/case_study/case1_stat.ipynb?short_path=e8273c5#L9

"experimento" -> "experiment"