Closed AgnesBaud closed 7 months ago
When should the `nbins` and `cut_type` (or `bins_values` and `farleft`) arguments be defined? In `compute_binned_data` or `binned_data`?
Filtering and normalization classes do it at instantiation (though it is also possible to change it manually later).
The difference is that the `farleft` argument is directly linked to the `bins_values` argument, since it modifies it: if `farleft == True`, 0.001 is subtracted from the first element of the list so that this value is included in the first bin. Changing it later would therefore not give the same result.
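To illustrate why the order matters, here is a minimal sketch of the `farleft` adjustment. It assumes right-closed bins `(left, right]`, as in the default behaviour of `pandas.cut`; the helper names `adjust_edges` and `assign_bins` are hypothetical and are not taken from the project.

```python
from bisect import bisect_left


def adjust_edges(bins_values, farleft=True, eps=0.001):
    """Return a copy of the edges, lowering the first one when farleft is set."""
    edges = list(bins_values)
    if farleft:
        edges[0] -= eps
    return edges


def assign_bins(values, edges):
    """Map each value to the index of its right-closed bin (edges[i], edges[i+1]].

    Values that fall outside every bin are mapped to None.
    """
    result = []
    for value in values:
        index = bisect_left(edges, value) - 1
        result.append(index if 0 <= index < len(edges) - 1 else None)
    return result


values = [0, 2, 5, 10]
bins_values = [0, 5, 10]

# Without the adjustment, the minimum (0) sits on the open left edge and is lost.
without_farleft = assign_bins(values, adjust_edges(bins_values, farleft=False))
# With the adjustment, the first edge becomes -0.001 and 0 lands in the first bin.
with_farleft = assign_bins(values, adjust_edges(bins_values, farleft=True))

print(without_farleft)  # [None, 0, 0, 1]
print(with_farleft)     # [0, 0, 0, 1]
```

Because the edges themselves are modified, toggling `farleft` after `bins_values` has already been adjusted (or replaced) would not reproduce the same bins.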
(`_compute_best_bins_values` is used to build the table passed to chi2contingency)
➔ argument `cut_type` (in `SeriesBinning`) customizable in this method
➔ With more than one series, should we look at all series at once?
No problem in looking at all series for equal-width bins
For equal-sized bins, treating all series as one big series might result in some cells of the contingency table being < 5 for some series (rows)
For Chi2 contingency, is it important to have the same bins for every comparison? ➔ I don't think so
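The concern about equal-sized bins can be sketched with a small standard-library example. It is only an illustration on made-up data: the function names are hypothetical, the bins are assumed to be right-closed with the minimum forced into the first bin, and the quantile edges come from `statistics.quantiles` rather than from the project's own binning code.

```python
from bisect import bisect_left
from statistics import quantiles


def equal_sized_edges(values, nbins):
    """Edges such that each bin holds roughly the same number of pooled values."""
    # quantiles() returns the nbins - 1 interior cut points.
    interior = quantiles(values, n=nbins, method="inclusive")
    return [min(values)] + interior + [max(values)]


def bin_counts(values, edges):
    """Count values per right-closed bin; the minimum goes into the first bin."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        index = max(bisect_left(edges, value) - 1, 0)
        counts[index] += 1
    return counts


# Two illustrative series with different ranges.
series_a = list(range(0, 40))   # 40 values, 0..39
series_b = list(range(30, 50))  # 20 values, 30..49

# Edges are computed on all series pooled together.
edges = equal_sized_edges(series_a + series_b, nbins=4)

# One row per series: this is the contingency table.
table = [bin_counts(series_a, edges), bin_counts(series_b, edges)]

# Cells with fewer than 5 observations.
low_cells = [
    (row, col)
    for row, counts in enumerate(table)
    for col, count in enumerate(counts)
    if count < 5
]

print(edges)
print(table)
print(low_cells)
```

The pooled bins each hold 15 values overall, yet individual rows end up with nearly empty cells, which is the situation described above.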
`series1` and `series2` (etc.) as arguments, following the model of the `ttest_independence` and `mann_whitney_u` functions
`categorical_series` and `numerical_series` (etc.) OR `dataframe`, `categorical_column`, `numerical_column` (etc.) as arguments, which is what `pd.crosstab` uses to actually create the contingency table; this also allows the `categorical_series` to have more than 2 different values
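For comparison, the second option could look roughly like the sketch below. The two wrapper functions are hypothetical (only the argument names come from the discussion above), the binning uses `pd.cut` with explicit edges as a stand-in for the project's `SeriesBinning`, and the data is made up.

```python
import pandas as pd


def contingency_from_series(categorical_series, numerical_series, bins):
    """Build a contingency table from a categorical and a numerical series."""
    # Bin the numerical values; include_lowest keeps the minimum in the first bin.
    binned = pd.cut(numerical_series, bins=bins, include_lowest=True)
    # One row per category, one column per bin.
    return pd.crosstab(categorical_series, binned)


def contingency_from_dataframe(dataframe, categorical_column, numerical_column, bins):
    """Same table, taking a dataframe and two column names instead."""
    return contingency_from_series(
        dataframe[categorical_column], dataframe[numerical_column], bins
    )


df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b", "c", "c"],  # three categories, not just two
        "value": [1, 9, 2, 8, 3, 7],
    }
)

table = contingency_from_dataframe(df, "group", "value", bins=[0, 5, 10])
print(table)
```

Either signature ends up calling `pd.crosstab` on two aligned series, so the dataframe variant is a thin wrapper; the resulting table could then be handed to a chi-squared contingency test.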
Description
`statistical_test.py` and `differential_analysis.py`
`SeriesBinning`: bin into equal-size bins (in addition to equal-width)