rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

combined_ftest_5x2cv: accuracy vs error rates #1086

Open AlbertoImg opened 3 months ago

AlbertoImg commented 3 months ago

Hi @rasbt, First of all thanks for the implementations and the teaching material. It is really appreciated. I want to use this test is r, so I was looking at your python implementation to get an idea. I would like to ask you a technical detail that is not clear to me: I saw that in your function "combined_ftest_5x2cv", in a classification case, you use the "accuracy" as the default scoring. When looking at the papers Dietterich 1998 and Alpaydin 1998, they mentioned this scoring ($p_i^{(j)}$) as "error rates" or "observed proportion of test examples misclassified by algorithm". Are you using the "accuracy" scoring because the math at the end does not change (considering accuracy = 1-error rate, and the difference between the algorithms cancel this "1-" operation) ?

Thanks in advance!
Best, Alberto

Your implementation:

```python
if scoring is None:
    if estimator1._estimator_type == "classifier":
        scoring = "accuracy"  # <-- HERE
    elif estimator1._estimator_type == "regressor":
        scoring = "r2"
    else:
        raise AttributeError("Estimator must " "be a Classifier or Regressor.")

if isinstance(scoring, str):
    scorer = get_scorer(scoring)
else:
    scorer = scoring

variances = []
differences = []

def score_diff(X_1, X_2, y_1, y_2):
    estimator1.fit(X_1, y_1)
    estimator2.fit(X_1, y_1)
    est1_score = scorer(estimator1, X_2, y_2)  # <-- HERE
    est2_score = scorer(estimator2, X_2, y_2)  # <-- HERE
    score_diff = est1_score - est2_score  # <-- HERE
    return score_diff
```
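For completeness, here is a minimal numeric check of the cancellation I am describing (the accuracy numbers are made up, and `f_stat` is my own sketch of the combined 5x2cv statistic, not a call into mlxtend):

```python
import numpy as np

# Hypothetical 5x2 fold accuracies for two classifiers (rows = replications, cols = folds)
acc1 = np.array([[0.90, 0.88], [0.91, 0.87], [0.89, 0.90], [0.92, 0.88], [0.90, 0.89]])
acc2 = np.array([[0.86, 0.85], [0.88, 0.84], [0.87, 0.86], [0.89, 0.85], [0.86, 0.87]])

def f_stat(diffs):
    """Combined 5x2cv F statistic from a 5x2 array of per-fold score differences."""
    means = diffs.mean(axis=1, keepdims=True)
    variances = ((diffs - means) ** 2).sum(axis=1)   # s_i^2 for each replication
    return (diffs ** 2).sum() / (2.0 * variances.sum())

diff_from_acc = acc1 - acc2                  # differences of accuracies
diff_from_err = (1 - acc1) - (1 - acc2)      # differences of error rates = -(acc1 - acc2)

print(f_stat(diff_from_acc))  # same value ...
print(f_stat(diff_from_err))  # ... because only squares enter the statistic
```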