recommenders-team / recommenders

Best Practices on Recommendation Systems
https://recommenders-team.github.io/recommenders/intro.html
MIT License

[ASK] MIND test dataset doesn't work for run_eval #2105

Closed: ubergonmx closed this issue 1 month ago

ubergonmx commented 1 month ago

Description

The following code:

label = [0 for i in impr.split()]

It essentially marks every news ID in the impression list as non-clicked.
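
For context, here is a minimal sketch (not the library's actual iterator code) of the two impression formats: the MIND train/dev behaviors attach a click label to every candidate (e.g., N55689-1 N35729-0), while the test behaviors list bare news IDs with no label, which is why the line above falls back to all zeros.

def parse_impression(impr):
    # Split a space-separated impression string into news IDs and click labels.
    news_ids, labels = [], []
    for item in impr.split():
        if "-" in item:                       # train/dev format, e.g. "N55689-1"
            nid, label = item.rsplit("-", 1)
            news_ids.append(nid)
            labels.append(int(label))
        else:                                 # test format: bare ID, no label available
            news_ids.append(item)
            labels.append(0)                  # mirrors label = [0 for i in impr.split()]
    return news_ids, labels

print(parse_impression("N55689-1 N35729-0"))  # (['N55689', 'N35729'], [1, 0])
print(parse_impression("N712 N231"))          # (['N712', 'N231'], [0, 0])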

Instead of modifying the code, I modified the test behaviors file by appending -0 to each news ID in the impression list (e.g., N712-0 N231-0). Now I get the following error when running run_eval:

model.run_eval(test_news_file, test_behaviors_file)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File <timed exec>:1

File ~/.conda/envs/recommenders/lib/python3.9/site-packages/recommenders/models/newsrec/models/base_model.py:335, in BaseModel.run_eval(self, news_filename, behaviors_file)
    331 else:
    332     _, group_labels, group_preds = self.run_slow_eval(
    333         news_filename, behaviors_file
    334     )
--> 335 res = cal_metric(group_labels, group_preds, self.hparams.metrics)
    336 return res

File ~/.conda/envs/recommenders/lib/python3.9/site-packages/recommenders/models/deeprec/deeprec_utils.py:594, in cal_metric(labels, preds, metrics)
    591         res["hit@{0}".format(k)] = round(hit_temp, 4)
    592 elif metric == "group_auc":
    593     group_auc = np.mean(
--> 594         [
    595             roc_auc_score(each_labels, each_preds)
    596             for each_labels, each_preds in zip(labels, preds)
    597         ]
    598     )
    599     res["group_auc"] = round(group_auc, 4)
    600 else:

File ~/.conda/envs/recommenders/lib/python3.9/site-packages/recommenders/models/deeprec/deeprec_utils.py:595, in <listcomp>(.0)
    591         res["hit@{0}".format(k)] = round(hit_temp, 4)
    592 elif metric == "group_auc":
    593     group_auc = np.mean(
    594         [
--> 595             roc_auc_score(each_labels, each_preds)
    596             for each_labels, each_preds in zip(labels, preds)
    597         ]
    598     )
    599     res["group_auc"] = round(group_auc, 4)
    600 else:

File ~/.conda/envs/recommenders/lib/python3.9/site-packages/sklearn/metrics/_ranking.py:567, in roc_auc_score(y_true, y_score, average, sample_weight, max_fpr, multi_class, labels)
    565     labels = np.unique(y_true)
    566     y_true = label_binarize(y_true, classes=labels)[:, 0]
--> 567     return _average_binary_score(
    568         partial(_binary_roc_auc_score, max_fpr=max_fpr),
    569         y_true,
    570         y_score,
    571         average,
    572         sample_weight=sample_weight,
    573     )
    574 else:  # multilabel-indicator
    575     return _average_binary_score(
    576         partial(_binary_roc_auc_score, max_fpr=max_fpr),
    577         y_true,
   (...)
    580         sample_weight=sample_weight,
    581     )

File ~/.conda/envs/recommenders/lib/python3.9/site-packages/sklearn/metrics/_base.py:75, in _average_binary_score(binary_metric, y_true, y_score, average, sample_weight)
     72     raise ValueError("{0} format is not supported".format(y_type))
     74 if y_type == "binary":
---> 75     return binary_metric(y_true, y_score, sample_weight=sample_weight)
     77 check_consistent_length(y_true, y_score, sample_weight)
     78 y_true = check_array(y_true)

File ~/.conda/envs/recommenders/lib/python3.9/site-packages/sklearn/metrics/_ranking.py:337, in _binary_roc_auc_score(y_true, y_score, sample_weight, max_fpr)
    335 """Binary roc auc score."""
    336 if len(np.unique(y_true)) != 2:
--> 337     raise ValueError(
    338         "Only one class present in y_true. ROC AUC score "
    339         "is not defined in that case."
    340     )
    342 fpr, tpr, _ = roc_curve(y_true, y_score, sample_weight=sample_weight)
    343 if max_fpr is None or max_fpr == 1:

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
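
For reference, this is scikit-learn behavior rather than something specific to the MIND pipeline: roc_auc_score raises this ValueError whenever y_true contains a single class, which is exactly what appending -0 to every news ID produces for each impression. A minimal repro (the scores are illustrative):

from sklearn.metrics import roc_auc_score

# One impression where every candidate is labeled non-clicked,
# as produced by appending "-0" to every news ID.
y_true = [0, 0, 0]
y_score = [0.9, 0.2, 0.4]

try:
    roc_auc_score(y_true, y_score)
except ValueError as err:
    print(err)  # Only one class present in y_true. ROC AUC score is not defined in that case.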

Is there a fix or workaround for this? How do I get the scores?

Other Comments

Originally posted by @ubergonmx in https://github.com/recommenders-team/recommenders/issues/1673#issuecomment-2016744219

I am trying to train the NAML model with the valid + test set.

miguelgfierro commented 1 month ago

It seems this is an error with AUC because there is just one class; it looks like all your labels belong to a single class.

I would look into the data and make sure you have both positive and negative classes.
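
One quick way to check is to count the label classes per impression. A minimal sketch, assuming the standard MIND behaviors.tsv layout (impression string in the last tab-separated column, entries like N712-0 or N231-1); the file path is a placeholder:

from collections import Counter

single_class = 0
total = 0
with open("test_behaviors.tsv") as f:           # placeholder path
    for line in f:
        impr = line.strip().split("\t")[-1]     # impression column
        labels = [item.rsplit("-", 1)[-1] for item in impr.split()]
        if len(Counter(labels)) < 2:            # only one class (e.g. all "0") in this impression
            single_class += 1
        total += 1
print(f"{single_class}/{total} impressions have a single label class")

If every impression comes back single-class (which appending -0 to every ID guarantees), group AUC cannot be computed.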

ubergonmx commented 1 month ago

> It seems this is an error with AUC because there is just one class; it looks like all your labels belong to a single class.
>
> I would look into the data and make sure you have both positive and negative classes.

Thank you. I think this was raised as an issue before, but it was closed because there is no labeled test set.

miguelgfierro commented 1 month ago

Sounds good