Unfair evaluation settings on IMDB lead to unreasonable results (74+)

First of all, thank you for your work.

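I found that the evaluation on the multi-label IMDB dataset is unreasonable, and it gives your method an implausibly high F1 score (74+). Specifically, the evaluation code builds binary_pred using prior knowledge of how many classes each node has: k is read from the ground-truth labels and the top-k predictions are marked positive. That information is not available at test time, so the comparison is unfair.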

    for i in range(preds.shape[0]):
        k = labels[i].sum().astype('int')
        topk_idx = preds[i].argsort()[-k:]
        binary_pred[i][topk_idx] = 1
        for pos in list(labels[i].nonzero()[0]):
            if labels[i][pos] and labels[i][pos] == binary_pred[i][pos]:
                num_correct += 1

In fact, the usual practice is to evaluate with metrics.f1_score(labels, preds>0), which thresholds the predictions without looking at the labels.
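To show how much the two settings can differ, here is a minimal sketch on made-up toy data. The scores, labels, and helper names (`micro_f1`, `threshold_pred`, `oracle_topk_pred`) are hypothetical and not taken from the repository. It uses plain Python lists instead of NumPy or scikit-learn so that it runs without dependencies, and it computes micro-averaged F1 only.

```python
def micro_f1(labels, binary_pred):
    """Micro-averaged F1 over all (node, class) entries of two 0/1 matrices."""
    tp = fp = fn = 0
    for y_row, p_row in zip(labels, binary_pred):
        for y, p in zip(y_row, p_row):
            if p and y:
                tp += 1
            elif p and not y:
                fp += 1
            elif y and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def threshold_pred(scores, threshold=0.0):
    """Label-free decision rule: a class is predicted when its score > threshold."""
    return [[1 if s > threshold else 0 for s in row] for row in scores]


def oracle_topk_pred(scores, labels):
    """Decision rule that peeks at the labels: k is the TRUE number of
    classes of each node, and the k highest-scoring classes are predicted."""
    out = []
    for s_row, y_row in zip(scores, labels):
        k = sum(y_row)  # ground-truth label count, unavailable at test time
        order = sorted(range(len(s_row)), key=lambda j: s_row[j], reverse=True)
        top = set(order[:k])
        out.append([1 if j in top else 0 for j in range(len(s_row))])
    return out


# Hypothetical toy data: 3 nodes, 3 classes (assumed values for illustration).
labels = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
scores = [[0.9, 0.4, -0.3], [0.2, -0.1, -0.5], [0.3, -0.2, 0.6]]

f1_threshold = micro_f1(labels, threshold_pred(scores))
f1_oracle = micro_f1(labels, oracle_topk_pred(scores, labels))
print(f"threshold: {f1_threshold:.3f}, oracle top-k: {f1_oracle:.3f}")
```

On this toy input the same raw scores get a higher micro-F1 under the oracle top-k rule than under thresholding, because the rule can never predict too many or too few classes for a node. With scikit-learn, multi-label inputs to `f1_score` also need an explicit `average` setting such as `'micro'` or `'macro'`.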

Don't you think this is unfair to the other published papers? Everyone is competing on the same benchmark. You cannot take a rocket to boost your scores by changing the evaluation settings for your own method.
