Question about inference and auroc measure

laihaoran / CARZero

Apache License 2.0


Question about inference and auroc measure #7

Open reflelia opened 1 month ago

reflelia commented 1 month ago

The way inference.py generates predicted labels does not seem to account for multi-label classification. The ground truth labels contain probabilities for multiple lesions (this is a multi-label classification task), but the script uses np.argmax to select a single lesion out of the 14 possible lesions in the ChestX-ray14 dataset.
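To make the concern concrete, here is a minimal sketch on made-up toy data, not code from the CARZero repository. It contrasts np.argmax, which forces exactly one class per image, with a per-class AUROC computed directly from the scores. The helper names `binary_auroc` and `per_class_auroc`, and the three-class toy arrays, are assumptions for illustration only.

```python
import numpy as np

def binary_auroc(y_true, y_score):
    """AUROC for one class as the fraction of (positive, negative) pairs
    ranked correctly, counting ties as half (Mann-Whitney U formulation)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]
    neg = y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # undefined when only one label value is present
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def per_class_auroc(labels, scores):
    """One AUROC per column (class) of an (n_images, n_classes) array."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    return [binary_auroc(labels[:, c], scores[:, c]) for c in range(labels.shape[1])]

# Hypothetical toy data: 4 images, 3 findings (the real task has 14).
labels = np.array([[1, 1, 0],
                   [0, 1, 0],
                   [1, 0, 1],
                   [0, 0, 0]])   # the last image has no finding at all
scores = np.array([[0.9, 0.8, 0.1],
                   [0.2, 0.7, 0.3],
                   [0.6, 0.4, 0.5],
                   [0.1, 0.3, 0.6]])

# argmax returns exactly one class index per image, so it cannot express
# two findings on one image, or no finding at all.
hard_single = np.argmax(scores, axis=1)

# Per-class AUROC never uses hard predictions, only the score ranking.
aucs = per_class_auroc(labels, scores)
print(hard_single, aucs)
```

In this toy case the first image has two positive labels and the last has none, yet argmax assigns each of them exactly one class. The AUROC values are unaffected by that step because they depend only on how the scores rank within each class.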

In summary, is the inference.py script wrong for this task? Or is the task itself not intended to be multi-label classification?

laihaoran commented 1 month ago

Thank you for pointing this out. We did not use 'pre' as hard predictions when calculating metrics; we used only the scores to compute the AUC, so this part of the code can be ignored. If we want to calculate metrics that need hard predictions in the future, we would prefer prompt pairs such as 'There is [Disease]' and 'There is no [Disease]'. After generating the score for each prompt, we can make a hard prediction by comparing the two values.
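A rough sketch of the prompt-pair idea described above, under stated assumptions: the scores below are made-up numbers standing in for whatever image-text similarity the model produces, and `build_prompt_pairs` and `hard_predictions` are hypothetical helper names, not functions from the repository. The tie-breaking rule (equal scores count as negative) is also an assumption.

```python
import numpy as np

def build_prompt_pairs(diseases):
    """Return a (positive, negative) prompt pair for each disease name."""
    return [(f"There is {d}", f"There is no {d}") for d in diseases]

def hard_predictions(pos_scores, neg_scores):
    """Predict 1 for a disease when its positive-prompt score is strictly
    higher than its negative-prompt score, independently per disease."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    return (pos > neg).astype(int)

diseases = ["Atelectasis", "Cardiomegaly", "Effusion"]
pairs = build_prompt_pairs(diseases)

# Hypothetical scores with shape (n_images, n_diseases); in practice these
# would come from scoring each image against each prompt with the model.
pos_scores = [[0.7, 0.2, 0.6],
              [0.1, 0.3, 0.4]]
neg_scores = [[0.3, 0.8, 0.5],
              [0.9, 0.6, 0.4]]

preds = hard_predictions(pos_scores, neg_scores)
print(pairs[0])
print(preds)
```

Because each disease is decided on its own, an image can receive several positive labels or none, which matches the multi-label setting raised in the question.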