A concern about the indicators

Vincent-ch99 commented 1 year ago

A very nice job! Thanks for your code!
Actually I have a concern below: Compared with other papers, the evaluation indicators in the article seem to be much lower than 1. Other papers are probably in the range of 0.98-0.99, while the indicators in the paper are all in the range of 0.6-0.8. The paper explains that a new calculation method is adopted for the calculation of evaluation indicators, but the length of the article is limited so it is not very clear. Could you please explain in detail the change of the evaluation index, the motivation for the change, and the advantages it can bring? Looking forward to your reply!

rmaphoh commented 1 year ago

Thank you for your interest. The evaluation strategy we used is standard multi-class segmentation evaluation. We calculate the metrics for each category, e.g. arteries, and merge them by weighted sum (https://github.com/rmaphoh/Learning-AVSegmentation/blob/f7de674efe8e4c4d4ed7291d2d00acd7beec7856/scripts/eval.py#L200C108-L200C108). This evaluation is common and standard in a wide range of medical image computing tasks.

Before using end-to-end deep learning techniques, the multi-class vessel segmentation was split into two steps: vessel segmentation and artery/vein classification. The extremely high performance you mentioned is for artery/vein classification where the vessel segmentation error was not counted. For the deep learning models, it is fairer to use the multi-class segmentation evaluation.

Vincent-ch99 commented 1 year ago

First of all thank you for your patient reply! According to your explanation, it seems from your code that the background and the overlapping parts of arteries and veins are uniformly defined as uncertain category. And i just confuse about whether my understanding is correct or not. Another point I would like to ask is that other papers whose evaluation index value is close to 1 have only calculated the two types of arteries and veins, right? That is, the above uncertain category is ignored. If the above understanding is correct, the introduction of uncertain categories can more effectively evaluate the end-to-end deep learning model and evaluate the segmentation performance of the model more objectively. Actually this is just my tiny guess, may it be correct?

rmaphoh commented 1 year ago

The background and uncertain pixels are organised as two different categories. There are in total four categories: artery, vein, uncertain pixels, and background. The artery, vein, and uncertain pixels are of interest to the segmentation models. So we calculate the metric for each of them followed by a weighted sum.

Yeah, indeed. The colour fundus photograph is 2D imaging without the depth information, thus it is more reasonable to regard the intersection pixels as uncertain pixels. You can also find the uncertain labels in the public datasets, such as DRIVE-AV and HRF-AV (usually in green colour).

In general, the performance evaluation is more standard by 1) calculating the metrics for multi-class segmentation instead of artery/vein classification and 2) introducing uncertain pixels which should not be ignored.

Vincent-ch99 commented 1 year ago

Thanks again for your patient reply, much appreciated! After reading your recent detailed explanation, I checked and compared some relevant literature, and found the following problems:

It seems that most of the tasks based on arterial and venous segmentation are more suitable to be called: arteriovenous classification? Because it seems that most of the papers have taken the evaluation indicators of arteries and veins separately, and then performed a simple average. The important thing is that they ignore the uncertainty part (that is, the overlapping part of the blood vessel + the part of the uncertain category, of course this is my guess). My question is, is this part one that is wrongly ignored by these papers? Because the uncertain part does not belong to arteries and veins, for example, when calculating the arterial evaluation index, it will be regarded as an artery, thereby increasing the value of the evaluation index, resulting in a very close to 1.
The method of considering the uncertain part and weighting the summation can obviously take into account the differences of different categories and more objectively reflect the performance of the segmentation model?
In addition, the 2D image information you mentioned does not contain depth information, but I see that the DRIVE-AV dataset is also a three-channel dataset. Does this not belong to the category of depth information? Thanks again for your patience and reply!

rmaphoh / Learning-AVSegmentation

A concern about the indicators #5