Closed · Vincent-ch99 closed this issue 11 months ago
Thank you for your interest. The evaluation strategy we used is standard multi-class segmentation evaluation. We calculate the metrics for each category (e.g. arteries) and merge them by a weighted sum (https://github.com/rmaphoh/Learning-AVSegmentation/blob/f7de674efe8e4c4d4ed7291d2d00acd7beec7856/scripts/eval.py#L200C108-L200C108). This evaluation is common and standard across a wide range of medical image computing tasks.
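The per-category metric plus weighted sum described above can be sketched as follows. This is a minimal illustration, not the repo's actual `eval.py` code; the class indices, the choice of F1/Dice as the metric, and the pixel-frequency weighting are illustrative assumptions.

```python
import numpy as np

def per_class_f1(y_true, y_pred, cls):
    """F1 (Dice) score for a single class over flattened label maps."""
    tp = np.sum((y_pred == cls) & (y_true == cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_multiclass_f1(y_true, y_pred, classes):
    """Weight each class's F1 by its pixel frequency in the ground truth."""
    weights = np.array([np.sum(y_true == c) for c in classes], dtype=float)
    weights /= weights.sum()
    scores = np.array([per_class_f1(y_true, y_pred, c) for c in classes])
    return float(np.sum(weights * scores))

# Toy flattened label maps: 0=background, 1=artery, 2=vein, 3=uncertain
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 1, 1, 1, 2, 3, 3, 3])
score = weighted_multiclass_f1(y_true, y_pred, classes=[1, 2, 3])
```

Here only the classes of interest (artery, vein, uncertain) enter the weighted sum; background is scored implicitly through false positives and false negatives of the other classes.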
Before the adoption of end-to-end deep learning techniques, multi-class vessel segmentation was split into two steps: vessel segmentation and artery/vein classification. The extremely high performance you mentioned is for artery/vein classification, where vessel segmentation errors were not counted. For deep learning models, it is fairer to use the multi-class segmentation evaluation.
First of all, thank you for your patient reply! According to your explanation, it seems from your code that the background and the overlapping parts of arteries and veins are defined as separate categories, with the overlaps treated as an uncertain category. Is my understanding correct? Another point I would like to ask: the other papers whose evaluation metrics are close to 1 only calculated the two artery and vein classes, right? That is, the uncertain category above is ignored. If my understanding is correct, introducing an uncertain category evaluates an end-to-end deep learning model more effectively and measures its segmentation performance more objectively. This is just my guess; is it correct?
The background and uncertain pixels are organised as two different categories. There are four categories in total: artery, vein, uncertain, and background. The artery, vein, and uncertain pixels are the ones of interest to the segmentation models, so we calculate the metric for each of them, followed by a weighted sum.
Yeah, indeed. The colour fundus photograph is 2D imaging without depth information, so it is more reasonable to regard the intersection pixels as uncertain. You can also find the uncertain labels in public datasets such as DRIVE-AV and HRF-AV (usually in green).
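Decoding such colour-coded ground truth into class indices might look like the sketch below. The exact colour coding is an assumption here (red = artery, blue = vein, green = uncertain, black = background); verify it against the dataset's own documentation before use.

```python
import numpy as np

# Assumed colour coding -- check the dataset README for the real convention:
COLOUR_TO_CLASS = {
    (255, 0, 0): 1,  # red   -> artery
    (0, 0, 255): 2,  # blue  -> vein
    (0, 255, 0): 3,  # green -> uncertain (artery/vein crossings)
    (0, 0, 0): 0,    # black -> background
}

def decode_label_map(rgb):
    """Map an H x W x 3 colour ground-truth image to H x W class indices."""
    out = np.zeros(rgb.shape[:2], dtype=np.int64)
    for colour, cls in COLOUR_TO_CLASS.items():
        mask = np.all(rgb == np.array(colour), axis=-1)
        out[mask] = cls
    return out

# Tiny 2x2 example image: artery, vein / uncertain, background
rgb = np.array([[[255, 0, 0], [0, 0, 255]],
                [[0, 255, 0], [0, 0, 0]]], dtype=np.uint8)
labels = decode_label_map(rgb)
```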
In general, the performance evaluation is more standard when 1) the metrics are calculated for multi-class segmentation rather than artery/vein classification alone, and 2) uncertain pixels are introduced rather than ignored.
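A toy comparison can show why the two protocols produce such different numbers. In the sketch below (illustrative data, not from any real model), artery/vein classification scored only on correctly detected vessel pixels comes out perfect, while the multi-class evaluation, which also penalises missed vessels and ignored uncertain pixels, is much lower.

```python
import numpy as np

# Toy flattened label maps: 0=background, 1=artery, 2=vein, 3=uncertain
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3])
y_pred = np.array([0, 0, 1, 1, 0, 2, 2, 0, 0, 0])  # some vessels missed

# (a) Two-step style: score artery/vein labels only on pixels that are
# truly vessel AND predicted as vessel, so detection errors never count.
av = np.isin(y_true, [1, 2]) & np.isin(y_pred, [1, 2])
av_acc = float(np.mean(y_true[av] == y_pred[av]))

# (b) Multi-class style: per-class F1 including the uncertain class,
# merged by pixel-frequency weights.
classes = [1, 2, 3]
f1s = []
for c in classes:
    tp = np.sum((y_pred == c) & (y_true == c))
    fp = np.sum((y_pred == c) & (y_true != c))
    fn = np.sum((y_pred != c) & (y_true == c))
    denom = 2 * tp + fp + fn
    f1s.append(2 * tp / denom if denom else 0.0)
weights = np.array([np.sum(y_true == c) for c in classes], dtype=float)
multi_f1 = float(np.sum(weights / weights.sum() * np.array(f1s)))
```

On this toy data `av_acc` is 1.0 while `multi_f1` is 0.6, mirroring the gap between the near-1 numbers in the two-step literature and the 0.6-0.8 range reported under multi-class evaluation.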
Thanks again for your patient reply, much appreciated! After reading your recent detailed explanation, I checked and compared some relevant literature and found the following problems:
A very nice job! Thanks for your code!
Actually I have a concern: compared with other papers, the evaluation metrics in the article seem much lower. Other papers report values of roughly 0.98-0.99, while the metrics in this paper are all in the 0.6-0.8 range. The paper explains that a new calculation method is adopted for the evaluation metrics, but the article's length is limited so it is not made very clear. Could you please explain in detail the change to the evaluation metric, the motivation for the change, and the advantages it brings? Looking forward to your reply!