Closed jnirschl closed 1 year ago
hi @jnirschl ,
Also, what is the significance of the numeric value for the "flag" column, if any?
per the paper,
Additional categories such as not sure or flag denoted uncertainty, image segmentation failures, or other special cases (Supplementary Fig. 3).
wrt your other questions, @ZiqiTang919 could you clarify? (cc @lise-minaud @sghandian @wongdaniel8 )
Hi @jnirschl, basically the number in each row is the count of the corresponding category in the image. I think understanding the entire process may help clear up the confusion.
During the image preprocessing step, a bounding box was automatically drawn for each candidate plaque. Then an image was generated centered on each candidate for labeling. A label of 0 or 1 was given to each candidate for each category by the neuropathologist. Finally, we incorporated all the labeled images to construct the training dataset. Note that the images for labeling (center-cropped on the bounding box) are different from the images for model training and validation (uniformly segmented from WSIs). The final label for a training image is the aggregation of all the original labels it contains. When a training image contains more than one bounding box, the number for that image can be greater than one. When only part of a bounding box is included in a training image, the original label is multiplied by the percentage of the area of intersection. That's why the number may be a decimal.
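The aggregation described above can be sketched roughly as follows. This is only an illustration, not the repository's actual code: the function names, the `(x_min, y_min, x_max, y_max)` box format, and the reading of "percentage of the area of intersection" as the fraction of the bounding box's own area that falls inside the tile are all assumptions.

```python
# Illustrative sketch only; names and box format are hypothetical.
# Boxes and tiles are (x_min, y_min, x_max, y_max) in pixel coordinates.

def intersection_area(a, b):
    """Area of overlap between two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def aggregate_tile_labels(tile, candidates, categories):
    """Sum per-category labels over all candidates overlapping a tile.

    `candidates` is a list of (bounding_box, labels) pairs, where `labels`
    maps a category name to the 0/1 label given by the annotator.
    Assumption: each label is weighted by the fraction of the bounding
    box's area that lies inside the tile.
    """
    totals = {category: 0.0 for category in categories}
    for box, labels in candidates:
        fraction = intersection_area(tile, box) / box_area(box)
        for category in categories:
            totals[category] += labels.get(category, 0) * fraction
    return totals


categories = ["cored", "diffuse", "CAA"]
tile = (0, 0, 256, 256)
candidates = [
    ((10, 10, 50, 50), {"cored": 1}),                    # fully inside the tile
    ((236, 100, 276, 140), {"cored": 1, "diffuse": 1}),  # half inside the tile
]
tile_labels = aggregate_tile_labels(tile, candidates, categories)
```

Under these assumptions the tile ends up with `cored = 1.5` and `diffuse = 0.5`, which shows how a value can be both greater than one and a decimal.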
great, thanks @ZiqiTang919 . am I correct in remembering that you discretize the labels for model training/etc? e.g., https://github.com/keiserlab/plaquebox-paper/blob/36d8c17e799a3d46259b4dbf01d53fc1756ebf21/2.1)%20CNN%20Models%20-%20Model%20Training%20and%20Development.ipynb?short_path=2dd3c08#L116
Yes, correct.
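For readers following along, discretizing the aggregated labels could look roughly like the sketch below. The threshold value and the function name are assumptions made for illustration; the linked notebook is the reference for what was actually used.

```python
# Illustrative sketch only. THRESHOLD is a hypothetical value chosen for
# this example; check the linked notebook for the value actually used.
THRESHOLD = 0.99


def discretize(labels, threshold=THRESHOLD):
    """Map aggregated (possibly fractional) labels to binary 0/1 labels."""
    return {name: int(value > threshold) for name, value in labels.items()}


# The example row quoted in this issue: CAA=2, Negative=0.1233.
row = {"cored": 0.0, "diffuse": 0.0, "CAA": 2.0, "negative": 0.1233}
binary_row = discretize(row)
```

With this assumed threshold, the row becomes positive for CAA only. Because each category is thresholded independently, a tile can be positive for several categories at once, which taking the argmax would not allow.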
thanks @ZiqiTang919
@jnirschl closing this, but please let us know if any questions remain
Thanks that makes sense!
I have a question regarding the CSV files in this repository and CSV files in the Tiles.zip file (Zenodo).
What is the significance of the numeric value for each of the classes (cored, diffuse, CAA, negative)? The values are not one-hot encoded, and they do not sum to 1 across each row, which would suggest a probability distribution. For example, the second row in the screenshot below has CAA=2, Negative=0.1233, Flag=2, and Not sure=0.1233. It is easy to take the argmax and assume that is the ground truth, but I would like to better understand what these numbers mean for each column. Another image has Diffuse=1.9404 and Not sure=1. What do these numbers represent? Also, what is the significance of the numeric value for the "flag" column, if any?
Screenshot from CSV file in Tiles.zip Zenodo