thuhoainguyen / kits23

The official repository of the 2023 Kidney Tumor Segmentation Challenge (KiTS23)
MIT License

Discussion with Jacopo #16

Open thuhoainguyen opened 3 months ago

thuhoainguyen commented 3 months ago

@anhtuduong Below is the recent discussion between me and Jacopo; I'll keep it updated!

Thu: Hi! This is the fold 0 training on dataset_2 (with histology), but the Dice score is very low, ~0.35: [image] I think this is because of the class imbalance: [image]

I have an idea: train only on the classes that have more than 15 cases, so there will be less bias and the training should reach a higher Dice.
After the reorganization, the classes will look like this:

[image] What do you think?
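For reference, a minimal sketch of the "keep only classes with enough cases" regrouping Thu proposes. It assumes the histology labels live in a JSON metadata file as a list of case records with `case_id` and `tumor_histologic_subtype` fields; the file name and field names are assumptions based on the usual KiTS metadata layout, not taken from this thread.

```python
# Sketch: keep only histology classes with enough cases, map the rest to "other".
# File name and field names are assumptions; adjust to the actual dataset metadata.
import json
from collections import Counter

MIN_CASES = 15  # threshold discussed in the thread

with open("kits23.json") as f:          # path is an assumption
    cases = json.load(f)

histology = {c["case_id"]: c["tumor_histologic_subtype"] for c in cases}
counts = Counter(histology.values())
kept = {h for h, n in counts.items() if n >= MIN_CASES}

# Remap every case: rare subtypes collapse into a single "other" class.
relabelled = {cid: (h if h in kept else "other") for cid, h in histology.items()}

print("kept classes:", sorted(kept))
print("cases per class:", Counter(relabelled.values()))
```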

Jacopo: The short version is: go for it, it makes sense to only keep classes with at least 15 cases.

The long version is: it might not make too much sense to look at the final Dice score, as it is generally computed (I think in this case as well, if I remember correctly the code I read on Friday) as the average of the Dice scores for each class. This means that it is entirely possible that the model learned to segment the easier classes correctly (or close to it) and is doing an awful job on the harder ones (read: the less represented tumor forms), leading to a very low final Dice score. It is entirely possible that the model is already perfectly capable of discriminating between, let's say, clear cell RCC and oncocytomas even with the low Dice score we're seeing.

I think the relevant step, now, would be to verify whether that's actually the case or not. The most straightforward way of doing that is probably to build a confusion matrix, i.e. compute the percentage of voxels predicted in each of the tumor classes, something like the one below (the numbers are completely random and I only used five labels):

![image](https://github.com/thuhoainguyen/kits23/assets/165920750/c26b33eb-6417-4bb0-85aa-61070c9a57d1)

A slightly expanded explanation: for each case, count the voxels predicted in each lesion class (I think we can ignore the background, healthy tissue and cysts here) and divide them by their total number. Then average over the TRUE labels (i.e. average together all cases with the same histology class). You should end up with a table such as the one above, so we can learn whether there are specific types of lesions the model gets confused over, whether it just has an issue with the rarest ones, or whether it is not working at all.

Let me know if something is unclear, I'll try to get back to you as soon as possible.
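For reference, a minimal sketch of the per-case voxel-fraction confusion matrix Jacopo describes, assuming the predictions are saved as integer label volumes readable with nibabel. The label ids, class names and the shape of the `cases` input are placeholders, not the repository's actual conventions.

```python
# Sketch: fraction of predicted voxels per lesion class for each case,
# averaged over all cases sharing the same TRUE histology label.
import numpy as np
import nibabel as nib

# Placeholder label ids for the lesion classes; adjust to the real label map.
LESION_LABELS = {3: "clear_cell_rcc", 4: "chromophobe", 5: "oncocytoma",
                 6: "papillary", 7: "other"}

def voxel_fractions(pred_path):
    """Fraction of predicted lesion voxels falling in each lesion class."""
    pred = np.asarray(nib.load(pred_path).dataobj)
    counts = np.array([(pred == lab).sum() for lab in LESION_LABELS])
    total = counts.sum()
    return counts / total if total > 0 else counts.astype(float)

def confusion_matrix(cases):
    """cases: list of (prediction_path, true_histology_name) tuples."""
    rows = {name: [] for name in LESION_LABELS.values()}
    for path, true_label in cases:
        rows[true_label].append(voxel_fractions(path))
    # One row per true histology class, averaged over its cases.
    return {name: np.mean(fracs, axis=0) for name, fracs in rows.items() if fracs}
```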

thuhoainguyen commented 3 months ago

@anhtuduong Update about the discussion with J:

Thu: I've trained on dataset_3 (preprocessed with the SELECTED histology classes clear_cell_rcc, chromophobe, oncocytoma, papillary; the rest are grouped into "other"). Fold 0, 3d_lowres. Here are the results on the validation set: https://drive.google.com/file/d/1r0DUAsFcmBjNz3JHiD89H-biCe2Ie-th/view?usp=sharing

We can clearly see that the Dice for kidney, cyst and tumor is as high as when we train on the original dataset, which means that the model learned the first 3 classes perfectly. The result file even shows the predictions for every case in the validation set. The predictions for the histology classes are not very good. I wrote a script that extracts the info and generates a confusion matrix table:

[Image]

[Image]

Jacopo: Hi Thu! A few comments:

- Good that the Dice scores for the other classes are as before.
- I'm not sure what I'm seeing in those confusion matrices: I'm guessing you selected the most commonly predicted label and used that to label the entire case, in the testing dataset alone, is that correct?
- If that's the case, the resulting classification does indeed look rather bad, which is rather puzzling considering that the "other" label is not even the most common. My guess is that the "other" label has the highest variance, so it tends to become the default class for any lesion that the model doesn't learn to predict with high confidence.
- I can think of two quick fixes to evaluate whether the resulting model is still somewhat useful:
  1. Re-compute the confusion matrices while ignoring the "other" class completely (i.e. restrict the matrix to the 4 named labels and, if the most predicted class for a case is "other", select the second most predicted one instead).
  2. Compute the AUC (area under the curve) for all class pairs. This should be done, for each case, on the sum of the raw outputs of the network (i.e. the logits), not on the count of voxels, if possible. I'm rather sure there are already Python packages to do that; sorry I can't explain this further myself as I'm somewhat short on time right now.
- Following from the point above, I'm afraid I can't check your draft for the moment. I'm undergoing surgery tomorrow, so I won't be able to do that until Friday (unlikely) or Saturday (much more probable). I should have enough time over the weekend to go through the whole thing. Sorry about that.

If you have any other questions or doubts, please do ask, I'll try to answer to the best of my abilities as soon as I can (again, definitely not before Friday).
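For reference, a minimal sketch of the two quick fixes above. It assumes per-case predicted voxel counts and per-class summed logits are already available in simple Python data structures; those inputs, the class names and the scoring choice (difference of summed logits) are assumptions, not the repository's actual outputs.

```python
# Sketch of the two quick fixes:
#   (1) case-level label restricted to the four named histologies (ignore "other"),
#   (2) pairwise AUCs computed on per-case summed logits.
from itertools import combinations
from sklearn.metrics import roc_auc_score

NAMED = ["clear_cell_rcc", "chromophobe", "oncocytoma", "papillary"]

def case_label_ignoring_other(voxel_counts):
    """voxel_counts: dict class_name -> predicted voxel count for one case."""
    return max(NAMED, key=lambda c: voxel_counts.get(c, 0))

def pairwise_auc(cases):
    """cases: list of (true_label, logit_sums) where logit_sums is a dict
    class_name -> summed raw network output for that class over the case."""
    aucs = {}
    for a, b in combinations(NAMED, 2):
        subset = [(t, s) for t, s in cases if t in (a, b)]
        y_true = [1 if t == a else 0 for t, _ in subset]
        # Difference of summed logits as the discriminant score for class a vs b.
        scores = [s[a] - s[b] for _, s in subset]
        if len(set(y_true)) == 2:          # need both classes present
            aucs[(a, b)] = roc_auc_score(y_true, scores)
    return aucs
```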

thuhoainguyen commented 3 months ago

@anhtuduong Jacopo just checked my thesis report and gave me some comments below:

I read the whole thesis; here are some comments:

Besides the points above (which I would try to address, if you manage to), in general the thesis is well structured and well written, congrats!

If you want to save a few pages, as a very first thing I'd reformat a couple of lists that are currently taking up a lot of space: one in the intensity normalization section and one in "Extension of the dataset with Histology-Specific data". I think you can use a horizontal table for the first one and remove the second entirely, as it is difficult to understand and doesn't add much information.

The only other section that might need to be revised (if you find the time) is the one on related works: while the chosen works are relevant, it is unclear why you chose specifically those, as they reached mid-table positions in the KiTS21 challenge. Is there a reason to mention those and not, for instance, the winners?