ncoudray / DeepPATH

Classification of Lung cancer slide images using deep-learning

train/test/folder image structure for fine-tuning via Inception V3 for mutation classification #100

Closed monajalal closed 2 years ago

monajalal commented 2 years ago

Hi Nicolas,

Thanks a lot for your great scientific contributions. I had a couple of questions regarding mutation classification.

I understand that you have a label file that looks like this: [Screenshot from 2022-03-03 17-37-21]

However, if I want to do transfer learning by fine-tuning Inception V3 (assuming we want to classify for the presence or absence of TP53), we need a folder structure such as the one below:

-- train
---- TP53
---- not_TP53

-- val
---- TP53
---- not_TP53

-- test
---- TP53
---- not_TP53

If we were doing WSI-level classification, this would be an easy task. However, since the inputs to Inception V3 are patches, I am unsure how you systematically keep track of the patches at the WSI level when placing them into these folders. I understand you do the 70/15/15 split at the WSI level, but how exactly do you distribute the patch images into the folders? And how do you keep track of the patch-level decisions for global average pooling or any sort of majority voting when aggregating patch-level predictions?

Also, am I understanding correctly that you have weak labels for the patches, meaning you assign the label of the WSI to all of the patches inside that WSI?

ncoudray commented 2 years ago

Hi Mona - In our pipeline, dividing the patches into the train, test and valid sets is called "sorting" and is done in step 02a. It works differently from what you describe: it creates one folder per label instead of one folder per set, and the set name is added to the tile name as a prefix. So we'll have two folders, with the image name defining the set:

-- TP53
---- train_image1.jpg
---- train_image2.jpg
---- ....
---- test_imagexxx.jpg
---- ...
---- valid_imageyyy.jpg

-- no_TP53
---- train_image1.jpg
---- train_image2.jpg
---- ....
---- test_imagexxx.jpg
---- ...
---- valid_imageyyy.jpg

Then we parse the names when converting to TFRecord; during the conversion, the train, test and valid sets are processed separately and saved in different subfolders.
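The naming scheme described above can be sketched roughly as follows. This is only an illustration, not the code used in step 02a: the helper names `assign_sets`, `sorted_tile_path` and `parse_set`, the slide IDs, the `<set>_<slide>_<tile>` file-name layout, and the 70/15/15 fractions (taken from the question) are all assumptions.

```python
import random
from pathlib import PurePosixPath

SETS = ("train", "valid", "test")


def assign_sets(slide_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign each slide (not each tile) to one set.

    Splitting at the slide level keeps all tiles of a slide in the same set.
    """
    ids = sorted(slide_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_valid = round(fractions[1] * len(ids))
    mapping = {}
    for i, sid in enumerate(ids):
        if i < n_train:
            mapping[sid] = "train"
        elif i < n_train + n_valid:
            mapping[sid] = "valid"
        else:
            mapping[sid] = "test"
    return mapping


def sorted_tile_path(label, set_name, slide_id, tile_name):
    """Hypothetical layout: one folder per label, set name as file-name prefix."""
    return PurePosixPath(label) / f"{set_name}_{slide_id}_{tile_name}"


def parse_set(path):
    """Recover the set from the file-name prefix, as a converter might do."""
    prefix = PurePosixPath(path).name.split("_", 1)[0]
    if prefix not in SETS:
        raise ValueError(f"unexpected set prefix in {path!r}")
    return prefix
```

For example, `sorted_tile_path("TP53", "train", "slide3", "10_12.jpg")` gives `TP53/train_slide3_10_12.jpg`, and `parse_set` on that path returns `"train"`. The set is carried by the file name, so no per-set folder is needed before conversion.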

Yes, with that approach we have weak labels for the patches.
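With weak labels, every tile inherits the label of its slide, so a slide-level prediction has to be recovered by grouping tile outputs by slide. This thread does not say how the pipeline aggregates them. The sketch below shows one common option, averaging per-tile probabilities and reporting the fraction of positive tiles. The file-name convention `<set>_<slide>_<x>_<y>.jpg`, the function names, and the probabilities in the usage note are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def slide_id_from_tile(tile_name):
    """Assumed naming: <set>_<slide>_<x>_<y>.jpg (the slide ID may contain underscores)."""
    stem = tile_name.rsplit(".", 1)[0]
    parts = stem.split("_")
    # Drop the leading set prefix and the two trailing tile coordinates.
    return "_".join(parts[1:-2])


def aggregate_by_slide(tile_probs, threshold=0.5):
    """Group per-tile probabilities by slide and summarize them per slide."""
    per_slide = defaultdict(list)
    for tile_name, prob in tile_probs.items():
        per_slide[slide_id_from_tile(tile_name)].append(prob)
    return {
        sid: {
            "mean_prob": mean(probs),
            "frac_positive": sum(p >= threshold for p in probs) / len(probs),
        }
        for sid, probs in per_slide.items()
    }
```

Calling `aggregate_by_slide({"test_slideA_0_0.jpg": 0.9, "test_slideA_0_1.jpg": 0.7, "test_slideB_3_4.jpg": 0.2})` returns one summary per slide. It works only because the slide ID can be read back from each tile name, which is why keeping the slide ID in the tile name matters.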

Best