steps to run the experiment from A to Z

codeskings commented 3 years ago

Thanks for sharing this complete detailed project with us. However, there are many scripts in it that I got confused with the order of their execution, can you please confirm my order below: 1) convert the ground truth images to pyramidal format. 2) run (cross-validation-splitter) twice to create the different folds; once for the viable tumor and once for the whole tumor. 3) run (point_extractor) four times; train/whole, train/viable, valid/whole, valid/viable 4) run (patch_coords_cv_splitter) twice for the whole and viable tumors. 5) run (trainer) for the 3 deep models independently for each fold.

Questions -

after that step, I got lost, what should be the next script to run? (predict) from the (inference)? but it require a (infer_*) directory that I couldn't find where to create it?
also, I have a question regarding the 2000 normal patch that we extract from the image; they are extracted from level 0 only?
what is the difference between img_sampling_level and mask_sampling_level ?
why they aren't zeros as in the image level?
Also, in the comments, you stated that (Img downsamples are pows of 4, mask downsamples are pows of 2), I read your paper fully but I couldn't find an explanation for this?

Sorry for the many questions but I want to fully understand each step. Thank you very much

haranrk commented 3 years ago

@codeskings that execution order is indeed correct.

after that step, I got lost, what should be the next script to run? (predict) from the (inference)? but it require a (infer_*) directory that I couldn't find where to create it?
- the predict.py under inference. That requirement was a relic from an experiment we had tabled. I have commented out those lines and pushed. Kindly pull the latest commit.
also, I have a question regarding the 2000 normal patch that we extract from the image; they are extracted from level 0 only?
- Yes the model is trained with patches from level 0
Also, in the comments, you stated that (Img downsamples are pows of 4, mask downsamples are pows of 2), I read your paper fully but I couldn't find an explanation for this?
- So, the pyramidal images contain copies of the image at different resolutions (levels). The different resolutions are downsamples of the original. The resolutions of the input image (tissue sample) are downsamples by powers of 4. For example, if the original image is (10,000 x 10,000), the other resolutions would be (2,500 x 2,500), (625 x 625) and so on. The mask image has downsamples in powers of 2, i.e (10,000 x 10,000), (5,000 x 5,000), (2,500 x 2,500) etc.
what is the difference between img_sampling_level and mask_sampling_level ?
- Since the img and mask has downsamples by either 2 or 4, two different sampling levels exist as we need to match the resolutions. Level 1 (2,500 x 2,500) of the tissue sample image would correspond to level 2 of the mask image.
why they aren't zeros as in the image level?
- I don't understand, could you rephrase?

Apologies for not having clear documentation. We didn't have time to clean up the code properly and write documentation. Therefore, do not hesitate to ask us your questions and we'll get back to you as soon as possible. Feel free to make pull requests as well.

codeskings commented 3 years ago

wow, thank you for this clear response, I really appreciate it. I will put it in my consideration while reinvestigate the code scripts again. I may bother you if I have other questions. Thanks again

codeskings commented 3 years ago

Sorry for bothering you again but I got a couple of new questions if you can kindly help in them because it seems that I am still have an issues in understanding the topic well.

The number of levels in the pyramidal format of the tissue image(.svs) and the corresponding mask image are different, therefore, we have two sampling rates; one for the image and one for the mask. Is this correct?
Aren’t the training patches extracted from level 0, then why we care about the other higher levels in both the tissue and the mask images?
What is the purpose of shifting the extracted normal patch? Aren’t we risking to convert it to tumor patch? Why the shifting is not done when extracting the tumor patch as well?
The models in the training phase are trained independently and not combined in any way. Correct? The ensemble model is only utilized at the inference stage. Correct?
Where do the hard mining for the difficult examples is used in the code?
When I run (predict.py) from the inference, I succeed to extract the output images when I used the (models to save) to be one of the three models used in the ensemble. However, when I tried to put (models to save) as the ensemble model, the following error occurred: pred_map_dict[key] = model_dict[key].predict(image_patches, verbose=0, batch_size=2) AttributeError: 'str' object has no attribute 'predict'

can you please advise me what I made wrong? Because I searched a lot with no luck.

I apologize for wasting your time, thank you very much.

haranrk commented 3 years ago

Please do not hesitate to ask any questions. It's definitely not a waste of our time.

The number of levels in the pyramidal format of the tissue image(.svs) and the corresponding mask image are different, therefore, we have two sampling rates; one for the image and one for the mask. Is this correct?

Correct

Aren’t the training patches extracted from level 0, then why we care about the other higher levels in both the tissue and the mask images?

We use the higher levels to create a tissue mask. This mask is created to ignore the pixels from the glass slide. This saves time and increases accuracy.

What is the purpose of shifting the extracted normal patch? Aren’t we risking to convert it to tumor patch? Why the shifting is not done when extracting the tumor patch as well?

What shifting are you referring to? There is no shifting done

The models in the training phase are trained independently and not combined in any way. Correct? The ensemble model is only utilized at the inference stage. Correct?

Correct

Where do the hard mining for the difficult examples is used in the code?

Hardmining was not used in our final models. We tried hardmining but it didn't work for us.

When I run (predict.py) from the inference, I succeed to extract the output images when I used the (models to save) to be one of the three models used in the ensemble. However, when I tried to put (models to save) as the ensemble model, the following error occurred: pred_map_dict[key] = model_dict[key].predict(image_patches, verbose=0, batch_size=2) AttributeError: 'str' object has no attribute 'predict'

I'm not sure what went wrong exactly. But I think you added the ensemble key to the model_keys dict. The ensemble_key should be added only to the var models_to_save.

codeskings commented 3 years ago

Sir, first of all thank you very much for responding to me and for your patience, I know you are busy, so please feel free for not answering my questions if they require much of your time since they are a little detailed this time.

The shifting is done in the “points_extractor.py” in “extract_normal_patches_from_wsi” function, as follows:

shifted_point = (int(scoord[0]-patch_size//2), int(scoord[1]-patch_size//2))
mask_patch = np.array(mask_obj.read_region(shifted_point, patch_level, (patch_size, patch_size)).convert('L'))

I was studying the inference phase; therefore, I have some questions about them if possible:

the inference is used to predict the training images in the (predict.py). I think this is done for illustration purpose since they are already (or part of them) is exposed to the training model. To test the model, I should run the (predict.py) with the validation data. Correct?
Which is better to calculate jaccard index: at the final stage on the whole generated prediction of all the tissue image, or at each patch with the corresponding prediction then sum them up?
Count-map variable in the inference, what is its role exactly? As far as I understood that it is an indication of how accurate our approximate segmentation (that are done at a lower resolution) is? Since it counts how many zeros (black pixels) in the prediction. Correct? Why we are dividing the prediction by it?
Roi_masking is used to eliminate or keep the white spaces in the mask. If the ROI-masking is true then the strided mask will more likely to be same as the mask itself, while if it false then it replace the mask value at each stride with one, which cause white spaces to appear. In this case I think keeping it true is better. Correct?
What is (ov-im-stride) and (ov_im) variables refer to? What are their significance? why we added masked image and the scaled tissue image together? Why dividing by 2?
```
     ov_im = mask_im / 2 + im_im / 2
    ov_im_stride = st_im / 2 + im_im / 2
```


-   After the prediction, what does this statement do? Why 128?
`                wsi_img = image_patches[j] * 128 + 128
`
-   I understood that the **scaling prediction** is done after projecting back to the lower resolution, but I didn’t understand what is the **Thresholding prediction**.

Thank you again, I appreciate your effort and sharing such a quality project to our benefit.

haranrk commented 3 years ago

Hey @codeskings, it's not problem at all :)

the inference is used to predict the training images in the (predict.py). I think this is done for illustration purpose since they are already (or part of them) is exposed to the training model. To test the model, I should run the (predict.py) with the validation data. Correct?

Yes

Which is better to calculate jaccard index: at the final stage on the whole generated prediction of all the tissue image, or at each patch with the corresponding prediction then sum them up?

If the patches are overlapping, you cannot calculate the jaccard index on a patch-by-patch basis because the averaging is done on the entire image after patch-by-patch prediction.

Count-map variable in the inference, what is its role exactly? As far as I understood that it is an indication of how accurate our approximate segmentation (that are done at a lower resolution) is? Since it counts how many zeros (black pixels) in the prediction. Correct? Why we are dividing the prediction by it?

No, it calculates how many times the algorithm has "seen" a particular pixel. It's basically averaging. When the algorithm goes over each patch, it sums the overlapping regions. The count_map is then divided and averaged.

Roi_masking is used to eliminate or keep the white spaces in the mask. If the ROI-masking is true then the strided mask will more likely to be same as the mask itself, while if it false then it replace the mask value at each stride with one, which cause white spaces to appear. In this case I think keeping it true is better. Correct?

Can you let me know which region of the code you are referring to? If you are referring to the tissue mask at the beginning, it's used to filter out the glass slide pixels from being fed into the prediction pipeline.

What is (ov-im-stride) and (ov_im) variables refer to? What are their significance? why we added masked image and the scaled tissue image together? Why dividing by 2?
ov_im = mask_im / 2 + im_im / 2
ov_im_stride = st_im / 2 + im_im / 2



`ov_im` - The mask overlaid on the input image, so we know the prediction is properly coinciding with the input image
`ov_im_stride` - Same as above but input image is overlayed with the stride mask 
These are just for diagnostic purposes and not relevant to the actual prediction pipeline. 

>   After the prediction, what does this statement do? Why 128?
`                wsi_img = image_patches[j] * 128 + 128
`

The output from the prediction has a range [0,1], but to be able to view it as an image, the range should be [0,255]. 
>   I understood that the **scaling prediction** is done after projecting back to the lower resolution, but I didn’t understand what is the **Thresholding prediction**.

The raw output from the model is a probability map, it must be thresholded to get a black and white image.

codeskings commented 3 years ago

Thank you very much, really appreciated it.

codeskings commented 3 years ago

Dear Sir, I have a quick question, I calculated the Jaccard index between the original ground truth mask (I read the mask image at level 0 using ReadWholeSlideImage function) and the generated mask after thresholding (prd_im_fll_dict) but I got low values, therefore I reinvestigated the min and max values of each of them and I am surprised that the max value in the original ground truth is 3 instead of 1. Can you explain why?

koriavinash1 / DigitalHistoPath

steps to run the experiment from A to Z #19