koriavinash1 / DigitalHistoPath

Implementation #16

Open codingcode111 opened 3 years ago

codingcode111 commented 3 years ago

Hi, I would like to thank you for sharing this interesting work. However, when I try to run this experiment on the same data that you used, it tells me that the tumor type and mode are required. Can you kindly tell me how to fix this issue, please? Your reply and help would be highly appreciated.

Kind regards,

haranrk commented 3 years ago

Hi @codingcode111, can you kindly share with us which script you are running and what your parameters are?

codingcode111 commented 3 years ago

I am running points_extractor.py and using the PAIP2019 training dataset. However, I am getting the following error:

codingcode111 commented 3 years ago

[Screenshot attached: 2020-12-13 19-25-28]

haranrk commented 3 years ago

The points_extractor.py script requires two arguments: the tumor type and the mode (the two values the error says are missing).

You can run the point_extract_script.sh present in the same directory to automatically run points_extractor.py for each permutation of arguments. You can run the bash script using the following command:

bash point_extract_script.sh

Don't forget to pull the latest changes before proceeding. Kindly write to us if you have any further issues.
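For reference, here is a minimal Python sketch of what such a wrapper does, i.e. calling points_extractor.py once per argument combination. The tumor-type and mode values below are placeholders, not necessarily the ones point_extract_script.sh actually uses:

    # run_all_extractions.py -- illustrative wrapper, not part of the repository.
    # Calls points_extractor.py once for every (tumor type, mode) combination.
    import itertools
    import subprocess

    TUMOR_TYPES = ["viable", "whole"]  # placeholder values
    MODES = ["train", "valid"]         # placeholder values

    for tumor_type, mode in itertools.product(TUMOR_TYPES, MODES):
        print(f"Running points_extractor.py for tumor_type={tumor_type}, mode={mode}")
        subprocess.run(["python", "points_extractor.py", tumor_type, mode], check=True)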

codingcode111 commented 3 years ago

Hi Haran,

Thank you so much for your reply and help. I pulled the latest changes and placed the 50 unzipped training folders in the data folder. After running the command above, the raw-data, train and patch-coords-200k folders were created with empty files, and the following error occurred. [attached screenshot] Any suggestions, please? We really appreciate your clarification and cooperation.

haranrk commented 3 years ago

The error in question seems to occur because the algorithm is unable to find the files. The data directory is hardcoded into the script. You may have to change it based on where your files are.

Also, don't forget to refer to this issue to resolve file format errors.

codingcode111 commented 3 years ago

Hi Haran,

Thank you so much for your reply and help. I was extremely busy last week. Regarding the project, I tried changing the directory and placed the data folder in the project directory, so both are in the same directory. However, I'm still getting the same error. Any suggestions to resolve this issue, please? Your suggestions and support would be highly appreciated.

haranrk commented 3 years ago

  1. Can you post the entire output after you run the program?
  2. Did you convert the mask files to pyramidal tiff as outlined in this issue?

codingcode111 commented 3 years ago

Hi Haran,

Thank you for your help. However, I tried to convert the tiff files into pyramidal tiff using the command you provided, but I'm not sure where exactly I need to run it. Does it need to be run in the same directory that has the data folder, or can I pass the path of the folder that has all the tiff images? I am new to histopathology images and am struggling to sort out the initial steps needed to run this experiment successfully. Your help and reply would be highly appreciated.

Best regards,

haranrk commented 3 years ago

convert input -compress jpeg -quality 90 -define tiff:tile-geometry=256x256 ptif:output

codingcode111 commented 3 years ago

Hi Haran,

I should let you know that I only downloaded the 50 PAIP2019 training archives and unzipped them. There are 50 folders; each folder represents a training phase number and contains svs, xml, viable tif and whole tif files (e.g. Training_phase_1_006 contains an svs, an xml, a viable_tif and a whole_tif). I also extracted the 4 images from each folder into a new folder and got 200 items in it. I am not sure if you are using the same dataset format that I am using. I hope you understand what I mean and can help me sort out this problem. Your guidance and support would be highly appreciated.

Best regards,

haranrk commented 3 years ago

Hi, that is the exact same dataset that I used as well. The viable and whole tif masks are the images that are not in an OpenSlide-readable format; therefore, you would need to convert those images. To do that, you can use the command mentioned in the previous comment. For example, if you had a file titled Training_phase_1_006.tif, use the following command in the folder it's present in:

convert Training_phase_1_006.tif -compress jpeg -quality 90 -define tiff:tile-geometry=256x256 ptif:Training_phase_1_006.tiff
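If you need to convert the masks in all 50 sample folders, a rough Python sketch of batch-converting them by shelling out to ImageMagick is shown below. It assumes convert is on your PATH, that the masks end in .tif, and that the data root path is a placeholder you would adjust:

    # batch_convert_masks.py -- illustrative sketch, not the repository's own tooling.
    # Converts every .tif mask under the training folders to a tiled pyramidal TIFF,
    # mirroring the ImageMagick convert command shown above.
    import glob
    import os
    import subprocess

    DATA_ROOT = os.path.join("data", "raw-data", "train")  # placeholder; adjust to your layout

    for tif_path in glob.glob(os.path.join(DATA_ROOT, "*", "*.tif")):
        out_path = os.path.splitext(tif_path)[0] + ".tiff"
        subprocess.run([
            "convert", tif_path,
            "-compress", "jpeg", "-quality", "90",
            "-define", "tiff:tile-geometry=256x256",
            "ptif:" + out_path,
        ], check=True)
        print("converted", tif_path, "->", out_path)
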
codingcode111 commented 3 years ago

Hi Haran, thank you so much for your reply. I converted the images into pyramidal tiff. Now I have a folder named data, and there is a raw-data folder inside it. Where should I place the pyramidal tiff images? When I tried placing the tiff images under the data folder and under the raw-data folder, I got the following error in both tries:

File "points_extractor.py", line 454, in batch_patch_gen
    image_path = glob.glob(os.path.join(data_path,id,'*.svs'))[0]
IndexError: list index out of range

Please let me know where I should put the svs files and the exact layout of the data and images, including the SVS files inside each of the 50 training folders available from the challenge website. After converting the images into pyramidal images, I have 6 images inside each training folder. Should I place the 50 training folders under the data folder, or only the pyramidal images? I am a bit confused, as it is not clear where the data should go and where each type of file and folder should be placed. We need more clarification, please, to apply this work successfully. Thank you again, Haran, for your support and help. [Screenshot attached: 2021-01-01 12-44-10]

haranrk commented 3 years ago

Did you change the data_path variable in the points_extractor.py script to where your data is currently located?

codingcode111 commented 3 years ago

I put the data under the data folder in the same directory. In the script, the path is written as (data, raw-data, etc.), so I put the tiff files in the data folder. I also tried putting them in raw-data. I haven't made any change to the script, as I think there is no full path in the script, only the data and raw-data folders, which are already present in the same directory under the DigitalHistoPath folder.

heba9004 commented 3 years ago

After converting the binary mask images into pyramidal format and starting the "points_extractor" script, I am facing a memory error at a deepcopy call. Before throwing this error, it only saves the coordinates of the patches extracted from the first training svs file. Can you please advise whether the problem is with my machine? I am using 32 GB of RAM. Thank you in advance.

haranrk commented 3 years ago

@codingcode111 In the script I have the following declaration

data_path = os.path.join('..','..','data','raw-data','train')

The .. represents the parent directory, so in my setup, I have the following structure.

DigitalHistoPath/code_cm17/patch_extraction/points_extractor.py
DigitalHistoPath/data/raw-data/train

However, I advise you to change the data_path and out_path and other such variables to suit your needs.
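Before rerunning the extractor, a small sanity-check sketch (an illustration, not part of the repository) that mirrors the glob lookup from the traceback above can confirm the layout; adjust data_path to your own setup:

    # check_layout.py -- illustrative sanity check.
    # points_extractor.py expects each sample folder under data_path to contain an .svs slide
    # (plus the converted pyramidal tiff masks); a missing .svs triggers the IndexError above.
    import glob
    import os

    data_path = os.path.join("..", "..", "data", "raw-data", "train")  # adjust to your setup

    for sample_id in sorted(os.listdir(data_path)):
        sample_dir = os.path.join(data_path, sample_id)
        if not os.path.isdir(sample_dir):
            continue
        svs_files = glob.glob(os.path.join(sample_dir, "*.svs"))
        if svs_files:
            print(f"{sample_id}: ok ({os.path.basename(svs_files[0])})")
        else:
            print(f"{sample_id}: no .svs file found")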

haranrk commented 3 years ago

@heba9004 The issue warrants more attention. Can you post the error message on the new issue page here (https://github.com/koriavinash1/DigitalHistoPath/issues/18#issue-777832220)? Specifically, I would like to know at which line of the code the memory error occurred.

codingcode111 commented 3 years ago

Thank you so much, Haran, for your reply. Unfortunately, there are no clear instructions or layout regarding the dataset. Since we are working with WSIs and have several formats in our dataset (svs, tiff, xml), we need more clarification about the data organization before working on the project. In the following image, for example, the file order and exactly how the data is organized are clearly stated. Is it possible to share a similar layout for this project, please? [attached example layout image]

Also, @heba9004, if you applied this project successfully, can you kindly share your layout and initial steps with me, please? Your reply and help would be highly appreciated.

haranrk commented 3 years ago

The file below depicts the directory structure of our repository as it was at the time of submission to the PAIP 2019 grand challenge.

dir-structure.txt

The important directory is data/raw-data/train. The listing shows the files of only one sample (Training_phase_1_004), but the other sample directories follow the same format.

I have also included a new script, convert_to_pyramidal.py, under the patch_extraction folder, which will help you convert the mask files to pyramidal format.

heba9004 commented 3 years ago

@codingcode111, sorry, I just noticed your question. I converted the images and placed them in their original folders, so each training folder has both the original and the converted tif images, but I am still facing a memory error that I am trying to fix. I hope this helps you.