codingcode111 opened 3 years ago
Hi @codingcode111, can you kindly share with us which script you are running and what your parameters are?
I am running points_extractor.py using the PAIP 2019 training dataset. However, I am getting the following error:
The points_extractor.py script requires two arguments: the mode, train or valid, and the tumor type, viable or whole (the two tumor types present in the PAIP dataset). You can run the point_extract_script.sh script present in the same directory to automatically run points_extractor.py for each permutation of arguments:

bash point_extract_script.sh

Don't forget to pull the latest changes before proceeding. Kindly write to us if you have any further issues.
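For anyone unsure what the bash script amounts to, here is a minimal Python sketch of the same idea: one points_extractor.py invocation per (mode, tumor type) permutation. This is a hypothetical illustration, not the repo's actual script; MODES, TUMOR_TYPES, build_commands and run_all are names I made up here.

```python
# Hypothetical sketch (not the repo's actual script) of what
# point_extract_script.sh does: invoke points_extractor.py once per
# (mode, tumor type) permutation. Argument names follow the thread.
import itertools
import subprocess
import sys

MODES = ["train", "valid"]          # dataset split
TUMOR_TYPES = ["viable", "whole"]   # PAIP tumor mask types

def build_commands():
    """One points_extractor.py command line per (mode, tumor type) pair."""
    return [
        [sys.executable, "points_extractor.py", mode, tumor]
        for mode, tumor in itertools.product(MODES, TUMOR_TYPES)
    ]

def run_all():
    """Run every permutation sequentially, stopping on the first failure."""
    for cmd in build_commands():
        subprocess.run(cmd, check=True)
```

With two modes and two tumor types this produces four runs, which matches "each permutation of arguments" above.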
Hi Haran,
Thank you so much for your reply and help. I pulled the latest changes and placed the 50 unzipped training files in the data folder. After running the command above, the raw-data, train and patch-coords-200k folders were created with empty files, and the following error occurred. Any suggestions, please? We really appreciate your clarification and cooperation.
The error in question seems to occur because the algorithm is unable to find the files. The data directory is hardcoded into the script; you may have to change it based on where your files are.
Also, don't forget to refer to this issue to resolve file format errors.
Hi Haran,
Thank you so much for your reply and help. I was extremely busy last week. Regarding the project, I tried changing the directory and put the data folder in the project directory, so both are in the same directory. However, I'm still getting the same error. Any suggestion to resolve this issue, please? Your suggestions and support would be highly appreciated.
Hi Haran,
Thank you for your help. I tried to convert the tiff files into pyramidal tiffs using the command you provided, but I'm not sure where exactly I need to run it. Does it need to be run in the same directory that has the data folder, or can I pass the path of the folder that has the entire set of tiff images? I am new to histopathology images and struggling to sort out the initial steps to run this experiment successfully. Your help and reply would be highly appreciated.
Best regards,
convert input -compress jpeg -quality 90 -define tiff:tile-geometry=256x256 ptif:output

input - complete file path to the input image
output - complete file path to the output image

Hi Haran,
I should let you know that I only downloaded the 50 training files of the PAIP 2019 dataset and unzipped them. There are 50 folders; each folder represents a training phase number and contains svs, xml, viable_tif and whole_tif files. For example, Training_phase_1_006 contains svs, xml, viable_tif and whole_tif. I also extracted the 4 images of each into a new folder and got 200 items in the folder. I am not sure if you are using the same dataset format that I am using. I hope you understand what I mean and can help sort out this problem. Your guidance and support would be highly appreciated.
Best regards,
Hi,
That is the exact same dataset that I used as well. The viable and whole mask tiffs are the images that are not in an OpenSlide-readable format, therefore you would need to convert them. To do that, you can use the command mentioned in the previous comment. For example, if you had a file titled Training_phase_1_006.tif, use the following command in the folder it's present in:
convert Training_phase_1_006.tif -compress jpeg -quality 90 -define tiff:tile-geometry=256x256 ptif:Training_phase_1_006.tiff
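If you have many mask files to convert, running that command by hand for each one gets tedious. Below is a hedged sketch, not part of the repo, that batch-converts every .tif under the training folders by shelling out to ImageMagick's convert with the exact options from the command above; DATA_ROOT is an assumed layout you should adjust.

```python
# Hedged sketch, not code from the repo: batch-convert every mask .tif
# found one level below DATA_ROOT to pyramidal TIFF by shelling out to
# ImageMagick's `convert`, using the options quoted in this thread.
import glob
import os
import subprocess

DATA_ROOT = os.path.join("data", "raw-data", "train")  # assumed layout

def convert_command(src):
    """ImageMagick command that writes a pyramidal <name>.tiff next to src."""
    dst = os.path.splitext(src)[0] + ".tiff"
    return ["convert", src, "-compress", "jpeg", "-quality", "90",
            "-define", "tiff:tile-geometry=256x256", "ptif:" + dst]

def convert_all():
    """Convert every .tif mask inside each sample folder under DATA_ROOT."""
    for src in sorted(glob.glob(os.path.join(DATA_ROOT, "*", "*.tif"))):
        subprocess.run(convert_command(src), check=True)
```

Writing the output as .tiff next to the .tif input keeps both copies, which matches what users in this thread ended up doing.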
Hi Haran, thank you so much for your reply. I converted the images into pyramidal tiffs. Now I have a folder named data, and there is a raw-data folder inside it. Where should I place the pyramidal tiff images? When I tried placing the tiff images under the data folder and then under the raw-data folder, I got the following error both times:
File "points_extractor.py", line 454, in batch_patch_gen
image_path = glob.glob(os.path.join(data_path,id,'*.svs'))[0]
IndexError: list index out of range
Please let me know where I should put the svs files, and the exact layout of the data and images, including the svs files that are inside each of the 50 training folders available on the challenge website. By converting the images into pyramidal images, I got 6 images inside each training folder. Should I put the 50 training folders under the data folder, or should I only use the pyramidal images? I am a bit confused, as it is not clear where each type and folder of data should be placed. We need more clarification, please, to successfully apply this work. Thank you again, Haran, for your support and help.
Did you change the data_path variable in the points_extractor.py script to point to where your data is currently present?
I put the data under the data folder in the same directory. In the script, only relative names are written (data, raw-data, etc.), so I put the tiff files in the data folder. I also tried putting them in raw-data. I haven't made any changes in the script, as I think there is no full path in the script, only the data and raw-data folders, which are already present in the same directory under the DigitalHistoPath folder.
After converting the binary mask images into the pyramidal format and starting the points_extractor script, I am facing a memory error during a deep copy. Before throwing this error, it only saves the coordinates of the patches extracted from the first training svs file. Can you please advise me whether the problem is with my machine? I am using 32GB of RAM. Thank you in advance.
@codingcode111 In the script I have the following declaration:

data_path = os.path.join('..','..','data','raw-data','train')

The .. represents the parent directory, so in my setup I have the following structure:

DigitalHistoPath/code_cm17/patch_extraction/points_extractor.py
DigitalHistoPath/data/raw-data/train

However, I advise you to change the data_path, out_path and other such variables to suit your needs.
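A quick way to check whether that relative path resolves correctly is to reproduce the script's own lookup: it globs for *.svs inside each sample folder and indexes [0], so an empty glob result is exactly what produces the IndexError reported earlier in the thread. The snippet below is my own suggestion for debugging, not code from the repo; svs_files is a name I made up.

```python
# Sanity check (a debugging suggestion, not code from the repo): confirm
# that the relative data_path resolves from where you launch the script
# and that the .svs files the script globs for are actually visible.
# An empty list here is what leads to "IndexError: list index out of range".
import glob
import os

def svs_files(data_path, sample_id):
    """Mimic the script's lookup: every .svs inside one sample folder."""
    return sorted(glob.glob(os.path.join(data_path, sample_id, "*.svs")))

data_path = os.path.join("..", "..", "data", "raw-data", "train")
# print(os.path.abspath(data_path))                    # where the script looks
# print(svs_files(data_path, "Training_phase_1_004"))  # [] means wrong layout
```

Run this from the same directory you launch points_extractor.py from, since the .. components are resolved against the current working directory.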
@heba9004 The issue warrants more attention. Can you post the error message on the new issue page here (https://github.com/koriavinash1/DigitalHistoPath/issues/18#issue-777832220)? Specifically, I would like to know at which line of the code the memory error occurred.
Thank you so much, Haran, for your reply. Unfortunately, there are no clear instructions or layout regarding the dataset. As we are working on WSIs and have several formats in our dataset (svs, tiff, xml), we need more clarification regarding the data organization before working on the project. In the following image, for example, the file order and exactly how the data is organized are clearly stated. Is it possible to share a similar layout for this project, please?
Also, if @heba9004 applied this project successfully, can you kindly share your layout and initial steps with me, please? Your reply and help would be highly appreciated.
The file below depicts the directory structure of our repository as it was at the time of submission to the PAIP 2019 grand challenge.
The important directory is data/raw-data/train. The listing shows the files of only one sample (Training_phase_1_004), but the other sample directories follow the same format.
I have also included a new script, convert_to_pyramidal.py, under the patch_extraction folder, which should help you with converting the mask files to pyramidal format.
@codingcode111, sorry, I just noticed your question. I converted the images and placed them in their original folders, so each training folder has both the original and the converted tif images, but I am still facing a memory error that I am trying to fix. Hope this helps you.
Hi, I would like to thank you for sharing this interesting work. I am trying to apply this experiment to the same data that you used, but it shows me that a tumor type and mode are required. Can you kindly tell me how I can fix this issue, please? Your reply and help would be highly appreciated.
Kind regards,