pwesp / random-forest-polyp-classification

Random Forest to Predict the Histopathological Class (Benign vs. Premalignant) of Colorectal Polyps in 3D CT Colonography Images using Radiomics Features

request about training_features.csv file??? #1

Closed DrraBL closed 3 years ago

DrraBL commented 3 years ago

Hi @pwesp, I am interested in your work. I want to extract radiomic features from a whole dataset. As you say in your repository, the first step is to create CSV files that contain the paths to the images and their segmentations; however, the files in the data folder contain the labels and the features instead. I have my own data in two folders: the first contains the images as NRRD files and the second contains the ground-truth masks, also in NRRD format. Could you please tell me how to create the CSV files needed to extract the radiomic features? Regarding the labels, I have six classes, so I think the CSV file would contain the ground-truth labels of these six classes. Is that right, or do I need binary classes, since I want to classify between benign and malignant for prostate cancer? I appreciate your help and look forward to hearing from you soon. Regards,

pwesp commented 3 years ago

Hi @dardaa,

I am excited to hear about your interest in our work and thank you for pointing out an issue in this repository. You are right, the example files you mentioned were stored in the wrong place. I revisited the artificial example data sets for training and testing and restructured the code around them a little bit. The instructions were updated accordingly.

Hopefully you can now follow the instructions without problems and continue working with our code. It's meant to be shared!

Best, Phil

DrraBL commented 3 years ago

Hi @pwesp, thank you for updating your repository; that is a huge help. After your update I could reproduce the same results as yours in the Jupyter notebook, but they differ from the results in your paper. Can you tell me the reason? Also, you did not reply to my question above. I tried to create a CSV file containing the file paths, but I obtained an empty CSV file. Could you please correct the following portion of code?

`import os
import pandas as pd
home_path= '/dataset/prostate/'
train_images = home_path+'data/'
train_segmentations = home_path+'label/'
files_in_train = sorted(os.listdir(train_images))
files_in_segmentations = sorted(os.listdir(train_segmentations))
images=[home_path+'data/' + i for i in files_in_train if i.split("image")[0]+"label.nrrd" in files_in_segmentions]
for i in files_in_segmentations:
    files_in_segmentations.remove(i)
    files_in_segmentations.append(home_path+'label/' + i)
df = pd.DataFrame(data={"images":images,"labels":files_in_segmentations})
df.to_csv('files_path.csv', sep=',', index=False)
df`

Please note that, for example, in the data folder I have prostate_0_image.nrrd and in the segmentations folder I have prostate_0_label.nrrd.

Looking forward to hearing from you. Kind regards,

pwesp commented 3 years ago

Hi @dardaa,

that's great news and I'm delighted to hear that you made progress. Regarding your question about not being able to reproduce the results in our paper: this repository shares the code of our work, not the data. Only example data in the form of random NumPy arrays is provided so that the functionality of the code can be tested. Unfortunately, the polyp data cannot be made public due to data protection regulations.

However, if you have CT colonography scans of colorectal polyps and the corresponding polyp segmentation masks at hand, you can use the model provided in 'trained_models/random_forest_polyp_classification_model.joblib', which was trained to classify polyps as benign or premalignant, to create your own results on your own data.

If you wish to train a random forest classifier for a different task (it seems like you are working with prostate images), you can follow steps 0 to 3 of the instructions.

Best, Phil

DrraBL commented 3 years ago

Hi @pwesp, thank you for your feedback. Actually, I am using T2 MRI images for prostate cancer, not CT images, so I don't know whether it is possible to test your work. My work consists of using radiomic features, together with the ground truth provided by the dataset vendor, to train a CNN model to classify between benign and malignant. Could you please tell me what the bug is in my code above, which is meant to generate a CSV file containing the paths to the images and their segmentations?

Best regards,
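
For reference, two things stand out in the snippet posted earlier in this thread: the name `files_in_segmentions` is misspelled (the list is defined as `files_in_segmentations`), and the loop removes items from `files_in_segmentations` while iterating over it, which makes Python skip elements. The following is only a minimal sketch of one way to pair the files, not the repository's own method. It assumes the naming scheme described above (`prostate_0_image.nrrd` and `prostate_0_label.nrrd`) and reuses the column names `images` and `labels` from the snippet; the column names the repository's scripts actually expect are not stated in this thread. The helper names `pair_images_and_labels` and `write_paths_csv` are hypothetical, and the standard-library `csv` module stands in for pandas to keep the example dependency-free.

```python
import csv
from pathlib import Path


def pair_images_and_labels(image_dir, label_dir,
                           image_suffix="_image.nrrd",
                           label_suffix="_label.nrrd"):
    """Return sorted (image_path, label_path) pairs that share a case prefix.

    The suffixes are assumptions based on the file names given in the thread.
    """
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    pairs = []
    for image_path in sorted(image_dir.glob("*" + image_suffix)):
        # "prostate_0_image.nrrd" -> "prostate_0"
        case_id = image_path.name[: -len(image_suffix)]
        label_path = label_dir / (case_id + label_suffix)
        # Keep only images that have a matching segmentation file.
        if label_path.is_file():
            pairs.append((str(image_path), str(label_path)))
    return pairs


def write_paths_csv(pairs, csv_path):
    """Write the pairs to a CSV file with 'images' and 'labels' columns."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["images", "labels"])
        writer.writerows(pairs)
```

With the folder layout from the snippet, this could be called as `write_paths_csv(pair_images_and_labels('/dataset/prostate/data', '/dataset/prostate/label'), 'files_path.csv')`. Building both columns from the same matched pair keeps them the same length and in the same order, which avoids editing a list while looping over it.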