vanAmsterdam / lidc-binary-classification

This repository contains code to pre-process the LIDC-IDRI dataset of CT-scans with pulmonary nodules into a binary classification problem, easy to use for learning deep learning
MIT License

Division of data & Nodule location #4

Open aolakunleabayomi opened 3 years ago

aolakunleabayomi commented 3 years ago

Hello friend, I want to ask whether a convolutional neural network treats an npy file like a conventional jpg file.

Which of your ipynb files divides the dataset into train, test & validation sets?

I'm using this preprocessing code for a localization problem. How do I get the size of the nodule in each npy file?

Lastly, I was able to generate the nodulesRGB-unsorted files, but the step that sorts them into malignant & benign datasets in the nodules2D folder failed. Any way out? Or will what I have in nodule3D be sufficient for my work?

Thanks for your help.

vanAmsterdam commented 3 years ago

Hi,

In the end, CNNs only work on numbers; jpgs need to be converted to numbers somehow (though I think dataloaders from several CNN libraries do that 'on-the-fly' for you, so you don't have to write extra code). The npy files are already converted to numbers. There is obviously a difference between the 3D nodules (as they are 3D) and the 2D jpgs.
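To make the point about npy files concrete, here is a minimal sketch, not code from this repository. It assumes NumPy is installed; the file name, the array shape, and the channel-first layout (the PyTorch convention) are hypothetical choices for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for one pre-processed nodule: (depth, height, width).
volume = np.random.rand(8, 32, 32).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the repository's naming scheme may differ.
    path = os.path.join(tmp, "nodule_example.npy")
    np.save(path, volume)
    # np.load returns the numeric array directly; no image decoding step is needed.
    loaded = np.load(path)

# Many 3D CNN libraries expect (batch, channel, depth, height, width).
# This layout is an assumption; check the convention of your framework.
batch = loaded[np.newaxis, np.newaxis, ...]
print(loaded.dtype, loaded.shape, batch.shape)
```

A jpg would go through an image decoder first and end up as the same kind of array, typically 2D with colour channels.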

The splits are made in 'prepare_data'.

To get the size, you need to look at the annotations. Each nodule comes with a binary mask with 1s for where the nodule is and 0s everywhere else. If you sum the mask (after resampling) for each nodule, you get the number of voxels in the nodule, which is a measure of its size (volume).
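Summing the mask could look roughly like the sketch below. It is not the repository's code: the function name `nodule_size`, the 1 mm isotropic spacing (i.e. what the resampling is assumed to produce), and the equivalent-sphere diameter are all assumptions added for illustration.

```python
import numpy as np


def nodule_size(mask, spacing_mm=(1.0, 1.0, 1.0)):
    """Return (voxel_count, volume_mm3, equivalent_diameter_mm) for a binary mask.

    spacing_mm is the voxel size after resampling; 1 mm isotropic is an assumption.
    """
    mask = np.asarray(mask).astype(bool)
    voxel_count = int(mask.sum())              # number of 1s in the mask
    voxel_volume = float(np.prod(spacing_mm))  # mm^3 per voxel
    volume_mm3 = voxel_count * voxel_volume
    # Diameter of a sphere with the same volume: V = pi * d^3 / 6.
    diameter_mm = (6.0 * volume_mm3 / np.pi) ** (1.0 / 3.0)
    return voxel_count, volume_mm3, diameter_mm


# Toy mask: a 3 x 3 x 3 cube of ones inside a 10 x 10 x 10 volume.
mask = np.zeros((10, 10, 10), dtype=np.uint8)
mask[3:6, 3:6, 3:6] = 1

voxels, volume_mm3, diameter_mm = nodule_size(mask)
print(voxels, volume_mm3, diameter_mm)
```

The voxel count only becomes a physical volume once the voxel spacing is known, which is why the sum should be taken after resampling to a fixed spacing.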

With respect to your task of localization, this may not be the right starting point for you. In this preprocessing, the localization is already done based on the metadata. This pre-processed data can be used for the task of classifying nodules, after localization.

Best of luck

aolakunleabayomi commented 3 years ago

Thank you. I understand better now from your helpful explanation.

You said that the splits are made in 'prepare_data'. After the splitting, how do I link the localization information in meta_data with each .npy file? As you know, nodule_unsorted has all the .npy files before splitting, just as meta_data has all the info. Is there a part that splits the localization info in meta_data into benign and malignant separately?

Thank you