Questions about FileMunging

maple11041 commented 3 years ago

Hi, I have a small question about your preprocessing code. In your FileMunging.py, you keep only one scan per patient and give priority to the 5mm slice scan. May I ask why you chose the 5mm slice scan rather than the other scans (e.g. CT Plain, CT Plain 3mm, etc.)? Also, why keep only one scan instead of all of a patient's scans? Sorry, I have little medical background and might not know the differences between those scans. It would be great if you could explain in detail how you decided on your preprocessing. Thanks in advance!
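
A minimal sketch of the selection rule described above (one series per patient, preferring the 5mm series). This is not the actual FileMunging.py code. The preference order after "CT 5mm" and the `patient/series` folder layout are assumptions for illustration, and `pick_one_series` and `PREFERRED_SERIES` are hypothetical names.

```python
from pathlib import PurePosixPath

# Hypothetical preference order: only "prefer the 5mm series" comes from the
# thread; the rest of this list is an illustrative assumption.
PREFERRED_SERIES = ["CT 5mm", "CT Plain", "CT Plain 3mm", "CT 0.625mm"]


def pick_one_series(series_paths):
    """Keep one series folder per patient, chosen by PREFERRED_SERIES order.

    series_paths: strings like "CQ500CT420/CT 5mm" (patient folder / series
    folder). Series missing from the list rank last; ties are broken
    alphabetically so the result is deterministic.
    """
    by_patient = {}
    for path in series_paths:
        p = PurePosixPath(path)
        by_patient.setdefault(p.parent.name, []).append(p.name)

    def rank(series):
        try:
            return (PREFERRED_SERIES.index(series), series)
        except ValueError:
            return (len(PREFERRED_SERIES), series)

    return {patient: min(series_list, key=rank)
            for patient, series_list in by_patient.items()}


# Example folder names; only CQ500CT420 appears in this thread.
paths = ["CQ500CT420/CT 0.625mm", "CQ500CT420/CT 5mm", "CQ500CT1/CT Plain"]
print(pick_one_series(paths))
# {'CQ500CT420': 'CT 5mm', 'CQ500CT1': 'CT Plain'}
```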

jarodroland commented 3 years ago

Hi there, this is very much a dead project; it was entirely a learning exercise to gain experience with machine learning algorithms and medical imaging data. For those reasons, the choices you refer to were made primarily to make data handling quicker and easier. Head CTs come in a variety of shapes and sizes, and this data set isn't entirely uniform. Sometimes the 5mm thick slices have better signal-to-noise than the 1mm thick slices due to volume averaging. Sometimes the two are similar, and the smaller image sets are simply easier to work with at the prototyping stage. Similarly, choosing one type of scan (for consistency and ease of prototyping) over using all the available data (to maximize the information gleaned) was again driven primarily by ease of working with the data during an educational course that lasted a few days. Hope that is helpful, but I would strongly recommend looking at other, more established projects for best practices, not this one.
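
To illustrate the volume-averaging point, here is a small simulation under an idealized assumption: each thin slice is a constant signal plus independent Gaussian noise, and a thick slice is the mean of 5 thin slices. Under that assumption the noise standard deviation drops by about sqrt(5), roughly 2.24x. Real CT noise is spatially correlated and reconstructions differ, so treat this as a rough intuition, not a model of the CQ500 scans. The function name and all parameter values are arbitrary.

```python
import random
import statistics


# Idealized model (an assumption): thin slice = signal + independent Gaussian
# noise; thick slice = mean of `n_thin` thin slices over the same voxel.
def noise_std_thin_vs_thick(n_thin=5, n_voxels=20000, signal=40.0,
                            sigma=10.0, seed=0):
    rng = random.Random(seed)
    thin = [signal + rng.gauss(0.0, sigma) for _ in range(n_voxels)]
    thick = [
        statistics.fmean(signal + rng.gauss(0.0, sigma) for _ in range(n_thin))
        for _ in range(n_voxels)
    ]
    return statistics.stdev(thin), statistics.stdev(thick)


thin_sd, thick_sd = noise_std_thin_vs_thick()
print(f"thin-slice noise SD:  {thin_sd:.2f}")
print(f"thick-slice noise SD: {thick_sd:.2f}")
print(f"noise reduction: {thin_sd / thick_sd:.2f}x (theory: {5 ** 0.5:.2f}x)")
```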

maple11041 commented 3 years ago

Hi, thank you so much for your reply. Definitely very helpful! My one last concern: all scan types for a given patient share the same label (for example, CQ500CT420/CT 0.625mm and CQ500CT420/CT 5mm have the same label). If I want to use every type of scan for each patient, a more reasonable approach would be to make sure that scans belonging to the same patient never appear in both the training and validation sets. (For example, if CQ500CT420/CT 0.625mm is in the training set, CQ500CT420/CT 5mm should not appear in the validation or test set.) Otherwise, I should choose only one type of scan per patient before doing the train/test split. Am I right?
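
A minimal sketch of the patient-level split described above: shuffle patients rather than series, so every series from one patient ends up on the same side and same-label copies cannot leak between training and validation. It assumes paths of the form `patient/series`; `split_by_patient` is a hypothetical helper, not part of this repo.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath


def split_by_patient(series_paths, val_fraction=0.2, seed=0):
    """Split series folders so each patient lands in exactly one set.

    series_paths: strings like "CQ500CT420/CT 5mm" (patient folder first).
    Returns (train_paths, val_paths).
    """
    by_patient = defaultdict(list)
    for path in series_paths:
        by_patient[PurePosixPath(path).parts[0]].append(path)

    # Sort before shuffling so a given seed always yields the same split.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])

    train, val = [], []
    for patient in patients:
        if patient in val_patients:
            val.extend(by_patient[patient])
        else:
            train.extend(by_patient[patient])
    return train, val
```

If scikit-learn is available, `GroupShuffleSplit` or `GroupKFold` from `sklearn.model_selection` with the patient ID passed as `groups` does the same job.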

ibro45 commented 10 months ago

Thank you for providing the folder list; it is extremely time- and nerve-saving! For future reference, I believe this follows what the dataset's authors did:

jarodroland / ConvOuch

Questions about FileMunging #1