FrancescVP commented 1 year ago

Upgrading data processing by:

Providing individual per-organ segmentations as NIfTI files. This speeds up the modeling process because we'll be passing the model much smaller images (from (512, 512, 700) down to (<100, <100, <100)), which makes the model roughly 50x faster.
Also added functions to manage cache clean-up and to create temporary files.
Fixed some issues in the dataloader and made it compatible with the new data format.
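The size reduction described above comes from cropping each scan to the region its organ occupies. Below is a minimal sketch of that idea, assuming the full CT volume and a binary organ mask are already loaded as NumPy arrays of the same shape (for example via nibabel). `crop_to_organ` and its `margin` parameter are hypothetical names for illustration, not the PR's actual code.

```python
import numpy as np

def crop_to_organ(volume: np.ndarray, mask: np.ndarray, margin: int = 5) -> np.ndarray:
    """Crop `volume` to the bounding box of the non-zero voxels in `mask`, plus a margin.

    Hypothetical helper: assumes `volume` and `mask` are aligned arrays of equal shape.
    """
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    coords = np.argwhere(mask > 0)  # (N, ndim) indices of the organ's voxels
    if coords.size == 0:
        raise ValueError("mask contains no organ voxels")
    # Expand the tight bounding box by `margin` voxels, clamped to the volume bounds.
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, volume.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return volume[slices]
```

The margin keeps a little surrounding tissue around the organ; setting it to 0 gives the tightest possible crop.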

mvivopas commented 1 year ago

Considerations

Reducing the image to the segmented organ only

We should keep this in mind: we are using only the segmented organ to predict injury. Other organs, and the full picture in general, could play an important role in this prediction, and we are now discarding that information. Once we have the evaluation framework, it would be a good idea to run an experiment to test whether this information plays a role.

Cache

@FrancescVP I will need a brief explanation of what get_args in the preprocessor class is doing exactly.

TODO 1: Organs with multiple segmentations

Some organs, such as the kidney, have two segmentations (one for the right and another for the left). We still have to decide how to deal with this situation.

TODO 2: Different organs present different number of classes

Each organ has one of the following sets of labels:

organ_healthy, organ_low, organ_high
organ_healthy, organ_injury

We will also have to adapt the trainer so that the number of classes depends on the number of labels returned by the data loader.
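One way the data-loader side could report the number of classes is to infer it from the label columns themselves. This is a sketch assuming each sample's labels arrive as a one-hot dict whose keys follow the `organ_healthy, organ_low, organ_high` / `organ_healthy, organ_injury` naming above; `organ_target` is a hypothetical helper, not code from the repository.

```python
def organ_target(labels: dict, organ: str) -> tuple:
    """Return (class_index, num_classes) for one organ from one-hot label columns.

    Hypothetical helper: `labels` maps names like "kidney_low" to 0/1 and keeps
    the column order of the source table (dicts preserve insertion order).
    """
    cols = [k for k in labels if k.startswith(f"{organ}_")]
    if not cols:
        raise KeyError(f"no label columns for organ {organ!r}")
    values = [labels[c] for c in cols]
    # The index of the active one-hot column is the integer class target.
    return values.index(max(values)), len(cols)
```

For a kidney row with `kidney_low = 1` this gives class 1 out of 3, while a bowel row yields one of 2 classes, so the trainer can read the class count instead of hard-coding it.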

But amazing PR bro, really cool stuff. Looking forward to you explaining it to me in more detail.

FrancescVP commented 1 year ago

RE Considerations

Reducing the image to the segmented organ only

The first thing we should do is get a better understanding of the images, since this will answer this kind of question. We should do a proper study of what injured organs look like and how they affect their surroundings. It would be useful to analyze whether injured organs are larger than healthy ones. We can do this by extracting the volume of each organ from the segmentation files and then normalizing it by the volume of some healthy organ, like the liver, to estimate the expected variability of the organ volume.
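The volume analysis proposed above boils down to counting voxels and multiplying by the voxel size. A minimal sketch, assuming a multi-label segmentation array where each organ has its own integer label, and a voxel spacing in millimetres (nibabel exposes this as `img.header.get_zooms()`). The function names and label IDs are hypothetical.

```python
import numpy as np

def organ_volume_ml(mask: np.ndarray, spacing_mm: tuple, label: int = 1) -> float:
    """Volume of the voxels equal to `label`, in millilitres (1 mL = 1000 mm^3)."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(np.count_nonzero(mask == label)) * voxel_mm3 / 1000.0

def relative_volume(mask: np.ndarray, spacing_mm: tuple, organ_label: int, reference_label: int) -> float:
    """Organ volume divided by a reference organ's volume (e.g. the liver)."""
    ref = organ_volume_ml(mask, spacing_mm, reference_label)
    if ref == 0:
        raise ValueError("reference organ is empty in this mask")
    return organ_volume_ml(mask, spacing_mm, organ_label) / ref
```

Because the ratio is dimensionless, it is less sensitive to patient size and to differing voxel spacings across sites than the raw volume.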

Cache

The get_args function just loads the arguments from the JSON file, and does so efficiently when it is called multiple times. That said, it's true that we only load the file once, so it's a bit useless jeje :)
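For reference, "efficient when called multiple times" usually means memoizing the parse. This is a sketch of that pattern using `functools.lru_cache`; `load_args` is a hypothetical stand-in, not the repository's `get_args`.

```python
import json
from functools import lru_cache

@lru_cache(maxsize=None)
def load_args(path: str) -> dict:
    """Parse the JSON config at `path` once; later calls with the same path hit the cache.

    Hypothetical helper. Note: every caller receives the same dict object,
    so it should be treated as read-only.
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

As noted in the comment, this only pays off if the file is read repeatedly; with a single load the cache adds nothing.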

TODO 1: Organs with multiple segmentations

Not a problem: single-organ models will have only one channel (the organ's image), while two-organ models will have two channels (both kidneys) --> (1, 112, 112, 112) vs (2, 112, 112, 112).
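The channel layout above can be produced by stacking the per-organ crops along a new leading axis. A sketch, assuming every crop has already been resized to the same shape (e.g. 112 x 112 x 112); `stack_organ_crops` is a hypothetical name.

```python
import numpy as np

def stack_organ_crops(crops: list) -> np.ndarray:
    """Stack one or more same-shaped 3D crops into a (C, D, H, W) array.

    Hypothetical helper: one crop gives C=1, left + right kidney give C=2.
    """
    shapes = {c.shape for c in crops}
    if len(shapes) != 1:
        raise ValueError(f"crops must share exactly one shape, got {shapes}")
    return np.stack(crops, axis=0)
```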

TODO 2: Different organs present different number of classes

When building the model, we'll specify the number of classes it should expect. We can build a dict that maps each organ to its number of classes to handle this.
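A possible shape for that dict, which also records the channel count from TODO 1 so both settings live in one place. The label sets below are an assumption based on the RSNA 2023 competition labels (liver, spleen and kidney as healthy/low/high; bowel as healthy/injury) and should be checked against the training CSV; `MODEL_CONFIG` and `model_config` are hypothetical names.

```python
# Hypothetical per-organ model settings (label sets assumed, verify against train.csv):
#   liver / spleen / kidney -> healthy, low, high  (3 classes)
#   bowel                   -> healthy, injury     (2 classes)
MODEL_CONFIG = {
    "liver": {"in_channels": 1, "num_classes": 3},
    "spleen": {"in_channels": 1, "num_classes": 3},
    "kidney": {"in_channels": 2, "num_classes": 3},  # left + right kidney as two channels
    "bowel": {"in_channels": 1, "num_classes": 2},
}

def model_config(organ: str) -> dict:
    """Return the settings used to build the model for `organ`."""
    try:
        return MODEL_CONFIG[organ]
    except KeyError:
        raise KeyError(f"unknown organ {organ!r}; add it to MODEL_CONFIG") from None
```

The model builder can then read `in_channels` and `num_classes` from this lookup instead of branching on the organ name.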

TODO 3: Data normalization

This is a multi-centric study: images come from multiple sites that apply different acquisition protocols, which results in a variety of image intensities. Apart from that, some centers acquire larger images that cover all the organs, while others don't. This issue has been addressed in the following notebook.

As a first measure, one approach we can adopt is to use the IntensityNormalization function in the transformation section of the dataloader, passing the mean HU and the std HU to normalize the images.
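The normalization described above is a z-score with fixed, dataset-level statistics. A plain NumPy sketch, assuming the mean and std HU have already been computed over the training set; `normalize_hu` is a hypothetical name. If the dataloader uses MONAI, `NormalizeIntensity(subtrahend=mean_hu, divisor=std_hu)` may serve the same purpose.

```python
import numpy as np

def normalize_hu(volume: np.ndarray, mean_hu: float, std_hu: float) -> np.ndarray:
    """Z-score a CT volume using fixed dataset-level HU statistics.

    Hypothetical helper: the same mean/std are applied to every scan, so
    intensities from different sites end up on a comparable scale.
    """
    if std_hu <= 0:
        raise ValueError("std_hu must be positive")
    return (volume.astype(np.float32) - mean_hu) / std_hu
```

Using global rather than per-scan statistics keeps the absolute meaning of HU values, which per-scan normalization would partly erase.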

TODO 4: Computational efficiency

Once the preprocessing is finished, we should consider saving the images in .npy format instead of .nii.gz, since .npy files are faster to load during model training (there is no gzip decompression). The NIfTI format is better while preprocessing; once that's done, it's better to migrate to another format.
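A sketch of the migration, assuming each preprocessed volume is available as a NumPy array (for a NIfTI file, nibabel's `nib.load(path).get_fdata()` returns one). The helper names are hypothetical. Loading with `mmap_mode="r"` reads the data lazily from disk instead of all at once.

```python
from pathlib import Path

import numpy as np

def save_as_npy(volume: np.ndarray, out_path) -> Path:
    """Save a preprocessed volume as an uncompressed float32 .npy file (hypothetical helper)."""
    out_path = Path(out_path)
    if out_path.suffix != ".npy":
        raise ValueError("out_path should end in .npy")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    np.save(out_path, volume.astype(np.float32))
    return out_path

def load_npy(path, mmap: bool = True) -> np.ndarray:
    """Load a .npy volume; with mmap=True the data is memory-mapped, read-only."""
    return np.load(path, mmap_mode="r" if mmap else None)
```

One trade-off: .npy stores only the array, so the NIfTI affine and voxel spacing are lost and would need to be saved separately if later steps depend on them.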

mvivopas / RSNA-2023

Data Processing Update #8
