Lesion visualization and understanding data better

Now that we are able to load lesion cutouts and have started training, it might be a good idea to understand our data a little better. At the moment it feels like we are just 'throwing the lesions into a network' and hoping that it picks up on the features that are important for determining whether or not a lesion is clinically significant.

While that may eventually give some results after a lot of trial and error with parameter tweaking, it would still feel as if we don't understand the data very well. I think that if we try to get a better idea of what exactly makes a lesion clinically significant, we could tweak our network in a more targeted way, as opposed to 'random' trial and error.

During yesterday's meeting it was mentioned that, for example, lower values (400-600) in ADC images are indicative of a clinically significant lesion, so I decided to start there. In the data_visualization branch you can find an initial visualization function that plots lesions and their histograms. It supports saving the plots to disk, but that takes a few minutes, so if you don't want to bother with that you can download the ADC plots from here: https://jspunda.stackstorage.com/s/6UN2Ds6s7kwJtqE (password: ismi2017)
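For anyone who wants to compute the same kind of numbers without the plots, here is a minimal sketch of what a per-lesion histogram looks like in code. This is not the function from the data_visualization branch; `adc_histogram` and `low_adc_fraction` are hypothetical names, and it assumes a lesion cutout is just a NumPy array of ADC values. The 400-600 range comes from the meeting; the bin count and the (0, 3000) histogram range are arbitrary choices.

```python
import numpy as np


def adc_histogram(cutout, bins=50, value_range=(0, 3000)):
    """Histogram counts and bin edges for all voxels in an ADC lesion cutout.

    Hypothetical helper; bins and value_range are arbitrary defaults.
    Voxels outside value_range are not counted.
    """
    values = np.asarray(cutout, dtype=float).ravel()
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    return counts, edges


def low_adc_fraction(cutout, low=400, high=600):
    """Fraction of voxels whose ADC value lies in [low, high]."""
    values = np.asarray(cutout, dtype=float).ravel()
    if values.size == 0:
        return 0.0
    in_range = (values >= low) & (values <= high)
    return float(in_range.mean())


if __name__ == "__main__":
    # Synthetic stand-in for a real cutout, NOT actual ProstateX data.
    rng = np.random.default_rng(0)
    cutout = rng.normal(loc=800, scale=200, size=(16, 16, 3))
    counts, edges = adc_histogram(cutout)
    print("voxels counted:", counts.sum())
    print("fraction in 400-600:", low_adc_fraction(cutout))
```

A single number like the low-ADC fraction is easier to compare across lesions than eyeballing dozens of histograms, though it obviously throws away the shape of the distribution.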

I've looked at the plots myself, and it does seem to be true that a lesion with more pixels in the 400-600 intensity range is more likely to be clinically significant. But there are of course also a lot of 'strange' cases: some non-significant lesions do have peaks at lower values like 500, and some clinically significant lesions don't have any pixels with values below 600.
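One way to put a number on how many of these 'strange' cases there are is to turn the observation into a crude rule and count where it disagrees with the labels. The sketch below assumes you already have, per lesion, the fraction of voxels in the 400-600 range and a boolean clinical-significance label; `threshold_rule_confusion` is a hypothetical helper and the default threshold of 0.1 is an arbitrary placeholder, not something derived from our data.

```python
import numpy as np


def threshold_rule_confusion(fractions, labels, threshold=0.1):
    """Compare a simple 'enough low-ADC voxels' rule with significance labels.

    fractions: per-lesion fraction of voxels with ADC in 400-600
    labels:    per-lesion booleans, True = clinically significant
    threshold: hypothetical cut-off; a lesion is predicted significant
               when its fraction is >= threshold
    """
    fractions = np.asarray(fractions, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    predicted = fractions >= threshold
    return {
        "true_positive": int(np.sum(predicted & labels)),
        # Non-significant lesions that still have many low-ADC voxels.
        "false_positive": int(np.sum(predicted & ~labels)),
        # Significant lesions with few or no voxels in the 400-600 range.
        "false_negative": int(np.sum(~predicted & labels)),
        "true_negative": int(np.sum(~predicted & ~labels)),
    }


if __name__ == "__main__":
    # Made-up example values, only to show the call.
    print(threshold_rule_confusion([0.0, 0.3, 0.05, 0.5], [False, True, True, False]))
```

The false positives and false negatives correspond directly to the two kinds of exceptions described above, so sweeping the threshold would show whether the histogram feature is useful on its own or only in combination with other inputs.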

So I'm not sure how useful it is to look at lesion histograms, but I think it's a first step towards understanding the data a little better. Maybe someone else has ideas for other things in our data worth plotting. I think we could also use some of these plots in our presentation next week.

jspunda / prostatex, issue #29