jspunda / prostatex

This is the repository of the final project for the course Intelligent Systems in Medical Imaging 2017

How to handle multiple similar DICOM series per patient? #50

Closed: jspunda closed this issue 7 years ago

jspunda commented 7 years ago

As can be seen from the data set overview page on the wiki, a patient can have multiple, similar DICOM series. Some of these carry the exact same DICOM series description, while others have almost the same description.

My question is: how do we want to handle this? Currently, for simplicity's sake, we dismiss any duplicate DICOM series or just take the series that has the highest series number.

However, we are throwing away possibly valuable data. On the other hand, maybe these duplicate DICOM series are too blurry or too distorted to use, and including them would only hurt our performance.

Ideally, you would want to go through the entire set and, for each duplicate, hand pick the ones that you want to keep, but I don't think this is feasible, given the number of DICOM series and perhaps our lack of expert knowledge.

As for the test set, if we encounter multiple, similar DICOM series for a lesion in a certain patient, we could for example classify each of them and average the predictions in some way. The same tradeoff between possible data loss and negatively impacted performance applies here though.
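To make the averaging idea concrete, here is a minimal sketch. It assumes the classifier returns one probability per DICOM series for the same lesion and simply takes the unweighted mean; the function name `average_predictions` is hypothetical and not part of the repository.

```python
def average_predictions(probabilities):
    """Return the unweighted mean of per-series probabilities for one lesion."""
    probabilities = list(probabilities)
    if not probabilities:
        # Averaging an empty set of predictions is undefined, so fail loudly.
        raise ValueError("need at least one prediction")
    return sum(probabilities) / len(probabilities)
```

A weighted mean (for example, trusting later series more) would be another option, but nothing in this thread settles which weights would make sense.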

Note that if we do want to take into account every DICOM series we find for a lesion in a patient, our .hdf5 representation and lesion extraction require some refactoring to support this properly.

henkjanhuisman commented 7 years ago

I thought we discussed this before. Imagine an MRI study with two series named T2-weightedTransbla. Then the first was probably not OK, because of artifacts, blurriness or anything else. The second, or more generally the last, is nearly always the best. So the recipe is: retain the last series. The series number in the image.csv or DICOM is the indicator to use here. Retain the duplicate series with the highest series number.
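The recipe above could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the row layout and the key names `patient_id`, `description` and `series_number` are assumptions for illustration and may not match the real image.csv columns, and the example rows are made up.

```python
def retain_last_series(rows):
    """Keep, per (patient, description), only the row with the highest series number."""
    best = {}
    for row in rows:
        # Key names are hypothetical; adapt them to the actual metadata columns.
        key = (row["patient_id"], row["description"])
        current = best.get(key)
        if current is None or int(row["series_number"]) > int(current["series_number"]):
            best[key] = row
    return list(best.values())


# Made-up example rows: two series share a description, one does not.
rows = [
    {"patient_id": "0001", "description": "t2_tse_tra", "series_number": 4},
    {"patient_id": "0001", "description": "t2_tse_tra", "series_number": 9},
    {"patient_id": "0001", "description": "t2_tse_sag", "series_number": 3},
]
kept = retain_last_series(rows)
```

Note that this only groups series whose descriptions match exactly; series with almost the same description would need an extra normalization step before grouping.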

jspunda commented 7 years ago

Thank you for the quick reply. This has indeed been discussed before, but at that point in time I was not aware of cases such as patient 0191. In this case we have the following DICOM series with ADC in their description:

ADC_S3_1, ADC_S3_2, ep2d_diff_tra2x2_Noise0_FS_DYNDIST_ADC, ep2d_diff_tra2x2_Noise0_NoFS_DYNDIST_ADC

Which ADC should we take in this case? Can we still use the series number as an indicator of what is the best ADC to take, even though they have different DICOM descriptions?

henkjanhuisman commented 7 years ago

These are experimental ADC series. If there is a regular ADC series in there, skip the other ADCs; otherwise, pick one.
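One way to express this rule in code is sketched below. It rests on several assumptions: the thread does not define what a "regular" ADC description looks like, so the caller has to supply a hypothetical `is_regular` predicate; the dictionary keys are illustrative; and falling back to the highest series number is just one possible reading of "pick one".

```python
def choose_adc(series, is_regular):
    """Pick one ADC series: prefer regular ones, otherwise fall back to any ADC."""
    # Only consider series with "ADC" in the description (key names are hypothetical).
    adc = [s for s in series if "ADC" in s["description"]]
    if not adc:
        return None
    # is_regular is a caller-supplied predicate; its definition is an assumption.
    regular = [s for s in adc if is_regular(s["description"])]
    candidates = regular if regular else adc
    # Assumed tie-break: the highest series number wins.
    return max(candidates, key=lambda s: int(s["series_number"]))
```

Keeping the predicate separate means the notion of "regular" can be adjusted once the naming conventions in the data set are known, without touching the selection logic.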

jspunda commented 7 years ago

That is pretty much already how we handle these cases now, but it's good to have this clarified completely. This issue can be closed now.