marksgraham / OCT-Converter

Tools for extracting the raw optical coherence tomography (OCT) and fundus data from proprietary file formats.
https://pypi.org/project/oct-converter/
MIT License
199 stars 72 forks source link

improper use of PaddedString in laterality data structure replaced w/ uint8 #46

Closed Dbrown411 closed 2 years ago

Dbrown411 commented 2 years ago

Addresses issue #45 and maybe partially #28

marksgraham commented 2 years ago

Thanks for this Dillon!

billyk18278 commented 2 years ago

Hi Mark and Dillon commit 8f25782 does not solve the issue completely. Although self.laterality is extracted correctly in L260 ( fundus_images.append(FundusImageWithMetaData(image=image, patient_id=key, laterality= self.laterality))

only the laterality of the last image is saved in all images. (of course the same for volumes)
Unfortunately i found this after 130GB of E2E being extracted ...

my fix is something like this but i am sure you can make it look better since i do not know python.

     image_array_dict[image_string] = {"blob":image,"laterality":self.laterality}
fundus_images = []
for key, image_array_dict_value in image_array_dict.items():
     fundus_images.append(FundusImageWithMetaData(image=image_array_dict_value["blob"], patient_id=key, laterality= image_array_dict_value["laterality"]))
marksgraham commented 2 years ago

Hi,

Thanks for spotting the bug. My worry with this solution is I can't see any way to guarantee that each laterality you extract is associated with the correct fundus, as there is no patient ID we can extract for the laterality. I think your solution assumes that the laterality info + it's associated fundus data live in adjacent chunks. This may well be the case but I don't know - does the laterality data you extract using your altered code look correct to you?

Mark

billyk18278 commented 2 years ago

Hi Mark,

May i start by saying that i did not review all e2e.py but i tried to find a quick fix, so maybe i am missing something. My first 2 tests (~20 pngs ) for fundus photos have correct laterality since we reviewed them one by one.
To my experience (bulk exporting ~900 patients(meaning all their OCTs and not each scan seperately), each generated E2E corresponds to a certain patient but some patients have more than one E2E corresponding to them (this can be accessed via heidelberg's export log).
After sending you this comment i tried to make a similar thing for volumes. What I noticed (or i think i noticed at least) is that sometimes the chunk with laterality information comes after the image data, that means that when OCTvolumeWithMetaData is stored in L222 i don't have the laterality. I made a dirty workaround to create a seperate object to store volumestring=>laterality and assuming there is at least one corresponding volume after the header chunk with the same volumesting i can identify the laterality of the volume. (this seems to work in my first examples, still we will review all of our data later on). Even with this i have some volume's where laterality is not defined at all.
Another issue i am facing is that i can not extract the volume date. For me it is critical since my E2Es have several examinations that may be of different dates.
Do you have also a way to get the Patient_id stored in the E2E? The field you call patient_id is the primary key in heidebergs database not the field that the user is setting for a patient.
In any case i think i can share one of my anonymized E2Es to you if you are willing to take a look on these issues.
Best Vassilis

marksgraham commented 2 years ago

Hi there,

Please do share an anonymized E2E if you can; I'll see if it is possible to extract the actual patient id and volume data. Its very useful that you have 'ground truth' from the heidelberg system for us to compare against!

Best, Mark

On Thu, Jun 2, 2022 at 7:30 AM billyk18278 @.***> wrote:

Hi Mark,

May i start by saying that i did not review all e2e.py but i tried to find a quick fix, so maybe i am missing something. My first 2 tests (~20 pngs ) for fundus photos have correct laterality since we reviewed them one by one. To my experience (bulk exporting ~900 patients(meaning all their OCTs and not each scan seperately), each generated E2E corresponds to a certain patient but some patients have more than one E2E corresponding to them (this can be accessed via heidelberg's export log). After sending you this comment i tried to make a similar thing for volumes. What I noticed (or i think i noticed at least) is that sometimes the chunk with laterality information comes after the image data, that means that when OCTvolumeWithMetaData is stored in L222 i don't have the laterality. I made a dirty workaround to create a seperate object to store volumestring=>laterality and assuming there is at least one corresponding volume after the header chunk with the same volumesting i can identify the laterality of the volume. (this seems to work in my first examples, still we will review all of our data later on). Even with this i have some volume's where laterality is not defined at all. Another issue i am facing is that i can not extract the volume date. For me it is critical since my E2Es have several examinations that may be of different dates. Do you have also a way to get the Patient_id stored in the E2E? The field you call patient_id is the primary key in heidebergs database not the field that the user is setting for a patient. In any case i think i can share one of my anonymized E2Es to you if you are willing to take a look on these issues. Best Vassilis

— Reply to this email directly, view it on GitHub https://github.com/marksgraham/OCT-Converter/pull/46#issuecomment-1144488202, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB4UIM3E3MJ74Q3W7JWBWADVNBIH5ANCNFSM5M3NCIJQ . You are receiving this because you modified the open/close state.Message ID: @.***>

billyk18278 commented 2 years ago

Let me know if you got the files, Patient_id field and examination date are the most critical information but also correct laterality and scan type are useful, especially in bulk export procedures that you do not review each frame manually.

marksgraham commented 2 years ago

Thanks files received! I'm a bit backed up with work atm so it may be a little while before I get a chance to look at them, I'll let you know when I do. Mark

On Wed, Jun 8, 2022 at 12:05 AM billyk18278 @.***> wrote:

Let me know if you got the files, Patient_id field and examination date are the most critical information but also correct laterality and scan type are useful, especially in bulk export procedures that you do not review each frame manually.

— Reply to this email directly, view it on GitHub https://github.com/marksgraham/OCT-Converter/pull/46#issuecomment-1149465477, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB4UIMYH4NOLFDGGQAYX3Y3VOASZLANCNFSM5M3NCIJQ . You are receiving this because you modified the open/close state.Message ID: @.***>

marksgraham commented 2 years ago

Hey,

Could you email me directly (my email can be found on my github profile) so I can ask you a view questions about the data you've sent?

Cheers, Mark

On Fri, Jun 10, 2022 at 8:56 AM Mark Graham @.***> wrote:

Thanks files received! I'm a bit backed up with work atm so it may be a little while before I get a chance to look at them, I'll let you know when I do. Mark

On Wed, Jun 8, 2022 at 12:05 AM billyk18278 @.***> wrote:

Let me know if you got the files, Patient_id field and examination date are the most critical information but also correct laterality and scan type are useful, especially in bulk export procedures that you do not review each frame manually.

— Reply to this email directly, view it on GitHub https://github.com/marksgraham/OCT-Converter/pull/46#issuecomment-1149465477, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB4UIMYH4NOLFDGGQAYX3Y3VOASZLANCNFSM5M3NCIJQ . You are receiving this because you modified the open/close state.Message ID: @.***>