marksgraham / OCT-Converter

Tools for extracting the raw optical coherence tomography (OCT) and fundus data from proprietary file formats.
https://pypi.org/project/oct-converter/
MIT License

Question about the structure of FDA Files (NOT AN ISSUE) #16

Closed dakenblack closed 3 years ago

dakenblack commented 3 years ago

Hi Mark,

Great work on this project. Let me preface this by saying I don't have an issue with the OCT-Converter project itself. I am working on a project to extract some patient data from FDA files, and I'm having a hard time finding the structure of that data.

I am currently trying to extract some patient data (name, eye side etc) from FDA files. The data seems to be in a chunk with the tag PATIENT_INFO_03, but the uocte page (https://bitbucket.org/uocte/uocte/wiki/Topcon%20File%20Format) doesn't have any documentation on this chunk (only PATIENT_INFO_02).

I have some FDA files and the corresponding exported data (using Topcon's OCTDataCollector.exe), and a brute-force search of the FDA files for the exported values doesn't yield any matches either. I suspect the data is encrypted, but I can't be sure.
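For reference, a brute-force search of this kind can be sketched as below. This is only an illustration: the helper name `find_value`, the list of encodings tried, and the synthetic `sample` bytes are assumptions, not anything taken from the Topcon tools or from OCT-Converter.

```python
def find_value(data: bytes, value: str) -> dict:
    """Return {encoding: [offsets]} for each encoding in which `value` occurs in `data`."""
    hits = {}
    # Encodings to try are an assumption; extend the tuple as needed.
    for encoding in ("ascii", "utf-8", "utf-16-le", "utf-16-be"):
        try:
            needle = value.encode(encoding)
        except UnicodeEncodeError:
            continue  # value cannot be represented in this encoding
        offsets = []
        start = data.find(needle)
        while start != -1:
            offsets.append(start)
            start = data.find(needle, start + 1)
        if offsets:
            hits[encoding] = offsets
    return hits


# Synthetic bytes standing in for the contents of an .fda file.
sample = b"\x00\x01" + "399047".encode("utf-16-le") + b"\xff" + b"399047"
print(find_value(sample, "399047"))
```

On a real file, `data` would be the result of `open(path, 'rb').read()`. An empty result for every known value only shows that the value is not stored as plain text in the encodings tried.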

My reason for posting here is that I'm hoping you might have come across this and know something about it.

Jabez

marksgraham commented 3 years ago

Hi Jabez,

We've had a bit of discussion about this on issue #13 - have you tried looking in @FDA_FILE_INFO for patient name etc? I think laterality is encoded in the first byte of @CAPTURE_INFO_02, too.

@antoniohupa did you have any luck finding these fields in the .fda? If you could share some code that would be great - I could incorporate it into the main package.

Mark

antoniohupa commented 3 years ago

Hi Mark and Jabez

Yes, I was able to extract that information from FDA files (patient_id, eye, date of capture, etc). The patient's name is also easy to extract, but I didn't do it because I need to work with anonymized data. I'm very busy these days and have to focus on other projects right now, but I will share my code with you as soon as I can.

A

dakenblack commented 3 years ago

Hi Mark and Antonio, Thanks for getting back to me. I will take a look at those chunks and see if I can find anything useful.

Looking forward to your code snippet as well.

Jabez

dakenblack commented 3 years ago

I had a look at the FDA_FILE_INFO chunk and this is what I see :

b'\x02\x00\x00\x00\xe0.\x00\x0010.1.5.48100\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

The number in the middle is the analysis software version; there is no patient identification here.
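As a rough illustration, the bytes above can be split into fields with Python's `struct` module. The little-endian byte order and the names `field_a` and `field_b` are assumptions; the meaning of the two leading integers is not established in this thread.

```python
import struct

# The @FDA_FILE_INFO payload quoted above (8 bytes + 32-byte padded string).
raw = b"\x02\x00\x00\x00\xe0.\x00\x0010.1.5.48100" + b"\x00" * 20

# Two unsigned 32-bit integers, assumed little-endian; their meaning is unknown here.
field_a, field_b = struct.unpack_from("<II", raw, 0)

# The remainder is a null-padded ASCII string holding the software version.
version = raw[8:].split(b"\x00", 1)[0].decode("ascii")

print(field_a, field_b, version)
```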

I had a look at CAPTURE_INFO_02, I think you're right about the laterality. I need to confirm by looking at some more files though.

antoniohupa commented 3 years ago

Hi there

In @PATIENT_INFO_02 you will find the patient id, name, surname, gender, birth year, month, day ... In @CAPTURE_INFO_02 you will find the eye in the first byte (x: 0 = right, 1 = left) and the capture year, month, day ... I'm not completely sure the code for extracting the eye is correct, but after reviewing the images with the ophthalmologists, it seems to match.

I've added new structures to Mark's FDA class, plus a couple of functions to extract the data of interest (see below). I'm pretty sure this code will not work with fda files from Topcon models other than the 3D OCT Maestro, which is the one I'm using. I'll try to get some fda files from other Topcon models so I can adapt the code to them.


from pathlib import Path

import numpy as np
from construct import Int8un, Int16un, Int32un, PaddedString, Struct

# `decode` and `OCTVolumeWithMetaData` are assumed to be imported as in the
# existing OCT-Converter fda module.


class FDA(object):
    """ Class for extracting data from Topcon's .fda file format.

    Notes:
        Mostly based on description of .fda file format here:
        https://bitbucket.org/uocte/uocte/wiki/Topcon%20File%20Format

    Attributes:
        filepath (str): Path to .fda file for reading.
        header (obj:Struct): Defines structure of volume's header.
        oct_header (obj:Struct): Defines structure of OCT header.
        fundus_header (obj:Struct): Defines structure of fundus header.
        patient_info (obj:Struct): Defines structure of @PATIENT_INFO_02.
        capture_date (obj:Struct): Defines structure of @CAPTURE_INFO_02.
        chunk_dict (dict): Name of data chunks present in the file, and their start locations.
    """

    def __init__(self, filepath):
        self.filepath = Path(filepath)
        if not self.filepath.exists():
            raise FileNotFoundError(self.filepath)
        self.header = Struct(
            'FOCT' / PaddedString(4, 'ascii'),
            'FDA' / PaddedString(3, 'ascii'),
            'version_info_1' / Int32un,
            'version_info_2' / Int32un
        )
        self.oct_header = Struct(
            'type' / PaddedString(1, 'ascii'),
            'unknown1' / Int32un,
            'unknown2' / Int32un,
            'width' / Int32un,
            'height' / Int32un,
            'number_slices' / Int32un,
            'unknown3' / Int32un,
        )

        self.oct_header_2 = Struct(
            'unknown' / PaddedString(1, 'ascii'),
            'width' / Int32un,
            'height' / Int32un,
            'bits_per_pixel' / Int32un,
            'number_slices' / Int32un,
            'unknown' / PaddedString(1, 'ascii'),
            'size' / Int32un,
        )

        self.fundus_header = Struct(
            'width' / Int32un,
            'height' / Int32un,
            'bits_per_pixel' / Int32un,
            'number_slices' / Int32un,
            'unknown' / PaddedString(4, 'ascii'),
            'size' / Int32un,
            # 'img' / Int8un,
        )

        # Layout of the @PATIENT_INFO_02 chunk (615 bytes in total).
        self.patient_info = Struct(
            'Patient id' / PaddedString(32, 'u8'),
            'Patient given name' / PaddedString(32, 'utf8'),
            'Patient surname' / PaddedString(32, 'utf8'),
            'Zeros' / PaddedString(8, 'u8'),
            'Gender' / Int8un,
            'Birth year' / Int16un,
            'Birth month' / Int16un,
            'Birth day' / Int16un,
            # NOTE: this name repeats the one above, so the value parsed here
            # replaces the earlier 'Birth year' in the parsed result.
            'Birth year' / Int16un,
            'Zeros2' / PaddedString(502, 'ascii')
        )

        # Layout of the start of the @CAPTURE_INFO_02 chunk (118 bytes).
        self.capture_date = Struct(
            'Eye' / Int8un,
            'y' / Int16un,
            'Zeros' / PaddedString(103, 'ascii'),
            'Year' / Int16un,
            'Month' / Int16un,
            'Day' / Int16un,
            'Hour' / Int16un,
            'Minute' / Int16un,
            'Second' / Int16un,
        )

        self.chunk_dict = self.get_list_of_file_chunks()

    def get_list_of_file_chunks(self):
        """Find all data chunks present in the file.

        Returns:
            dict
        """
        chunk_dict = {}
        with open(self.filepath, 'rb') as f:
            # skip header
            raw = f.read(15)
            header = self.header.parse(raw)

            eof = False
            while not eof:
                # np.frombuffer replaces the deprecated binary mode of np.fromstring
                chunk_name_size = np.frombuffer(f.read(1), dtype=np.uint8)[0]
                if chunk_name_size == 0:
                    eof = True
                else:
                    chunk_name = f.read(chunk_name_size)
                    chunk_size = int(np.frombuffer(f.read(4), dtype=np.uint32)[0])
                    chunk_location = f.tell()
                    f.seek(chunk_size, 1)
                    chunk_dict[chunk_name] = [chunk_location, chunk_size]
        print('File {} contains the following chunks:'.format(self.filepath))
        for key in chunk_dict.keys():
            print(key)
        return chunk_dict

    def read_oct_volume(self):
        """ Reads OCT data.

            Returns:
                obj:OCTVolumeWithMetaData
        """

        if b'@IMG_JPEG' not in self.chunk_dict:
            raise ValueError('Could not find OCT header @IMG_JPEG in chunk list')
        with open(self.filepath, 'rb') as f:
            chunk_location, chunk_size = self.chunk_dict[b'@IMG_JPEG']
            f.seek(chunk_location)  # Move to the start of the chunk.
            raw = f.read(25)
            oct_header = self.oct_header.parse(raw)
            volume = np.zeros((oct_header.height, oct_header.width, oct_header.number_slices))
            for i in range(oct_header.number_slices):
                size = int(np.frombuffer(f.read(4), dtype=np.int32)[0])
                raw_slice = f.read(size)
                image = decode(raw_slice)
                volume[:, :, i] = image
        oct_volume = OCTVolumeWithMetaData([volume[:, :, i] for i in range(volume.shape[2])])
        return oct_volume

    def read_patient_info(self):
        """ Reads Patient info

            Returns:
                patient id, name, surname, gender and birth date
        """

        if b'@PATIENT_INFO_02' not in self.chunk_dict:
            raise ValueError('Could not find chunk @PATIENT_INFO_02 in chunk list')
        # Fixed: the original opened the bare name `filepath`, which is undefined here.
        with open(self.filepath, 'rb') as f:
            chunk_location, chunk_size = self.chunk_dict[b'@PATIENT_INFO_02']
            f.seek(chunk_location)  # Move to the start of the chunk.
            raw = f.read(615)
            patient_head = self.patient_info.parse(raw)

        return patient_head

    def read_capture_date(self):
        """ Reads capture info

            Returns:
                eye and date of capture
        """

        if b'@CAPTURE_INFO_02' not in self.chunk_dict:
            raise ValueError('Could not find chunk @CAPTURE_INFO_02 in chunk list')
        # Fixed: the original opened the bare name `filepath`, which is undefined here.
        with open(self.filepath, 'rb') as f:
            chunk_location, chunk_size = self.chunk_dict[b'@CAPTURE_INFO_02']
            f.seek(chunk_location)  # Move to the start of the chunk.
            raw = f.read(118)
            date = self.capture_date.parse(raw)

        return date

Calling fda.read_patient_info() or fda.read_capture_date() will give you what you need.
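Since these methods depend on specific chunk names, it can help to check first which chunks a file actually contains. The following is a minimal standalone sketch of the chunk walk used above, run against a synthetic in-memory file rather than a real .fda. The helper names `list_chunks` and `make_chunk`, the little-endian size field, and the fake payloads are assumptions made for illustration.

```python
import io
import struct


def list_chunks(stream):
    """Return {chunk_name: (payload_offset, payload_size)} for an .fda-like stream."""
    chunks = {}
    stream.seek(15)  # skip the 15-byte file header
    while True:
        size_byte = stream.read(1)
        if not size_byte or size_byte[0] == 0:
            break  # a zero-length name (or end of data) ends the chunk list
        name = stream.read(size_byte[0])
        (size,) = struct.unpack("<I", stream.read(4))  # assumed little-endian
        chunks[name] = (stream.tell(), size)
        stream.seek(size, io.SEEK_CUR)  # jump over the payload
    return chunks


def make_chunk(name: bytes, payload: bytes) -> bytes:
    """Build one chunk: name length, name, payload size, payload."""
    return bytes([len(name)]) + name + struct.pack("<I", len(payload)) + payload


# Synthetic file: header, two chunks with made-up payloads, terminator.
fake = (
    b"FOCT" + b"FDA" + struct.pack("<II", 2, 1000)
    + make_chunk(b"@CAPTURE_INFO_02", b"\x01" + b"\x00" * 117)
    + make_chunk(b"@PATIENT_INFO_03", b"\xd2\x1b")
    + b"\x00"
)

chunks = list_chunks(io.BytesIO(fake))
patient_chunks = [name for name in chunks if name.startswith(b"@PATIENT_INFO")]
print(patient_chunks)
```

With a real file, `open(path, 'rb')` can be passed in place of the `io.BytesIO` object. If `patient_chunks` contains only `@PATIENT_INFO_03`, `read_patient_info()` above will raise its `ValueError`.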

dakenblack commented 3 years ago

Hi Antonio, Thanks for that, but unfortunately my FDA files do not have a "PATIENT_INFO_02" chunk. They have "PATIENT_INFO_03" instead, and as far as I can tell the data in my files does not have the same format as yours. I can get the capture date and I think I can get the eye laterality, but not the patient ID, which is pretty important.

Jabez

antoniohupa commented 3 years ago

Hi Jabez

The same happens to me when I try to parse that information from fda files from a Topcon Triton instead of a 3D Maestro. "PATIENT_INFO_03" seems to be very messy. I have some fda files from a Triton for which I know the patient id, name, etc. With that information I'll try to find them in the bytes, but unfortunately I can't right now. In the meantime, could you explore the data some more?

dakenblack commented 3 years ago

Yea, the files I've got are from a Triton as well. I found that most of the data in that chunk is exactly the same as the data found in other files. I've got to verify this for sure but I'm pretty sure the FDA files (I've compared) hold data for different patients so I shouldn't expect it to be so similar.

Thanks for your help, any assistance would be greatly appreciated. I'll continue to look through other files as well.

antoniohupa commented 3 years ago

That's right, almost all the data in that chunk is the same between patients. The only differences are found in the first 4-5 bytes:

Patient id, 399047:

@PATIENT_INFO03g\x02\x00\x00\xd2\x1bH"0\x196g\x0b\x8e <---- S\x90\xfe\xe6A\xcc\xab8\x9c\x0c\x8a\x023\xae\x11\xd0\x19\xc1\x0eL\xdc\x908\xd8\x1c\xe4I\x15\xf4Y\x0f\x16gz\xe4\xee\xb8\xa0\x16A\xf9g\xc4\xef\x81\x92ac\x9d\x9fP\xb3aa(\x0e8\xce\x0e=\x0be8\x91\x81\xbf\x199y\x8f\xbczT@1\x02\xf9\xc3\x03<\xd0\x81\x1f\x83\xd9-<\xbb\x16\x0e"\xa3\x8d>\x03\xa32\xd1\x1b~\xeaY\x11\xc2\n\x8a]\xa5\xa0tCv\xd1\xcb\xd8\xbd\xc4\x94\x8e\xf9w\x9ao\xcds0\x17\x958N\xb7K\xd1\xabHf\xc4\xd2\xfa\x95(\x934\x05\xc7\xa3\xc4.\xa6\x98kg\x1a\xf6\xef\xcdR\xf29\x880\x01~\xa9\xf6+\xce\xbb\x14\xcf\x04}\x10\x91(\x1e\xb6\xed\x19\xf1>l\xbc\x80Q\xda\xbe^T\xcd\xde\x83}\x1e\xcbF\x98\x8dg#\x07\x85\xb4u\x14\xf8A\x07>\xca@z\x0cR\xf7\xdf\x19A\xa67\xa1@\x1aC4\xd7\x8b\xac\xb5\xb2\xd7\xb0\'\xd3O\xf0y,\x97\xc1] \tX<\x157K^\xc3\xf6\xf0Z\xcd\`\xab\xf2\xa2\xa8\x8e\xcb\xb3\x97h\xb4\xc13C&\xf1\n]\xd2\x88VW\x07\t(x\xe9\xd2\xd4\x18}o\xbc\x08\x92\x92k\xec!}\x91\xe2\x04\xe22\xa342\x14LMnB\xd3\xf5uk\xcb\xabuu>\xe4\x8d\xa0L\x9f\n\x10%\x0c\x9d#-\x82\xf1\x17\xf4/I\xa9\xf1\x1b\x98\xcc\x9e\xf8\xf8\xf3[H\xf31\xeb\xf1\x89\x1a\x1d\x1f[\xfdy\xcer\xe0>\xdf\x1fp\xd5\x86\x12\xd1=\xd2tep\x85<u^?c\x16\x89&3:\xfab\x11Ah@CQlC\x97\x94d\x9c\x19}M\xe4S\x93Nr\x1a,\xecdf\xa6\x95\xb3m\x06\xf6v{:Sa\xdc\x0e-o\xad\x9d\xe7\xc3\xf6a\x87\x81\x04\xd1\xdeF\xb7\x1f14Q\xbcR\x84)\x9a\xf9\x0b\xfe\xc4\x87U\xe4\x03C3!\x03\x126\xbb\x96y9\x13\xf9R.\xc4\x9ar\xd5\xff\xa2\xd5\xa52\x9f\\xb5\x9f\xc4l\xb9\xe0v:]\tCkRd\xb6\xe7\xc5\x17\x0c\xce\x94\x8c"\xca\xa6\xfe\x9b;\x11p\x92\xb3H\xc0\x90\xaf/t\xdb\x17\xa6\xa5K4\xc2S\x18\xce\xdf\xc7.\xb4A\xcb4V\xab\xed-\xc5:\xbc\x15N\x88\xfd\x9b\xb0Y\xaf2\xf9\xcb\xb20\xe7\x98\xb4\xf9\xff\xd3\x9d\r\xce$\x9c\xfd\x1f\xafw\xc4\xac\xf5l\x07\xfc\x95fo\xfc\x00\x94\xbf\x8c\x1b\x0bs\x91\xf1\xd1\x9e\x05\xabtZD^\xda\x10

Patient id, 907034:

@PATIENT_INFO03g\x02\x00\x00\xd8\x12F!0.6g\x0b\x8e? <---- S\x90\xfe\xe6A\xcc\xab8\x9c\x0c\x8a\x023\xae\x11\xd0\x19\xc1\x0eL\xdc\x976\xd8\x12\xe0iQ\xb1yC\x16gz\xe4\xee\xb8\xa0\x16A\xf9g\xc4\xef\x81\x92ac\x9d\x9fP\xb3a{$\x118\xdak5~h+\xd0\xdb\xbf\x199y\x8f\xbczT@1\x02\xf9\xc3\x03<\xd0\x81\x1f\x83\xd9-<\xbb\x16\x0e"\xa3\x8d=\x03\xa33\xd1\x15~\xeaY\x11\xc2\n\x8a]\xa5\xa0tCv\xd1\xcb\xd8\xbd\xc4\x94\x8e\xf9w\x9ao\xcds0\x17\x958N\xb7K\xd1\xabHf\xc4\xd2\xfa\x95(\x934\x05\xc7\xa3\xc4.\xa6\x98kg\x1a\xf6\xef\xcdR\xf29\x880\x01~\xa9\xf6+\xce\xbb\x14\xcf\x04}\x10\x91(\x1e\xb6\xed\x19\xf1>l\xbc\x80Q\xda\xbe^T\xcd\xde\x83}\x1e\xcbF\x98\x8dg#\x07\x85\xb4u\x14\xf8A\x07>\xca@z\x0cR\xf7\xdf\x19A\xa67\xa1@\x1aC4\xd7\x8b\xac\xb5\xb2\xd7\xb0\'\xd3O\xf0y,\x97\xc1] \tX<\x157K^\xc3\xf6\xf0Z\xcd\`\xab\xf2\xa2\xa8\x8e\xcb\xb3\x97h\xb4\xc13C&\xf1\n]\xd2\x88VW\x07\t(x\xe9\xd2\xd4\x18}o\xbc\x08\x92\x92k\xec!}\x91\xe2\x04\xe22\xa342\x14LMnB\xd3\xf5uk\xcb\xabuu>\xe4\x8d\xa0L\x9f\n\x10%\x0c\x9d#-\x82\xf1\x17\xf4/I\xa9\xf1\x1b\x98\xcc\x9e\xf8\xf8\xf3[H\xf31\xeb\xf1\x89\x1a\x1d\x1f[\xfdy\xcer\xe0>\xdf\x1fp\xd5\x86\x12\xd1=\xd2tep\x85<u^?c\x16\x89&3:\xfab\x11Ah@CQlC\x97\x94d\x9c\x19}M\xe4S\x93Nr\x1a,\xecdf\xa6\x95\xb3m\x06\xf6v{:Sa\xdc\x0e-o\xad\x9d\xe7\xc3\xf6a\x87\x81\x04\xd1\xdeF\xb7\x1f14Q\xbcR\x84)\x9a\xf9\x0b\xfe\xc4\x87U\xe4\x03C3!\x03\x126\xbb\x96y9\x13\xf9R.\xc4\x9ar\xd5\xff\xa2\xd5\xa52\x9f\\xb5\x9f\xc4l\xb9\xe0v:]\tCkRd\xb6\xe7\xc5\x17\x0c\xce\x94\x8c"\xca\xa6\xfe\x9b;\x11p\x92\xb3H\xc0\x90\xaf/t\xdb\x17\xa6\xa5K4\xc2S\x18\xce\xdf\xc7.\xb4A\xcb4V\xab\xed-\xc5:\xbc\x15N\x88\xfd\x9b\xb0Y\xaf2\xf9\xcb\xb20\xe7\x98\xb4\xf9\xff\xd3\x9d\r\xce$\x9c\xfd\x1f\xafw\xc4\xac\xf5l\x07\xfc\x95fo\xfc\x00\x94\xbf\x8c\x1b\x0bs\x91\xf1\xd1\x9e\x05\xabtZD^\xda\x10

I've been running tests, but without results so far...
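A comparison like this can be automated. The sketch below is only an illustration: the helper name `differing_runs` is an assumption, and `chunk_a` and `chunk_b` are just the first ten payload bytes copied from the two dumps above, not the full chunks.

```python
def differing_runs(a: bytes, b: bytes):
    """Return (start, end) offset ranges, end exclusive, where a and b differ.

    Only the overlapping part of the two inputs is compared.
    """
    runs = []
    start = None
    length = min(len(a), len(b))
    for i in range(length):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differences begins
        elif start is not None:
            runs.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        runs.append((start, length))
    return runs


# Leading payload bytes of the two @PATIENT_INFO_03 dumps quoted above.
chunk_a = b'\xd2\x1bH"0\x196g\x0b\x8e'
chunk_b = b'\xd8\x12F!0.6g\x0b\x8e'

print(differing_runs(chunk_a, chunk_b))
```

Running the same function over the complete chunks from several files would show which offsets vary per patient and which stay constant.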

dakenblack commented 3 years ago

Hi sorry for the late response. That is similar to what I'm seeing as well. Do you think the Triton has an internal data store that contains all the patient information? Cause I know it stores it somehow (since the Topcon application is able to export the data).

I also had a look at other chunks, but nothing seemed obvious to me. Maybe you'll have better luck.

antoniohupa commented 3 years ago

Hi Jabez

Since 2017, at least in my hospital, both the Triton and the Maestro export .fda files with the "patient_info_03" chunk, I guess due to a software update. However, I have found that when images are stored in a folder, a filelist with patient data is stored too. That filelist contains the patient data for all the images in that folder. I wrote some code to read the filelist, and from it you can export the patient's id, gender, laterality, date and hour of capture, name and surname. Take a look to see whether you have this file too. Otherwise, it seems impossible to extract the patient's info from that data structure. If you do have it, I can share the code with you.

Greetings

dakenblack commented 3 years ago

I see, thanks for that. I'll have a look. Is this folder created by the triton when storing it internally or is it created by the OCTDataExtractor.exe application?

antoniohupa commented 3 years ago

I'm not sure. What I have is stored automatically. At least in my hospital, all fda files are stored in folders. Every folder contains a number of fda files and a filelist with the patient information for those fda files. I really don't know what octdataextractor.exe does, but I can ask.

marksgraham commented 3 years ago

Going to close for now

witedev commented 2 months ago

@antoniohupa Hello, is there any news about the structure of the @PATIENT_INFO_03 chunk? I need the structure, and I am running into a lot of issues.

Thanks in advance