Closed dakenblack closed 3 years ago
Hi Jabez,
We've had a bit of discussion about this on issue #13. Have you tried looking in @FDA_FILE_INFO for the patient name etc.? I think laterality is encoded in the first byte of @CAPTURE_INFO_02 too.
@antoniohupa did you have any luck finding these fields in the .fda? If you could share some code that would be great; I could incorporate it into the main package.
Mark
Hi Mark and Jabez
Yes, I could extract that information from FDA files (patient_id, eye, date of capture, etc.). The patient's name is also easy to extract, but I didn't do it because I need to work with anonymized data. I'm too busy these days and must focus on other projects right now, but I will share my code with you as soon as I can.
A
Hi Mark and Antonio, Thanks for getting back to me. I will take a look at those chunks and see if I can find anything useful.
Looking forward to your code snippet as well.
Jabez
I had a look at the FDA_FILE_INFO chunk and this is what I see :
b'\x02\x00\x00\x00\xe0.\x00\x0010.1.5.48100\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
The number in the middle is the analysis software version; there is no patient identification here.
I had a look at CAPTURE_INFO_02, I think you're right about the laterality. I need to confirm by looking at some more files though.
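As a rough illustration of how the FDA_FILE_INFO bytes quoted above break down, here is a minimal sketch using only the standard library. It assumes the payload is two little-endian 32-bit integers followed by a NUL-padded ASCII version string; `parse_file_info` is a hypothetical helper name, and the number of trailing zero bytes is rebuilt approximately from the dump.

```python
import struct

# Payload of the FDA_FILE_INFO chunk quoted above; the trailing zero padding
# is rebuilt with "* 20", which is an approximation of the dump.
raw = b'\x02\x00\x00\x00\xe0.\x00\x0010.1.5.48100' + b'\x00' * 20


def parse_file_info(payload):
    """Split the payload into two uint32 values and a version string.

    Assumption: little-endian integers, then NUL-padded ASCII text.
    """
    unknown1, unknown2 = struct.unpack_from('<II', payload, 0)
    # Keep only the text before the first NUL byte.
    version = payload[8:].split(b'\x00', 1)[0].decode('ascii')
    return unknown1, unknown2, version


print(parse_file_info(raw))  # (2, 12000, '10.1.5.48100')
```

Nothing in these fields looks like a patient identifier, which matches the observation above.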
Hi there
In @PATIENT_INFO_02 you will find the patient id, name, surname, gender, birth year, month, day, etc. In @CAPTURE_INFO_02 you will find the eye in the first byte (0 = right, 1 = left) and the capture year, month, day, etc. I'm not completely sure the code for extracting the eye is correct, but after reviewing the images with the ophthalmologists, it seems to match.
I've added new structures to Mark's FDA class and a couple of functions to extract the data of interest (see below). I'm pretty sure that this code will not work with .fda files from Topcon models other than the 3D OCT Maestro, which is the one I'm using. I'll try to get some .fda files from other Topcon models to adapt this code to them.
    from pathlib import Path

    import numpy as np
    from construct import Int8un, Int16un, Int32un, PaddedString, Struct
    # decode() and OCTVolumeWithMetaData are used below but were not imported in
    # the snippet as posted; these import locations are assumed.
    from pylibjpeg import decode
    from oct_converter.image_types import OCTVolumeWithMetaData


    class FDA(object):
        """Class for extracting data from Topcon's .fda file format.

        Notes:
            Mostly based on description of .fda file format here:
            https://bitbucket.org/uocte/uocte/wiki/Topcon%20File%20Format

        Attributes:
            filepath (str): Path to .fda file for reading.
            header (obj:Struct): Defines structure of volume's header.
            oct_header (obj:Struct): Defines structure of OCT header.
            fundus_header (obj:Struct): Defines structure of fundus header.
            patient_info (obj:Struct): Defines structure of @PATIENT_INFO_02.
            capture_date (obj:Struct): Defines structure of @CAPTURE_INFO_02.
            chunk_dict (dict): Name of data chunks present in the file, and their start locations.
        """

        def __init__(self, filepath):
            self.filepath = Path(filepath)
            if not self.filepath.exists():
                raise FileNotFoundError(self.filepath)
            self.header = Struct(
                'FOCT' / PaddedString(4, 'ascii'),
                'FDA' / PaddedString(3, 'ascii'),
                'version_info_1' / Int32un,
                'version_info_2' / Int32un
            )
            self.oct_header = Struct(
                'type' / PaddedString(1, 'ascii'),
                'unknown1' / Int32un,
                'unknown2' / Int32un,
                'width' / Int32un,
                'height' / Int32un,
                'number_slices' / Int32un,
                'unknown3' / Int32un,
            )
            self.oct_header_2 = Struct(
                # The two unknown fields need distinct names, otherwise the
                # second overwrites the first in the parsed result.
                'unknown1' / PaddedString(1, 'ascii'),
                'width' / Int32un,
                'height' / Int32un,
                'bits_per_pixel' / Int32un,
                'number_slices' / Int32un,
                'unknown2' / PaddedString(1, 'ascii'),
                'size' / Int32un,
            )
            self.fundus_header = Struct(
                'width' / Int32un,
                'height' / Int32un,
                'bits_per_pixel' / Int32un,
                'number_slices' / Int32un,
                'unknown' / PaddedString(4, 'ascii'),
                'size' / Int32un,
                # 'img' / Int8un,
            )
            # 615 bytes in total, matching the read in read_patient_info().
            self.patient_info = Struct(
                'Patient id' / PaddedString(32, 'utf8'),
                'Patient given name' / PaddedString(32, 'utf8'),
                'Patient surname' / PaddedString(32, 'utf8'),
                'Zeros' / PaddedString(8, 'utf8'),
                'Gender' / Int8un,
                'Birth year' / Int16un,
                'Birth month' / Int16un,
                'Birth day' / Int16un,
                # Posted as a second 'Birth year', which would overwrite the
                # first one when parsed; renamed because its meaning is unknown.
                'Unknown' / Int16un,
                'Zeros2' / PaddedString(502, 'ascii')
            )
            # 118 bytes in total, matching the read in read_capture_date().
            self.capture_date = Struct(
                'Eye' / Int8un,  # 0 = right, 1 = left (not fully verified)
                'y' / Int16un,
                'Zeros' / PaddedString(103, 'ascii'),
                'Year' / Int16un,
                'Month' / Int16un,
                'Day' / Int16un,
                'Hour' / Int16un,
                'Minute' / Int16un,
                'Second' / Int16un,
            )
            self.chunk_dict = self.get_list_of_file_chunks()

        def get_list_of_file_chunks(self):
            """Find all data chunks present in the file.

            Returns:
                dict
            """
            chunk_dict = {}
            with open(self.filepath, 'rb') as f:
                # skip header
                raw = f.read(15)
                header = self.header.parse(raw)
                eof = False
                while not eof:
                    # np.fromstring is deprecated for binary data; use frombuffer
                    # and treat an empty read (end of file) like a zero-length name.
                    raw_size = f.read(1)
                    chunk_name_size = np.frombuffer(raw_size, dtype=np.uint8)[0] if raw_size else 0
                    if chunk_name_size == 0:
                        eof = True
                    else:
                        chunk_name = f.read(chunk_name_size)
                        chunk_size = int(np.frombuffer(f.read(4), dtype=np.uint32)[0])
                        chunk_location = f.tell()
                        f.seek(chunk_size, 1)
                        chunk_dict[chunk_name] = [chunk_location, chunk_size]
            print('File {} contains the following chunks:'.format(self.filepath))
            for key in chunk_dict.keys():
                print(key)
            return chunk_dict

        def read_oct_volume(self):
            """Reads OCT data.

            Returns:
                obj:OCTVolumeWithMetaData
            """
            if b'@IMG_JPEG' not in self.chunk_dict:
                raise ValueError('Could not find OCT header @IMG_JPEG in chunk list')
            with open(self.filepath, 'rb') as f:
                chunk_location, chunk_size = self.chunk_dict[b'@IMG_JPEG']
                f.seek(chunk_location)  # Jump to the start of the chunk payload.
                raw = f.read(25)
                oct_header = self.oct_header.parse(raw)
                volume = np.zeros((oct_header.height, oct_header.width, oct_header.number_slices))
                for i in range(oct_header.number_slices):
                    size = int(np.frombuffer(f.read(4), dtype=np.int32)[0])
                    raw_slice = f.read(size)
                    image = decode(raw_slice)
                    volume[:, :, i] = image
            oct_volume = OCTVolumeWithMetaData([volume[:, :, i] for i in range(volume.shape[2])])
            return oct_volume

        def read_patient_info(self):
            """Reads patient info.

            Returns:
                patient id, name, surname, gender and birth date
            """
            if b'@PATIENT_INFO_02' not in self.chunk_dict:
                raise ValueError('Could not find OCT header @PATIENT_INFO_02 in chunk list')
            # Must be self.filepath; the bare name filepath is undefined here.
            with open(self.filepath, 'rb') as f:
                chunk_location, chunk_size = self.chunk_dict[b'@PATIENT_INFO_02']
                f.seek(chunk_location)  # Jump to the start of the chunk payload.
                raw = f.read(615)
                patient_head = self.patient_info.parse(raw)
            return patient_head

        def read_capture_date(self):
            """Reads capture info.

            Returns:
                eye and date of capture
            """
            if b'@CAPTURE_INFO_02' not in self.chunk_dict:
                raise ValueError('Could not find OCT header @CAPTURE_INFO_02 in chunk list')
            # Must be self.filepath; the bare name filepath is undefined here.
            with open(self.filepath, 'rb') as f:
                chunk_location, chunk_size = self.chunk_dict[b'@CAPTURE_INFO_02']
                f.seek(chunk_location)  # Jump to the start of the chunk payload.
                raw = f.read(118)
                # num = int.from_bytes(raw, 'little')
                # out_hex = ['{:02X}'.format(b) for b in raw]
                date = self.capture_date.parse(raw)
            return date
Calling fda.read_patient_info() or fda.read_capture_date() on an FDA instance will give you what you need.
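To check which chunks a given file actually contains before relying on @PATIENT_INFO_02, the chunk walk from get_list_of_file_chunks can be sketched with only the standard library. This is a minimal sketch under the same layout assumptions as the class above (15-byte header, then repeated name-length byte, name, 4-byte size, payload), with the size assumed to be little-endian. `list_chunks`, `make_chunk` and the in-memory test file are hypothetical and for illustration only.

```python
import io
import struct

HEADER_SIZE = 15  # 'FOCT' + 'FDA' + two uint32 values, as in FDA.header


def list_chunks(stream):
    """Return {chunk_name: (payload_offset, payload_size)} for a binary stream."""
    chunks = {}
    stream.seek(HEADER_SIZE)
    while True:
        size_byte = stream.read(1)
        # An empty read (end of file) or a zero name length ends the list.
        if not size_byte or size_byte[0] == 0:
            break
        name = stream.read(size_byte[0])
        (payload_size,) = struct.unpack('<I', stream.read(4))
        chunks[name] = (stream.tell(), payload_size)
        stream.seek(payload_size, io.SEEK_CUR)  # skip over the payload
    return chunks


def make_chunk(name, payload):
    """Build one synthetic chunk for the demonstration below."""
    return bytes([len(name)]) + name + struct.pack('<I', len(payload)) + payload


# Synthetic file: header, two empty-looking chunks, then a zero terminator.
fake_file = (
    b'FOCT' + b'FDA' + struct.pack('<II', 2, 1000)
    + make_chunk(b'@CAPTURE_INFO_02', bytes(118))
    + make_chunk(b'@PATIENT_INFO_03', bytes(10))
    + b'\x00'
)
chunks = list_chunks(io.BytesIO(fake_file))
for name, (offset, size) in chunks.items():
    print(name, offset, size)
```

For a real file, the same function can be called on `open(path, 'rb')`.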
Hi Antonio, thanks for that, but unfortunately my FDA files do not have a "PATIENT_INFO_02" chunk; they have "PATIENT_INFO_03", and as far as I can tell the data in my files does not have the same format as yours. I can get the capture date and I think I can get the eye laterality, but not the patient ID, which is pretty important.
Jabez
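For the part that does work on these files, the capture date and eye, the CAPTURE_INFO_02 layout from the class above can also be decoded without construct. This is a minimal sketch: the offsets come from that struct (1 eye byte, one 16-bit value, 103 padding bytes, then six 16-bit date fields), little-endian byte order is an assumption, the 0/1 eye mapping is the unverified one reported in this thread, and `parse_capture_info` and the sample payload are hypothetical.

```python
import struct
from datetime import datetime

EYE_LABELS = {0: 'right', 1: 'left'}  # mapping reported in the thread, not verified
DATE_OFFSET = 1 + 2 + 103  # eye byte + one uint16 + 103 padding bytes


def parse_capture_info(payload):
    """Return (eye label, capture datetime) from a 118-byte payload."""
    eye_code = payload[0]
    fields = struct.unpack_from('<6H', payload, DATE_OFFSET)
    year, month, day, hour, minute, second = fields
    eye = EYE_LABELS.get(eye_code, 'unknown')
    return eye, datetime(year, month, day, hour, minute, second)


# Synthetic 118-byte payload, for illustration only.
payload = bytes([1]) + bytes(105) + struct.pack('<6H', 2019, 3, 14, 9, 26, 53)
eye, captured = parse_capture_info(payload)
print(eye, captured.isoformat())  # left 2019-03-14T09:26:53
```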
Hi Jabez
The same happens to me when I try to parse that information from .fda files from a Topcon Triton instead of a 3D Maestro. "PATIENT_INFO_03" seems to be very messy. I have some .fda files from a Triton for which I know the patient id, name, etc. With that information I'll try to find the values in the bytes, but unfortunately I cannot do it right now. In the meantime, could you explore the data some more?
Yea, the files I've got are from a Triton as well. I found that most of the data in that chunk is exactly the same as the data found in other files. I still need to verify this, but I'm pretty sure the FDA files I've compared hold data for different patients, so I wouldn't expect the chunks to be so similar.
Thanks for your help, any assistance would be greatly appreciated. I'll continue to look through other files as well.
That's right, almost all the data in that chunk is the same between patients. The only differences I found are in the first 4-5 bytes:
Patient id, 399047:
@PATIENT_INFO03g\x02\x00\x00\xd2\x1bH"0\x196g\x0b\x8e <----
S\x90\xfe\xe6A\xcc\xab8\x9c\x0c\x8a\x023\xae\x11\xd0\x19\xc1\x0eL\xdc\x908\xd8\x1c\xe4I\x15\xf4Y\x0f\x16gz\xe4\xee\xb8\xa0\x16A\xf9g\xc4\xef\x81\x92ac\x9d\x9fP\xb3aa(\x0e8\xce\x0e=\x0be8\x91\x81\xbf\x199y\x8f\xbczT@1\x02\xf9\xc3\x03<\xd0\x81\x1f\x83\xd9-<\xbb\x16\x0e"\xa3\x8d>\x03\xa32\xd1\x1b~\xeaY\x11\xc2\n\x8a]\xa5\xa0tCv\xd1\xcb\xd8\xbd\xc4\x94\x8e\xf9w\x9ao\xcds0\x17\x958N\xb7K\xd1\xabHf\xc4\xd2\xfa\x95(\x934\x05\xc7\xa3\xc4.\xa6\x98kg\x1a\xf6\xef\xcdR\xf29\x880\x01~\xa9\xf6+\xce\xbb
\x14\xcf\x04}\x10\x91(\x1e\xb6\xed\x19\xf1>l\xbc\x80Q\xda\xbe^T\xcd\xde\x83}\x1e\xcbF\x98\x8dg#\x07\x85\xb4u\x14\xf8A\x07>\xca@z\x0cR\xf7\xdf\x19A\xa67\xa1@\x1aC4\xd7\x8b\xac\xb5\xb2\xd7\xb0\'\xd3O\xf0y,\x97\xc1] \tX<\x157K^\xc3\xf6\xf0Z\xcd\`\xab\xf2\xa2\xa8\x8e\xcb\xb3\x97h\xb4\xc13C&\xf1\n]\xd2\x88VW\x07\t(x\xe9\xd2\xd4\x18}o\xbc\x08\x92\x92k\xec!}\x91\xe2\x04\xe22\xa342\x14LMnB\xd3\xf5uk\xcb\xabuu>\xe4\x8d\xa0L\x9f\n\x10%\x0c\x9d#-\x82\xf1\x17\xf4/I\xa9\xf1\x1b\x98\xcc\x9e\xf8\xf8\xf3[H\xf31\xeb\xf1\x89\x1a\x1d\x1f[\xfdy\xcer\xe0>\xdf\x1fp\xd5\x86\x12\xd1=\xd2tep\x85<u^?c\x16\x89&3:\xfab\x11Ah@CQlC\x97\x94d\x9c\x19}M\xe4S\x93Nr\x1a,\xecdf\xa6\x95\xb3m\x06\xf6v{:Sa\xdc\x0e-o\xad\x9d\xe7\xc3\xf6a\x87\x81\x04\xd1\xdeF\xb7\x1f14Q\xbcR\x84)\x9a\xf9\x0b\xfe\xc4\x87U\xe4\x03C3!\x03\x126\xbb\x96y9\x13\xf9R.\xc4\x9ar\xd5\xff\xa2\xd5\xa52\x9f\\xb5\x9f\xc4l\xb9\xe0v:]\tCkRd\xb6\xe7\xc5\x17\x0c\xce\x94\x8c"\xca\xa6\xfe\x9b;\x11p\x92\xb3H\xc0\x90\xaf/t\xdb\x17\xa6\xa5K4\xc2S\x18\xce\xdf\xc7.\xb4A\xcb4V\xab\xed-\xc5:\xbc\x15N\x88\xfd\x9b\xb0Y\xaf2\xf9\xcb\xb20\xe7\x98\xb4\xf9\xff\xd3\x9d\r\xce$\x9c\xfd\x1f\xafw\xc4\xac\xf5l\x07\xfc\x95fo\xfc\x00\x94\xbf\x8c\x1b\x0bs\x91\xf1\xd1\x9e\x05\xabtZD^\xda\x10
Patient id, 907034:
@PATIENT_INFO03g\x02\x00\x00\xd8\x12F!0.6g\x0b\x8e? <---- S\x90\xfe\xe6A\xcc\xab8\x9c\x0c\x8a\x023\xae\x11\xd0\x19\xc1\x0eL\xdc\x976\xd8\x12\xe0iQ\xb1yC\x16gz\xe4\xee\xb8\xa0\x16A\xf9g\xc4\xef\x81\x92ac\x9d\x9fP\xb3a{$\x118\xdak5~h+\xd0\xdb\xbf\x199y\x8f\xbczT@1\x02\xf9\xc3\x03<\xd0\x81\x1f\x83\xd9-<\xbb\x16\x0e"\xa3\x8d=\x03\xa33\xd1\x15~\xeaY\x11\xc2\n\x8a]\xa5\xa0tCv\xd1\xcb\xd8\xbd\xc4\x94\x8e\xf9w\x9ao\xcds0\x17\x958N\xb7K\xd1\xabHf\xc4\xd2\xfa\x95(\x934\x05\xc7\xa3\xc4.\xa6\x98kg\x1a\xf6\xef\xcdR\xf29\x880\x01~\xa9\xf6+\xce\xbb
\x14\xcf\x04}\x10\x91(\x1e\xb6\xed\x19\xf1>l\xbc\x80Q\xda\xbe^T\xcd\xde\x83}\x1e\xcbF\x98\x8dg#\x07\x85\xb4u\x14\xf8A\x07>\xca@z\x0cR\xf7\xdf\x19A\xa67\xa1@\x1aC4\xd7\x8b\xac\xb5\xb2\xd7\xb0\'\xd3O\xf0y,\x97\xc1] \tX<\x157K^\xc3\xf6\xf0Z\xcd\`\xab\xf2\xa2\xa8\x8e\xcb\xb3\x97h\xb4\xc13C&\xf1\n]\xd2\x88VW\x07\t(x\xe9\xd2\xd4\x18}o\xbc\x08\x92\x92k\xec!}\x91\xe2\x04\xe22\xa342\x14LMnB\xd3\xf5uk\xcb\xabuu>\xe4\x8d\xa0L\x9f\n\x10%\x0c\x9d#-\x82\xf1\x17\xf4/I\xa9\xf1\x1b\x98\xcc\x9e\xf8\xf8\xf3[H\xf31\xeb\xf1\x89\x1a\x1d\x1f[\xfdy\xcer\xe0>\xdf\x1fp\xd5\x86\x12\xd1=\xd2tep\x85<u^?c\x16\x89&3:\xfab\x11Ah@CQlC\x97\x94d\x9c\x19}M\xe4S\x93Nr\x1a,\xecdf\xa6\x95\xb3m\x06\xf6v{:Sa\xdc\x0e-o\xad\x9d\xe7\xc3\xf6a\x87\x81\x04\xd1\xdeF\xb7\x1f14Q\xbcR\x84)\x9a\xf9\x0b\xfe\xc4\x87U\xe4\x03C3!\x03\x126\xbb\x96y9\x13\xf9R.\xc4\x9ar\xd5\xff\xa2\xd5\xa52\x9f\\xb5\x9f\xc4l\xb9\xe0v:]\tCkRd\xb6\xe7\xc5\x17\x0c\xce\x94\x8c"\xca\xa6\xfe\x9b;\x11p\x92\xb3H\xc0\x90\xaf/t\xdb\x17\xa6\xa5K4\xc2S\x18\xce\xdf\xc7.\xb4A\xcb4V\xab\xed-\xc5:\xbc\x15N\x88\xfd\x9b\xb0Y\xaf2\xf9\xcb\xb20\xe7\x98\xb4\xf9\xff\xd3\x9d\r\xce$\x9c\xfd\x1f\xafw\xc4\xac\xf5l\x07\xfc\x95fo\xfc\x00\x94\xbf\x8c\x1b\x0bs\x91\xf1\xd1\x9e\x05\xabtZD^\xda\x10
I've been running tests, but without results so far...
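Comparing the two dumps by eye is error-prone, so a small helper that reports exactly where two payloads differ may help. This is a minimal sketch; `differing_runs` is a hypothetical helper, and the two sample values are only the first ten payload bytes copied from the dumps above (the bytes right after the 4-byte size field).

```python
def differing_runs(a, b):
    """Return (start, end) index ranges, end exclusive, where a and b differ.

    Only the common prefix length of the two byte strings is compared.
    """
    runs = []
    start = None
    length = min(len(a), len(b))
    for i in range(length):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, length))
    return runs


# First ten payload bytes for patient ids 399047 and 907034, from the dumps above.
first = b'\xd2\x1bH"0\x196g\x0b\x8e'
second = b'\xd8\x12F!0.6g\x0b\x8e'
print(differing_runs(first, second))  # [(0, 4), (5, 6)]
```

Running it on the full chunk payloads of several files would show whether the differences are really confined to the first few bytes or also appear further in.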
Hi, sorry for the late response. That is similar to what I'm seeing as well. Do you think the Triton has an internal data store that contains all the patient information? I know it stores it somehow, since the Topcon application is able to export the data.
I did also have a look at other chunks, but nothing seemed obvious to me. Maybe you'll have better luck.
Hi Jabez
Since 2017, at least in my hospital, both the Triton and the Maestro export .fda files with the "patient_info_03" chunk, I guess due to a software update. However, I have found that when images are stored in a folder, a filelist with patient data is stored too. That filelist contains the patient data for all images in that folder. I wrote some code to read that filelist, and from it you can export the patient's id, gender, laterality, date and hour of capture, name and surname. Take a look to see whether you have this file too. Otherwise, it seems impossible to extract the patient's info from that data structure. If you have it too, I can share the code with you.
Greetings
I see, thanks for that. I'll have a look. Is this folder created by the Triton when storing the files internally, or is it created by the OCTDataExtractor.exe application?
I'm not sure. What I have is stored automatically. At least in my hospital, all .fda files are stored in folders. Every folder contains a number of .fda files and a filelist with the patient information for those .fda files. I really don't know what octdataextractor.exe does, but I can ask.
Going to close for now
@antoniohupa Hello, is there any news about the structure of the @PATIENT_INFO_03 chunk? I need the structure, and I am running into a lot of issues.
Thanks in advance
Hi Mark,
Great work on this project. Let me preface this by saying I don't have an issue with the OCT-Converter project itself; I am working on a project to extract some patient data from FDA files and I'm having a hard time finding the structure of the data.
I am currently trying to extract some patient data (name, eye side etc) from FDA files. The data seems to be in a chunk with the tag PATIENT_INFO_03, the uocte page (https://bitbucket.org/uocte/uocte/wiki/Topcon%20File%20Format) doesn't have any documentation on this chunk (only PATIENT_INFO_02).
I have some FDA files and some exported data (using Topcon's OCTDataCollector.exe) and doing a brute force search doesn't yield any matches either. I feel the data is encrypted but I can't be too sure.
My reason for posting here is that I'm hoping you might have come across this and know something about it.
Jabez
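For reference, the kind of brute-force search described above can be sketched as follows. This is a minimal sketch, not the search that was actually run: `find_text` is a hypothetical helper, the list of encodings is an assumption, and the sample payload is synthetic. If a known patient id from the exported data is not found in any common text encoding, that is consistent with the chunk being encrypted or otherwise transformed, but it does not prove it.

```python
def find_text(payload, text, encodings=('ascii', 'utf-8', 'utf-16-le', 'utf-16-be')):
    """Return {encoding: first offset} for each encoding in which text occurs."""
    hits = {}
    for encoding in encodings:
        needle = text.encode(encoding)
        offset = payload.find(needle)
        if offset != -1:
            hits[encoding] = offset
    return hits


# Synthetic payload that hides a known id as UTF-16-LE text, for illustration.
known_id = '399047'
payload = b'\xff' * 8 + known_id.encode('utf-16-le') + b'\xff' * 8
print(find_text(payload, known_id))  # {'utf-16-le': 8}
```

On a real file, `payload` would be the bytes of the PATIENT_INFO_03 chunk and `known_id` a value taken from the exported data for the same scan.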