pyxem / kikuchipy

Toolbox for analysis of electron backscatter diffraction (EBSD) patterns
https://kikuchipy.org
GNU General Public License v3.0

Cannot read Oxford's binary .ebsp files with version 4 #591

Closed hakonanes closed 1 year ago

hakonanes commented 1 year ago

@mikesmic, have you successfully read an Oxford instruments .ebsp binary file with version 4 into Python before? @barcusmehling, I see you wrote the Oxford .ebsp reader in OpenXY (source code), are you familiar with a version 4 of this format?

I recently tried to read an Oxford Instruments .ebsp binary file with version number 4 (the first eight bytes, read as an int64, give -4). It was acquired in summer 2022. Our current reader cannot read this file. It fails with the following error:

>>> import kikuchipy as kp
>>> s = kp.load("filename.ebsp")
Traceback (most recent call last):
  File "/home/hakon/miniconda3/envs/kp-dev/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3442, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-89a36507d0b8>", line 8, in <module>
    obf = OxfordBinaryFileReader(f)
  File "/home/hakon/kode/kikuchipy/kikuchipy/io/plugins/oxford_binary.py", line 155, in __init__
    nrows, ncols, step_size = self.get_navigation_shape_and_step_size()
  File "/home/hakon/kode/kikuchipy/kikuchipy/io/plugins/oxford_binary.py", line 283, in get_navigation_shape_and_step_size
    last_y = last_footer["beam_y"][0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0

Our reader assumes the following file structure:

  1. First 8 bytes (int64): File version
  2. 8 x n bytes (int64), where n is the number of patterns: Pattern byte positions
  3. Remaining bytes are per pattern, with the following header and footer
    • Header:
      i. 4 bytes (int32): Is the pattern compressed?
      ii. 4 bytes (int32): Number of detector rows, sy.
      iii. 4 bytes (int32): Number of detector columns, sx.
      iv. 4 bytes (int32): Number of detector pixels. If it is 2 * sy * sx, the patterns are stored as uint16, otherwise uint8.
    • sy * sx or 2 * sy * sx bytes: Pattern.
    • Footer:
      i. 1 byte (bool): Is the sample x position of the pattern present? Only present if version > 1.
      ii. 8 bytes (float64): Sample x position if present. Only if version > 0.
      iii. 1 byte (bool): Is the sample y position of the pattern present? Only present if version > 1.
      iv. 8 bytes (float64): Sample y position if present. Only if version > 0.
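As a rough illustration of the layout listed above, here is a minimal Python sketch that writes a tiny synthetic file in that layout and reads one pattern back. It is not kikuchipy's reader: the function names `build_example_file` and `read_pattern` are hypothetical, little-endian byte order is assumed, and only uncompressed files with version > 1 (18-byte footer) are handled.

```python
import io
import struct

import numpy as np


def build_example_file(patterns):
    """Build a tiny synthetic file in the assumed layout (illustration only)."""
    n = len(patterns)
    sy, sx = patterns[0].shape
    block_size = 16 + sy * sx + 18  # pattern header + uint8 pattern + footer
    first = 8 + 8 * n  # file version + pattern byte positions
    # Assumption: version 2 is stored as -2, by analogy with the -4 reported above
    out = struct.pack("<q", -2)
    out += struct.pack(f"<{n}q", *(first + i * block_size for i in range(n)))
    for i, pattern in enumerate(patterns):
        out += struct.pack("<4i", 0, sy, sx, sy * sx)  # pattern header
        out += pattern.astype(np.uint8).tobytes()  # pattern, row by row
        out += struct.pack("<?d?d", True, float(i), True, 0.0)  # footer
    return out


def read_pattern(f, n_patterns, index=0):
    """Read one pattern, following the assumed layout step by step."""
    f.seek(0)
    version = struct.unpack("<q", f.read(8))[0]  # raw value of the first 8 bytes
    positions = np.frombuffer(f.read(8 * n_patterns), dtype="<i8")
    f.seek(int(positions[index]))
    is_compressed, sy, sx, n_bytes = struct.unpack("<4i", f.read(16))
    dtype = "<u2" if n_bytes == 2 * sy * sx else "u1"
    pattern = np.frombuffer(f.read(n_bytes), dtype=dtype).reshape(sy, sx)
    has_x, beam_x, has_y, beam_y = struct.unpack("<?d?d", f.read(18))
    return version, bool(is_compressed), pattern, (beam_x, beam_y)


patterns = [np.full((3, 4), i, dtype=np.uint8) for i in range(2)]
f = io.BytesIO(build_example_file(patterns))
version, is_compressed, pattern, beam = read_pattern(f, n_patterns=2, index=1)
```

The number of patterns has to be known (or derived from the first pattern byte position) before the position table can be read, which is why it is passed in explicitly here.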

This particular file had a navigation (map) shape of (n rows, n columns) = (251, 301), i.e. 75,551 patterns, and a signal (detector) shape of (s rows, s columns) = (88, 156), i.e. 13,728 pixels, of uint8 data type. According to the above file structure, this should give 1,040,337,278 bytes (8 + 8 × 75,551 + 16 × 75,551 + 75,551 × 13,728 + 18 × 75,551). But the file size is 1,040,339,401 bytes, with 2,123 bytes left unaccounted for. I have no idea what these bytes are or where they are located.
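The byte count can be checked with a few lines of Python. The variable names are only for illustration; the numbers are those given above.

```python
n_patterns = 251 * 301  # navigation shape (251, 301)
sy, sx = 88, 156  # detector shape, uint8, so one byte per pixel

file_header = 8 + 8 * n_patterns  # file version + pattern byte positions
per_pattern = 16 + sy * sx + 18  # pattern header + pattern + footer
expected_size = file_header + n_patterns * per_pattern

actual_size = 1_040_339_401
unaccounted = actual_size - expected_size

print(expected_size, unaccounted)  # 1040337278 2123
```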

The only thing I know is that the pattern header is read incorrectly. Assuming the above file structure, the first pattern header in the file results in False, 22528, 39936, 3514368, which makes no sense given the assumed header.
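One observation on these values: each of the three non-zero numbers is exactly 256 times the expected one (22528 = 256 × 88, 39936 = 256 × 156, 3514368 = 256 × 13728). That is what a little-endian int32 header looks like when it is read starting one byte too early. The sketch below reproduces the numbers under two assumptions that the file itself does not confirm: little-endian byte order, and a zero byte immediately before the true header.

```python
import struct

sy, sx = 88, 156

# The header as it would be stored under the assumed structure
true_header = struct.pack("<4i", 0, sy, sx, sy * sx)

# Read 16 bytes starting one byte before the true header
# (assumption: the preceding byte is zero)
misaligned = b"\x00" + true_header[:-1]
values = struct.unpack("<4i", misaligned)

print(values)  # (0, 22528, 39936, 3514368)
```

This only explains a one-byte shift of the header. It does not by itself account for the 2,123 extra bytes in the file.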

mikesmic commented 1 year ago

The format you describe is consistent with what I have in some old notes, except I have that there is an 18 byte footer at the end of the file. I'm not sure what version this is for though.

2,000 pats
16,024 bytes at beginning
34 bytes between pats
18 bytes left at end

header consisting of 8 bytes then 8 bytes per image
this is the offset in the file of each image

then each image has 16 bytes then image data then 18 bytes
16 byte image header consists of 4 4-byte ints
(0, 128, 156, 19968) probably (map start index?, pat y-dim, pat x-dim, map end index)
this is the same for all patterns in the file
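These notes can be cross-checked against the file structure described earlier in the thread. A short sketch, assuming the 16,024 bytes at the beginning include the first pattern's 16-byte header and that the 34 bytes between patterns are one footer plus the next header:

```python
n_patterns = 2000

file_header = 8 + 8 * n_patterns  # file version + one int64 position per pattern
pattern_header = 16  # four int32 values
pattern_footer = 18  # two bool + float64 pairs

bytes_at_beginning = file_header + pattern_header  # up to the first pattern's data
gap_between_patterns = pattern_footer + pattern_header
n_pixels = 128 * 156

print(bytes_at_beginning, gap_between_patterns, n_pixels)  # 16024 34 19968
```

Under this reading, the fourth header value (19968) equals 128 × 156, which fits the "number of detector pixels" interpretation.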

It would be good if you could get another file acquired with identical conditions but with a different number of patterns. Then you can figure out how much of the file is data per pattern and how much is file header/footer. You could also try scanning through the file loading in a block of data the size of the patterns and plotting until you see the first and second pattern image to find the gap between pattern data block.
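The scanning idea could be sketched as follows. The helpers `view_block` and `candidate_views` are hypothetical names, uint8 patterns are assumed, and the demonstration uses synthetic bytes in which the pattern is known, so the match can be checked numerically. With a real file one would plot each candidate view instead and judge by eye.

```python
import numpy as np


def view_block(data, offset, sy, sx):
    """Interpret sy * sx bytes starting at `offset` as a uint8 image."""
    block = np.frombuffer(data, dtype=np.uint8, count=sy * sx, offset=offset)
    return block.reshape(sy, sx)


def candidate_views(data, start, stop, sy, sx):
    """Yield (offset, image) for every byte offset in [start, stop)."""
    for offset in range(start, stop):
        yield offset, view_block(data, offset, sy, sx)


# Synthetic demonstration: 25 leading bytes, one pattern, 18 trailing bytes
rng = np.random.default_rng(0)
sy, sx = 4, 6
pattern = rng.integers(0, 256, size=(sy, sx), dtype=np.uint8)
data = bytes(25) + pattern.tobytes() + bytes(18)

found = [
    offset
    for offset, image in candidate_views(data, 0, 40, sy, sx)
    if np.array_equal(image, pattern)
]
```

Repeating the search for the second pattern gives the distance between consecutive pattern data blocks, and from that the combined size of footer and header.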

hakonanes commented 1 year ago

Thank you for your good suggestions, @mikesmic. By the way, I'm confident that my file header and pattern header I described above are correct and should apply to your file as well.

> It would be good if you could get another file

I cannot as the file I have was not acquired in our lab.

> You could also try scanning through the file loading in a block of data the size of the patterns and plotting until you see the first and second pattern image to find the gap between pattern data block.

This is a very good idea and I will try to do this.

hakonanes commented 1 year ago

> except I have that there is an 18 byte footer at the end of the file

This is what I see as well in the other .ebsp files I have (that is, those other than the unreadable one discussed here). I'm confident that you then have a file with version > 1.

mikesmic commented 1 year ago

Let me know how you get on with loading a single image. If you can't figure it out with a single file then I can probably get you some more. I assume the file came from the latest version of Aztec.

CiosG commented 1 year ago

If someone ticked "store compressed patterns" in Aztec before acquisition, the reader might not work. In case you have a whole Aztec project along with .ebsp it can be exported to .h5 format from Aztec ver. 6.0.

-- Cios Grzegorz Academic Centre for Materials and Nanotechnology (ACMiN) AGH University of Science and Technology, Krakow, Poland 30 Mickiewicza, 30-059 Krakow bldg. D-16 (Kawiory 30, 30-055 Krakow), room 2.23 tel: +48 12 617 52 78

hakonanes commented 1 year ago

> If someone ticked "store compressed patterns" in Aztec before acquisition, the reader might not work.

Yes, this is automatically checked for in the reader, and if such a file is passed to load() it should raise this error:

https://github.com/pyxem/kikuchipy/blob/3106c75be3f7f0c0be94796082b8577731ab1781/kikuchipy/io/plugins/oxford_binary.py#L129-L132

In this case, however, the number of patterns aligns very nicely with the number of bytes in the file. This leads me to believe they are not compressed.

> In case you have a whole Aztec project along with .ebsp it can be exported to .h5 format from Aztec ver. 6.0.

I don't know if all files are there, but there are at least a bunch of binary .dat files and two .ebsp files (one of which is the file in question in this issue) in the directory I have from my colleague.

CiosG commented 1 year ago

In case there is .oip and/or .oipx I can try to open it and at least export .h5oina or look for the setting that could have caused the problem. One would need to somehow share the files with me.

hakonanes commented 1 year ago

> In case there is .oip and/or .oipx

There is not, unfortunately. I'll ask my colleague when I see them next. Thank you for wanting to help!

hakonanes commented 1 year ago

Thanks to @drowenhorst-nrl for figuring out that files of version 4 can be read like before by skipping an extra byte (uint8) after the file version. This will be fixed in the imminent v0.8 minor release.
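A minimal sketch of what "skipping an extra byte after the file version" could look like. This is not the code that went into kikuchipy: `read_pattern_positions` is a hypothetical helper, little-endian byte order is assumed, the skip is applied only to version 4 because that is all this thread confirms, and the meaning of the extra byte is unknown.

```python
import io
import struct


def read_pattern_positions(f, n_patterns):
    """Read the file version and the pattern byte positions."""
    f.seek(0)
    raw = struct.unpack("<q", f.read(8))[0]
    version = -raw  # the thread reports -4 in the file for version 4
    if version == 4:  # assumption: other versions have no extra byte
        f.read(1)  # extra byte (uint8) of unknown meaning
    positions = struct.unpack(f"<{n_patterns}q", f.read(8 * n_patterns))
    return version, positions


# Synthetic version 4 file start: version, extra byte, two pattern positions
data = struct.pack("<q", -4) + b"\x00" + struct.pack("<2q", 25, 100)
version, positions = read_pattern_positions(io.BytesIO(data), n_patterns=2)
```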

@mikesmic (EMsoft) and @barcusmehling (OpenXY), you might find this information useful.