Closed: CristhianPerdomo closed this 1 year ago
Glad you like it!
Seems like the problem is coming from:

self._read(
    index["metadata_offset"], index["metadata_offset"] + index["metadata_length"]
)
Can you print it and see what it is, like so:
print(str(self._read(
    index["metadata_offset"], index["metadata_offset"] + index["metadata_length"]
)))
Hi @henrypinkard and @CristhianPerdomo,
I opened an issue about the same error; see issue #432.
My error was exactly as you describe: Python calls '_read_channel_names()', which tries to read_metadata, which then calls json to decode some variable 's'.
I opened this up in an IDE with a debugger and had it stop just before the last call to decode. Using the debugger I checked the variable s and found that the byte causing the error is a plus-minus symbol (±). However, my IDE had no problem decoding the string, even with utf-8, so I was very confused by this.
I took inspiration from a similar read_metadata() function in a different file and was able to fix my issue, as described in issue #432, by decoding with .decode("iso-8859-1") before the bytes get passed to json.loads(). I do not understand why this worked, but it did.
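Here is a minimal sketch of why that decode can never fail (using ± as the offending character, and assuming the metadata was written in a Latin-1-style encoding):

import json

# 0xB1 is the single Latin-1 byte for '±'; valid UTF-8 would use b'\xc2\xb1'
raw = b'{"device": "colibri \xb1"}'

try:
    raw.decode("utf-8")  # fails: a lone 0xB1 is not valid UTF-8
except UnicodeDecodeError as e:
    print(e)

# ISO-8859-1 maps every byte value 0-255 to a character, so it always succeeds
print(json.loads(raw.decode("iso-8859-1")))  # {'device': 'colibri ±'}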
Henry responded very quickly, with some explanation, a suggested test to see whether encodings work as expected on my PC, and some suggested reading to investigate this problem.
The code @henrypinkard sent to test the encoding of ± worked perfectly fine in a Jupyter cell.
Regrettably, I got nowhere digging into how encodings work and why this strange thing was happening.
Thanks, Svilen
Thanks for your answer @henrypinkard. The following was part of the printed output:
"COM4-Description":"Serial port driver (boost:asio)","Cobolt-Vendor":"H\xdcBNER Photonics".
And here is one of the problems: HÜBNER Photonics is the company from which we bought our Cobolt laser, and the Ü is the character causing the error.
Later in the text, I have something similar:
"Internal","pco_camera-Signal 4 (Status Expos) Timing":"Show time of \'First Line\'",
And:
"Cobolt-Description":"Cobolt Controller by Karl Bellv\xe9 with contribution from Alexis Maizel".
So it may be that changing the decoding in some of the functions would solve the problem; however, I don't know exactly where to make the change, or what the optimal solution might be.
Hi @svikolev, thanks! I saw your comment a little bit late, but I will try your trick ;)
@CristhianPerdomo After getting the latest nightly build of Micro-Manager and updating pycromanager, I got the exact same error as you, triggered by the ± character from my Colibri. I fixed it by decoding with "iso-8859-1" before json.loads, as described before. Reiterating that I don't know why this works; I took the idea from bridge.py, acquisitions.py, and data.py, which all do the same kind of thing when calling json.loads().
I applied .decode("iso-8859-1") to the end of line 91 in nd_tiff_current.py, so now the read_metadata function is:
def read_metadata(self, index):
    return json.loads(
        self._read(
            index["metadata_offset"], index["metadata_offset"] + index["metadata_length"]
        ).decode("iso-8859-1")
    )
Thanks for all the testing @svikolev and @CristhianPerdomo!
I think I finally figured it out. Encodings (ISO-8859-1, UTF-8, UTF-16, etc.) are maps from numbers to characters, used to convert byte values to text.
Metadata is saved to disk as a string (of JSON), and when saving, some encoding has to be applied to the string in order to convert its characters to bytes. The NDTiff specification says that all metadata should use the UTF-8 encoding, but I noticed in the Java code that writes NDTiffs that the encoding used was just the system default, not explicitly UTF-8. I'm guessing it defaulted to UTF-8 most of the time, except in your cases.
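As an illustration (a sketch, assuming the acquisition PC uses a Western-European Windows locale), Python reports the same "system default" encoding a non-UTF-8-aware writer would pick up, and that default agrees with ISO-8859-1 for these characters:

import locale

# The locale-dependent "system default", e.g. 'cp1252' on many Windows PCs
print(locale.getpreferredencoding())

# cp1252 and iso-8859-1 agree on Ü and é, so either decode recovers the text
print(b"\xdc\xe9".decode("cp1252"))      # Üé
print(b"\xdc\xe9".decode("iso-8859-1"))  # Üé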
I made UTF-8 explicit so that future datasets won't have this problem:
https://github.com/micro-manager/NDTiffStorage/pull/66
This change will be available in the new nightly builds.
However, if this is right, it means that the data you've already collected has its metadata encoded with an encoding other than UTF-8. @svikolev's solution of using decode("iso-8859-1") (or maybe decode("utf-16")) should work for this, if the same encoding is present on all of your datasets.
I just updated the format version to 3.1 with this fix. And with #68 it is now possible to call dataset.minor_version and dataset.major_version to query this version.
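For example (a sketch, assuming a dataset opened with ndtiff's Dataset class and the version properties from #68; the path is hypothetical):

from ndtiff import Dataset

dataset = Dataset(r"C:\data\acqStack")  # hypothetical saved dataset
print(dataset.major_version, dataset.minor_version)
# Datasets written as version 3.1 or later have metadata explicitly UTF-8 encoded
if (dataset.major_version, dataset.minor_version) >= (3, 1):
    print("metadata is UTF-8")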
This could probably be fixed more generally in the current read_metadata function by adding a try/except block and switching to alternative encodings if UTF-8 fails. I opened an issue for it: https://github.com/micro-manager/NDTiffStorage/issues/67. It would be a great addition if either of you is interested in making it.
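A minimal sketch of what that fallback could look like (hypothetical; the exact encoding order and error handling are what issue 67 is meant to settle):

import json

def read_metadata(self, index):
    raw = self._read(
        index["metadata_offset"], index["metadata_offset"] + index["metadata_length"]
    )
    # Try the encoding the spec mandates first, then encodings that older
    # writers may have used as the system default
    for encoding in ("utf-8", "iso-8859-1", "utf-16"):
        try:
            return json.loads(raw.decode(encoding))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue
    raise ValueError("Could not decode metadata with any attempted encoding")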
@svikolev, I tried this, and it worked perfectly well. Thanks for your suggestion!
@henrypinkard, thanks for your explanation and help. Everything you have said makes a lot of sense and fully explains the bug. Also, thanks for solving this issue; it is nice that newer versions will make the UTF-8 encoding explicit to avoid these kinds of events. As I said above, @svikolev's solution worked well, but, as you mentioned, it would be a good idea to implement a more general solution in the read_metadata function that covers a broader range of cases; so your invitation to collaborate on the function is joyously received!
Best
Great! Happy to help if you need guidance
Bug report
Bug summary
First of all, thanks a lot for everything you have done with pycromanager, it has been a powerful tool that has automated some processes in the lab!
I have been running pycromanager on Windows 7 machines for a while because of their compatibility with some of our apparatus. Everything works well in Win7. However, we have recently updated one of our PCs to Windows 10, and when I try to run pycromanager there with a simple acquisition function, it gives me the following error:
Expected outcome
Acquisition of stacks
Actual outcome
PS C:\Users\SPIM3\Documents\cod> & C:/Users/SPIM3/AppData/Local/Programs/Python/Python311/python.exe c:/Users/SPIM3/Documents/cod/test.py
utf-8
Traceback (most recent call last):
File "c:\Users\SPIM3\Documents\cod\test.py", line 43, in
with Acquisition(directory=save_dir, name=r"acqStack", show_display=False) as acq:
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\site-packages\pycromanager\acquisitions.py", line 440, in exit
self.await_completion()
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\site-packages\pycromanager\acquisitions.py", line 380, in await_completion
self._check_for_exceptions()
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\site-packages\pycromanager\acquisitions.py", line 452, in _check_for_exceptions
raise self._exception
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\site-packages\pycromanager\acquisitions.py", line 194, in _storage_monitor_fn
axes = dataset._add_index_entry(index_entry)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\site-packages\ndtiff\nd_tiff_current.py", line 401, in _add_index_entry
self._read_channel_names()
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\site-packages\ndtiff\nd_tiff_current.py", line 422, in _read_channel_names
channel_name = self.read_metadata(**axes)["Channel"]
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\site-packages\ndtiff\nd_tiff_current.py", line 370, in read_metadata
return self._do_read_metadata(axes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\site-packages\ndtiff\nd_tiff_current.py", line 573, in _do_read_metadata
return reader.read_metadata(index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\site-packages\ndtiff\nd_tiff_current.py", line 88, in read_metadata
return json.loads(
^^^^^^^^^^^
File "C:\Users\SPIM3\AppData\Local\Programs\Python\Python311\Lib\json__init__.py", line 341, in loads
s = s.decode(detect_encoding(s), 'surrogatepass')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdc in position 588: invalid continuation byte
I have tried changing the encoding in VS Code, as well as using the r prefix before the strings to treat them as raw strings, and other little tricks to encode/decode strings, but nothing works. I cannot understand what could be wrong :/
Version Info
Thanks!