Closed: guruevi closed this issue 2 years ago
Hi! Would you be able to provide the DICOM series that caused the problems (in anonymized form)? This would help us a lot in analyzing the problem (you can send the files to tobias.block@nyumc.org). I tested it using different DICOMs with UTF-8 encoded characters in the DICOM tags, but couldn't reproduce the problem. One explanation could be that it looks like you are using an older mercure version based on Python 3.6 (the current version is using Python 3.8). A few things have changed in the UTF-8 handling with Python 3.7. Therefore, the problem might not occur anymore in the recent version. Many thanks!!
We're using Python's open() to open the file. By default, open() assumes the file was written in the encoding of the system locale (in both Python 3.6 and 3.8).
Since we are running on a recent Ubuntu, the system locale is UTF-8 and this hasn't caused problems: writing a simple UTF-8 JSON file with non-ASCII characters and reading it back in with json.load() works fine. So I think your system locale may not be UTF-8, which would explain why you've noticed this problem and we haven't.
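To check which encoding open() would pick on a given machine, something like the following sketch may help. It only uses the standard library; the helper name default_text_encoding is made up for illustration, and the comment about the fallback reflects the documented behavior of the Python versions discussed here (3.6 and 3.8).

```python
import locale
import tempfile


def default_text_encoding() -> str:
    """Return the locale's preferred encoding (hypothetical helper name).

    In Python 3.6 and 3.8, text-mode open() without an encoding argument
    falls back to locale.getpreferredencoding(False).
    """
    return locale.getpreferredencoding(False)


# Cross-check against a real file object opened without an explicit encoding.
with tempfile.TemporaryFile(mode="w+") as handle:
    file_encoding = handle.encoding

print("locale default:", default_text_encoding())
print("open() default:", file_encoding)
```

If this reports an ASCII-type encoding (for example in a container with no locale configured), reading a UTF-8 file without an explicit encoding would be expected to fail on the first non-ASCII byte.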
I think the JSON file is always written as UTF-8, regardless of the system locale, so if we explicitly set the expected encoding of the file to UTF-8 (json_file = open(..., encoding='utf-8')), the issue should go away.
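A minimal sketch of that change, assuming the tags file is plain JSON encoded as UTF-8. The function name load_tags, the file name sample.tags, and the "Units" key are placeholders for illustration, not names taken from the mercure code base.

```python
import json
import os
import tempfile
from typing import Dict


def load_tags(path: str) -> Dict[str, str]:
    """Read a tags file as UTF-8, independent of the system locale.

    Hypothetical helper: the name and signature are assumptions.
    """
    with open(path, "r", encoding="utf-8") as json_file:
        return json.load(json_file)


# Round-trip a small sample that contains a non-ASCII character.
sample = {"Units": "10^-6 mm\u00b2/s"}  # "\u00b2" is the superscript two
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.tags")  # placeholder file name
    with open(sample_path, "w", encoding="utf-8") as out_file:
        json.dump(sample, out_file, ensure_ascii=False)
    loaded = load_tags(sample_path)

# Print only a boolean so the output is safe on ASCII-only terminals.
print(loaded == sample)
```

Passing the encoding explicitly on both the write and the read side removes the dependency on the locale, so the same code should behave identically inside and outside a container.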
This was indeed an issue with an older version of Python/Docker. Once I updated to the latest version, the issue disappeared.
Describe the bug
When a DICOM tag contains a non-ASCII character (e.g. 10^-6 mm²/s), the router will not process the file and loops infinitely.
The "squared" character is a multi-byte character in UTF-8 (0xC2 0xB2), but it gets decoded as ASCII when json.load reads the file, since that seems to be the file format that is written by getdcmtags.
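The byte values mentioned above can be checked with a short, self-contained sketch. The JSON snippet and its "Units" key are invented for illustration and are not actual getdcmtags output.

```python
# U+00B2 (SUPERSCRIPT TWO) encodes to two bytes in UTF-8.
squared = "\u00b2"
encoded = squared.encode("utf-8")
assert encoded == b"\xc2\xb2"

# A made-up tags snippet, stored as UTF-8 bytes.
raw = '{"Units": "10^-6 mm\u00b2/s"}'.encode("utf-8")

# Decoding the same bytes as ASCII fails on the first byte >= 0x80.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding as UTF-8 succeeds and restores the original character.
decoded = raw.decode("utf-8")

print(ascii_error)
```

The error is raised while the bytes are being decoded into text, before any JSON parsing takes place, so the encoding used to open the file is what decides whether the read succeeds.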
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Process normally
Screenshots
INFO route_series: Processing series
INFO route_series: DICOM files found: 64
ERROR route_series: Invalid tag information of series
Traceback (most recent call last):
  File "/home/mercure/mercure/routing/route_series.py", line 87, in route_series
    tagsList: Dict[str, str] = json.load(json_file)
  File "/home/mercure/mercure-env/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/home/mercure/mercure-env/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 362: ordinal not in range(128)