mercure-imaging / mercure

mercure DICOM Orchestrator
https://mercure-imaging.org
MIT License

UTF-8 issues with tags file #33

Closed · guruevi closed this issue 2 years ago

guruevi commented 2 years ago

Describe the bug: When a DICOM tag contains a UTF-8 character (e.g., "10^-6 mm²/s"), the router will not process the file and loops infinitely.

The "squared" character is a UTF-8 multi-byte character (0xC2 0xB2) but json.load decodes it as ASCII since that seems to be the file format that is written by getdcmtags.

To Reproduce Steps to reproduce the behavior:

  1. Send a file with non-ASCII characters in its DICOM tags

Expected behavior: The file is processed normally.

Screenshots

INFO route_series: Processing series
INFO route_series: DICOM files found: 64
ERROR route_series: Invalid tag information of series
Traceback (most recent call last):
  File "/home/mercure/mercure/routing/route_series.py", line 87, in route_series
    tagsList: Dict[str, str] = json.load(json_file)
  File "/home/mercure/mercure-env/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/home/mercure/mercure-env/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 362: ordinal not in range(128)

tblock79 commented 2 years ago

Hi! Would you be able to provide the DICOM series that caused the problem (in anonymized form)? This would help us a lot in analyzing the issue (you can send the files to tobias.block@nyumc.org). I tested it using different DICOMs with UTF-8-encoded characters in the DICOM tags, but couldn't reproduce the problem. One explanation could be that you appear to be using an older mercure version based on Python 3.6 (the current version uses Python 3.8). A few things have changed in the UTF-8 handling with Python 3.7, so the problem might no longer occur in the recent version. Many thanks!!

RoyWiggins commented 2 years ago

So, we're using Python's open() to open the file. By default, open() assumes the file was written in the system locale's encoding (in both Python 3.6 and 3.8).

Since we are running on a recent Ubuntu, the system locale is UTF-8 and this hasn't caused problems: writing a simple UTF-8 JSON file with non-ASCII characters and reading it back with json.load works fine. So I suspect your system locale isn't UTF-8, which is why you've noticed this problem and we haven't.
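A small sketch of the locale dependence described above; the exact encoding name printed depends on the host's LANG/LC_* settings, so the comments below are examples rather than guaranteed values:

```python
import locale

# open() without an explicit encoding uses the preferred locale encoding.
print(locale.getpreferredencoding(False))
# e.g. 'UTF-8' on a recent Ubuntu with a UTF-8 locale,
# or 'ANSI_X3.4-1968' (ASCII) under a C/POSIX locale.

# Under an ASCII locale, a plain open(path) behaves like
# open(path, encoding='ascii') and fails on multi-byte UTF-8
# sequences such as 0xC2 0xB2.
```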

The JSON file is, I think, always written as UTF-8 regardless of the system locale, so if we explicitly set the expected encoding when opening the file (json_file = open(..., encoding='utf-8')), the issue should go away.
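A sketch of that change (load_tags is a hypothetical helper mirroring the json.load call from the traceback, not mercure's actual code in routing/route_series.py):

```python
import json
from pathlib import Path
from typing import Dict

def load_tags(tags_path: Path) -> Dict[str, str]:
    # Passing encoding='utf-8' explicitly makes the read independent of the
    # system locale, so multi-byte characters such as "²" decode correctly.
    with open(tags_path, "r", encoding="utf-8") as json_file:
        tagsList: Dict[str, str] = json.load(json_file)
    return tagsList
```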

guruevi commented 2 years ago

This is indeed an issue with an older version of Python/Docker. Once I updated to the latest version, the issue disappeared.