usnistgov / oar-pdr

The NIST Open Access to Research (OAR) Public Data Repository (PDR) system software

Provide full support for unicode in filenames #340

Closed: RayPlante closed this PR 4 months ago

RayPlante commented 4 months ago

This PR fixes support for Unicode characters in submitted dataset filenames under Python 2.7. It pulls in a fix to oar-metadata's jq/urldecode.jq script, which establishes the names of dataset files. Supporting Unicode on Python 2.7 is tricky because unicode is not the default string type there (the plain str type holds bytes). A key part of the fix is updating the pdrtest Docker image to set Python's default encoding to 'UTF-8'; the same change will also have to be applied at the oar-docker level. A number of tests (but not all) were updated to confirm that Unicode support is working.
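Percent-encoded non-ASCII filenames are a common source of this kind of bug, because one character spans several %XX escapes, one per UTF-8 byte. The following is a minimal Python 3 sketch of that pitfall, not the actual change made to jq/urldecode.jq; the function names urldecode_naive and urldecode_utf8 are hypothetical. Decoding each escape to its own code point mangles the name, while collecting a run of escapes into bytes and decoding them together as UTF-8 recovers it.

```python
import re

def urldecode_naive(s):
    # Hypothetical helper. Incorrect for non-ASCII: turns each %XX escape into
    # its own code point, so a multi-byte UTF-8 character becomes mojibake.
    return re.sub(r'%([0-9A-Fa-f]{2})', lambda m: chr(int(m.group(1), 16)), s)

def urldecode_utf8(s):
    # Hypothetical helper. Gathers each run of consecutive %XX escapes into a
    # byte string and decodes the whole run as UTF-8
    # (raises UnicodeDecodeError if the bytes are not valid UTF-8).
    def decode_run(m):
        hex_pairs = re.findall(r'%([0-9A-Fa-f]{2})', m.group(0))
        return bytes(int(h, 16) for h in hex_pairs).decode('utf-8')
    return re.sub(r'(?:%[0-9A-Fa-f]{2})+', decode_run, s)

print(urldecode_naive("caf%C3%A9.txt"))   # cafÃ©.txt
print(urldecode_utf8("caf%C3%A9.txt"))    # café.txt
```

In ordinary Python 3 code, urllib.parse.unquote already decodes escape sequences as UTF-8 by default; the hand-written version above only makes the byte-grouping step visible.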

Also fixed in this PR:

RayPlante commented 4 months ago

Tested via a hot fix on oardata in support of mds2-2775.