ome / omero-py

Python project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0

Add unit test covering Unicode #133

Closed by sbesson 4 years ago

sbesson commented 4 years ago

See also https://github.com/ome/openmicroscopy/pull/6189

b45e95c exposes a Python 3.6 regression when adding a StringColumn containing Unicode. The same scenario passes without issue on Python 2.

sbesson commented 4 years ago

The numpy.dtype note about using strings in Python 3 is probably relevant to the root of this problem. Unfortunately, local attempts to migrate StringColumn.dtypes() from S to U have been unsuccessful.
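
For context, the S/U distinction can be reproduced with plain numpy, outside of OMERO. This is only a minimal sketch: the sample string and the width of 10 are arbitrary choices for illustration and are not taken from the StringColumn code.

```python
import numpy as np

text = "caf\u00e9"  # 4 characters, 5 bytes once encoded as UTF-8

# On Python 3, a bytes ("S") dtype is expected to reject non-ASCII str
# values, because numpy encodes str to bytes as ASCII.
try:
    np.array([text], dtype="S10")
    s_accepts_str = True
except UnicodeError:
    s_accepts_str = False

# Encoding explicitly to UTF-8 first works; the width counts bytes.
encoded = np.array([text.encode("utf-8")], dtype="S10")
roundtrip = encoded[0].decode("utf-8")

# A Unicode ("U") dtype stores str directly, using 4 bytes per character,
# so the on-disk layout differs from an "S" column of the same width.
unicode_arr = np.array([text], dtype="U10")

print(s_accepts_str, encoded.dtype.itemsize, unicode_arr.dtype.itemsize)
```

The difference in item size (10 bytes for S10 against 40 bytes for U10) is one reason a straight switch from S to U is not a drop-in change for data already stored as bytes.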

An earlier demo on an IDR instance upgraded to an experimental Python 3 environment seems to suggest that reading a StringColumn created on Python 2 with Unicode characters is unaffected:

[Screenshot: Screen Shot 2019-12-02 at 16 19 22]

I do not expect to have the capacity to provide a fix for this regression in time for OMERO 5.6.0. There is a question of whether this should be marked as a blocker for GA. It is certainly one for the upgrade of IDR to Python 3, as it breaks the annotation workflows if CSV files contain Unicode characters.

As immediate next steps, proposing to:

Alternate thoughts or suggestions welcome /cc @joshmoore @jburel @manics

joshmoore commented 4 years ago

Something I haven't really considered yet: would a UnicodeColumn be of use?

manics commented 4 years ago

What would be the difference between a StringColumn with unicode and a UnicodeColumn?

joshmoore commented 4 years ago

It would be a location that could hold different read/write logic, if that would help.
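
As a rough illustration of what separate read/write logic could look like, here is a sketch that encodes to UTF-8 on write and decodes on read, keeping a fixed-width bytes array as storage. The helper names `write_values` and `read_values` are hypothetical and are not part of omero-py; the real column classes may need something quite different.

```python
import numpy as np


def write_values(values, width):
    """Hypothetical write path: encode str values to fixed-width UTF-8 bytes."""
    encoded = [v.encode("utf-8") for v in values]
    # The width is a byte count, so non-ASCII text needs more room
    # than its character count suggests.
    too_long = [v for v, e in zip(values, encoded) if len(e) > width]
    if too_long:
        raise ValueError("values exceed %d bytes: %r" % (width, too_long))
    return np.array(encoded, dtype="S%d" % width)


def read_values(array):
    """Hypothetical read path: decode stored bytes back to str."""
    return [b.decode("utf-8") for b in array.tolist()]


stored = write_values(["caf\u00e9", "na\u00efve"], width=8)
print(read_values(stored))
```

Keeping the storage as bytes would leave existing S-typed data readable, at the cost of sizing the column in bytes rather than characters.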

sbesson commented 4 years ago

Superseded by #143