shotgunsoftware / tk-multi-importcut

Import Cut app
Other
0 stars 0 forks source link

For #37717, added support for Japanese utf-8 chars. #53

Closed raphaelmatto closed 8 years ago

raphaelmatto commented 8 years ago

This code solves all issues in the EDL for ticket 37717, but I suspect other utf-8 issues will come up. Anyone looking at this: is there a better way to deal with encodings/decodings like this? I haven't done a lot of this, so feel very unsure about my solution even though it works here. I'm curious about big picture strategy for dealing w/utf-8. How often should we do it? Only when necessary, or everywhere? Should we encode/decode right after pulling info from the Shotgun api, or later / only where necessary in the code?

raphaelmatto commented 8 years ago

I noticed a related bug: Japanese chars don't work when input as a Group in Settings. I suspect the different omit statuses may also not work. Unicode support should probably be added for those.

manneohrstrom commented 8 years ago

The 'official' strategy in toolkit apps is to treat everything as utf-8 and avoid any unicode objects. All toolkit engines are setting themselves up so that CStrings are assumed to be in UTF-8 locale, meaning that the display of all characters will be correct even when they are stored in utf-8 encoded byte streams.

The approach that I would recommend would be this:

Hope this helps!

sgsteph commented 8 years ago

The approach does not seem right, and it looks like a proper fix would be to ensure that the default encoding is utf8 and not ascii. I gave it a go and called this function in the app init, which seems to fix all unicode problems:

def ensure_utf8_encoding():
    # Make sure we don't use default "ascii" encoding
    import sys
    reload(sys)  # Reload to get back access to setdefaultencoding
    sys.setdefaultencoding("UTF8")

However, this shouldn't be done at the app level, as we are setting a global and this might have some side effects (bearing in mind I'm not a unicode expert at all):

Following Manne's advices, I'm checking if there is another way to fix it, which would be to ensure we read EDL files as utf8, so basically all our input data is utf8.

Sounds like a good engineering meeting topic !

Apart from that, I noticed that import cut input widgets (e.g. shot names in summary view) do not accept unicode characters. This should be fixed as well.

sgsteph commented 8 years ago

I tried the recommended approach, converting inputs to utf-8. However, this is not enough as strings emitted in signals are converted to unicode, so the conversion needs to happen in receiving slots, involving adding conversions a various random places, just in case. A global approach allowing to not worry about it anymore would be preferable imho.

sgsteph commented 8 years ago

I created a new branch / pull request with what I think is a better approach: https://github.com/shotgunsoftware/tk-multi-importcut/pull/54

The idea is to follow the official TK paradigm and always convert unicode strings to utf-8 encoded strings before using them. This gives us a nice pattern that we can use in all apps I think: from now on, all slots expecting a string should assume it is a unicode string in all our apps.