Closed raphaelmatto closed 8 years ago
I noticed a related bug: Japanese chars don't work when input as a Group in Settings. I suspect the different omit statuses may also not work. Unicode support should probably be added for those.
The 'official' strategy in toolkit apps is to treat everything as utf-8 and avoid any unicode objects. All toolkit engines are setting themselves up so that CStrings are assumed to be in UTF-8 locale, meaning that the display of all characters will be correct even when they are stored in utf-8 encoded byte streams.
The approach that I would recommend would be this:
\w
. See this example in coreHope this helps!
The approach does not seem right, and it looks like a proper fix would be to ensure that the default encoding is utf8 and not ascii. I gave it a go and called this function in the app init, which seems to fix all unicode problems:
def ensure_utf8_encoding():
# Make sure we don't use default "ascii" encoding
import sys
reload(sys) # Reload to get back access to setdefaultencoding
sys.setdefaultencoding("UTF8")
However, this shouldn't be done at the app level, as we are setting a global and this might have some side effects (bearing in mind I'm not a unicode expert at all):
Following Manne's advices, I'm checking if there is another way to fix it, which would be to ensure we read EDL files as utf8, so basically all our input data is utf8.
Sounds like a good engineering meeting topic !
Apart from that, I noticed that import cut input widgets (e.g. shot names in summary view) do not accept unicode characters. This should be fixed as well.
I tried the recommended approach, converting inputs to utf-8. However, this is not enough as strings emitted in signals are converted to unicode, so the conversion needs to happen in receiving slots, involving adding conversions a various random places, just in case. A global approach allowing to not worry about it anymore would be preferable imho.
I created a new branch / pull request with what I think is a better approach: https://github.com/shotgunsoftware/tk-multi-importcut/pull/54
The idea is to follow the official TK paradigm and always convert unicode strings to utf-8 encoded strings before using them. This gives us a nice pattern that we can use in all apps I think: from now on, all slots expecting a string should assume it is a unicode string in all our apps.
This code solves all issues in the EDL for ticket 37717, but I suspect other utf-8 issues will come up. Anyone looking at this: is there a better way to deal with encodings/decodings like this? I haven't done a lot of this, so feel very unsure about my solution even though it works here. I'm curious about big picture strategy for dealing w/utf-8. How often should we do it? Only when necessary, or everywhere? Should we encode/decode right after pulling info from the Shotgun api, or later / only where necessary in the code?