UnicodeEncodeError during portal_upgrade

laulaz commented 4 years ago

UnicodeEncodeError during portal_upgrade when a content type title contains accented characters.

In our case, the content types were created in French, through the ZMI:

Traceback (most recent call last):
  File "/opt/plone/buildout-cache/eggs/Products.CMFPlone-5.2.1-py2.7.egg/Products/CMFPlone/MigrationTool.py", line 292, in upgrade
    step['step'].doStep(setup)
  File "/opt/plone/buildout-cache/eggs/Products.GenericSetup-2.0.1-py2.7.egg/Products/GenericSetup/upgrade.py", line 168, in doStep
    self.handler(tool)
  File "/opt/plone/buildout-cache/eggs/plone.app.upgrade-2.0.31-py2.7.egg/plone/app/upgrade/v50/alphas.py", line 67, in to50alpha1
    migrate_registry_settings(portal)
  File "/opt/plone/buildout-cache/eggs/plone.app.upgrade-2.0.31-py2.7.egg/plone/app/upgrade/v50/alphas.py", line 174, in migrate_registry_settings
    t for t in site_props.types_not_searched if t in portal_types)
  File "/opt/plone/buildout-cache/eggs/plone.registry-1.1.5-py2.7.egg/plone/registry/registry.py", line 51, in __setitem__
    self.records[name].value = value
  File "/opt/plone/buildout-cache/eggs/plone.registry-1.1.5-py2.7.egg/plone/registry/record.py", line 80, in _set_value
    field = field.bind(self)
  File "/opt/plone/buildout-cache/eggs/zope.schema-4.9.3-py2.7.egg/zope/schema/_field.py", line 806, in bind
    clone.value_type = clone.value_type.bind(context)
  File "/opt/plone/buildout-cache/eggs/plone.registry-1.1.5-py2.7.egg/plone/registry/field.py", line 292, in bind
    clone._vocabulary = vr.get(object, self.vocabularyName)
  File "/opt/plone/buildout-cache/eggs/Zope-4.1.3-py2.7.egg/Zope2/App/schema.py", line 32, in get
    return factory(context)
  File "/opt/plone/buildout-cache/eggs/plone.app.vocabularies-4.1.1-py2.7.egg/plone/app/vocabularies/types.py", line 186, in __call__
    for t in ttool.listContentTypes()]
  File "/opt/plone/buildout-cache/eggs/plone.dexterity-2.9.5-py2.7.egg/plone/dexterity/fti.py", line 213, in Title
    return self.title.decode('utf8')
  File "/opt/plone/staging/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

https://github.com/plone/plone.dexterity/blob/fcb0253d2e5237e98ab1f159f77d23db2906b525/plone/dexterity/fti.py#L213
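
The traceback ends in an *encode* error even though the failing line calls `decode`. A likely explanation, assuming the stored title was already a `unicode` object rather than a byte string: under Python 2, calling `.decode('utf8')` on a `unicode` value first implicitly encodes it with the default ASCII codec, and that hidden step is what fails. The sketch below mimics this in Python 3; the title value and the function name are hypothetical and are not taken from plone.dexterity.

```python
# Hypothetical title that is already text (unicode), with "é" at position 1.
title = u"R\xe9union"


def py2_style_decode(value):
    """Mimic Python 2's unicode.decode('utf8').

    Python 2 implicitly encodes the unicode value to ASCII before decoding,
    so the ASCII encode step is made explicit here.
    """
    return value.encode("ascii").decode("utf8")


try:
    py2_style_decode(title)
except UnicodeEncodeError as exc:
    # The failure comes from the encode step, not from the UTF-8 decode.
    message = str(exc)
    failing_position = exc.start
    print(message)
```

The position reported by the exception matches the "position 1" in the traceback only because the hypothetical title was chosen that way.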

aadarsh-nagrath commented 1 year ago

When the method encounters a character that cannot be encoded in the target character set (in this case, ASCII), it raises a UnicodeEncodeError. Using decode('utf8', 'ignore') changes the behavior of decode() so that it ignores such characters: the resulting Unicode string simply omits them instead of raising an error. This may not be an ideal solution, but it is quick and effective when the problematic characters are not critical to the application or system.
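
For illustration, this is what the `'ignore'` error handler does when decoding a byte string in Python 3. The input is a hypothetical Latin-1 encoded title, not data from this issue.

```python
# Hypothetical input: "Réunion" encoded as Latin-1, which is not valid UTF-8.
raw = b"R\xe9union"

# With errors="ignore", the undecodable byte is dropped without any error.
lenient = raw.decode("utf8", "ignore")
print(lenient)  # Runion
```

The accented character disappears from the result without any warning.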

davisagli commented 1 year ago

@Coder-aadarsh Generally Plone uses utf-8 encoding, not ascii. We should make this work correctly with non-ASCII characters, not make it silently ignore them. There is a good chance this is already working fine in recent Plone versions with Python 3; someone needs to check.
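
One way to keep non-ASCII characters intact is to decode only when the stored value is actually a byte string. This is a minimal sketch of that idea, not the fix adopted in plone.dexterity; `title_as_text` is a hypothetical helper name and the UTF-8 default is an assumption based on the comment above.

```python
def title_as_text(raw_title, encoding="utf-8"):
    """Return raw_title as text, decoding only when it is a byte string.

    Text values are returned unchanged, so no implicit ASCII
    encode step is ever triggered.
    """
    if isinstance(raw_title, bytes):
        return raw_title.decode(encoding)
    return raw_title


# Both a UTF-8 byte string and an already-decoded title give the same text.
from_bytes = title_as_text(b"R\xc3\xa9union")
from_text = title_as_text(u"R\xe9union")
print(from_bytes == from_text)
```

Because `bytes` is an alias for `str` on Python 2, the same check should behave sensibly on both interpreter lines.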

plone / plone.dexterity

UnicodeEncodeError during portal_upgrade #127