python / cpython


Strict aliasing violations in Objects/unicodeobject.c #60196

Open mdickinson opened 11 years ago

mdickinson commented 11 years ago
BPO 15992
Nosy @jcea, @mdickinson, @tiran, @serhiy-storchaka, @phmc

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['interpreter-core', 'type-bug']
title = 'Strict aliasing violations in Objects/unicodeobject.c'
updated_at =
user = 'https://github.com/mdickinson'
```

bugs.python.org fields:
```python
activity =
actor = 'BreamoreBoy'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation =
creator = 'mark.dickinson'
dependencies = []
files = []
hgrepos = []
issue_num = 15992
keywords = []
message_count = 1.0
messages = ['170841']
nosy_count = 5.0
nosy_names = ['jcea', 'mark.dickinson', 'christian.heimes', 'serhiy.storchaka', 'pconnell']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue15992'
versions = ['Python 3.4', 'Python 3.5']
```

mdickinson commented 11 years ago

[Broken out of the discussion in bpo-15144]

Some of the newly-optimized code in Objects/unicodeobject.c contains strict aliasing violations; under the C standards, this is undefined behaviour (C99 6.5p7).

An example occurs in ascii_decode:

    unsigned long value = *(const unsigned long *) _p;

Here the pointer dereference violates the strict aliasing rule.

I think these portions of Objects/unicodeobject.c should be rewritten to avoid the undefined behaviour.
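One common way to do such a rewrite is to copy the bytes into a properly typed local variable with memcpy instead of dereferencing a cast pointer; optimizing compilers typically turn a fixed-size memcpy like this into a single load. The following is only a sketch of that idea, not the code in Objects/unicodeobject.c: the helper names `load_ulong` and `all_ascii` are hypothetical, and the mask computation assumes 8-bit bytes and no padding bits in `unsigned long`.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read sizeof(unsigned long) bytes starting at p.
   memcpy accesses the source as bytes, so no object is read through an
   lvalue of an incompatible type, and p need not be aligned. */
static unsigned long
load_ulong(const char *p)
{
    unsigned long value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Hypothetical word-at-a-time check: returns 1 if every byte in
   [s, s + n) is below 0x80, and 0 otherwise. */
static int
all_ascii(const char *s, size_t n)
{
    /* 0x80 repeated in every byte of an unsigned long
       (assumes CHAR_BIT == 8 and no padding bits). */
    const unsigned long mask = (ULONG_MAX / 0xFF) * 0x80;
    size_t i = 0;

    /* Whole words first. */
    for (; i + sizeof(unsigned long) <= n; i += sizeof(unsigned long)) {
        if (load_ulong(s + i) & mask)
            return 0;
    }
    /* Then the remaining tail, one byte at a time. */
    for (; i < n; i++) {
        if ((unsigned char)s[i] & 0x80)
            return 0;
    }
    return 1;
}
```

Because the load goes through memcpy, this version also does not depend on `_p` being suitably aligned for `unsigned long`, which the cast-and-dereference form requires.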

This is not a purely theoretical problem: compilers are known to make optimizations based on the assumption that strict aliasing is not violated. For example, early versions of David Gay's dtoa.c produced incorrect results because of strict aliasing violations; see [1].
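As a generic illustration of the kind of assumption involved (this is not taken from dtoa.c or unicodeobject.c, and the function name `store_both` is hypothetical), consider a function that writes through two pointers of incompatible types:

```c
/* Hypothetical example. int and float are incompatible types, so the
   compiler may assume that i and f never refer to the same object.
   Under that assumption it is allowed to return the constant 1
   without reloading *i after the store through f. */
static int
store_both(int *i, float *f)
{
    *i = 1;
    *f = 0.0f;
    return *i;
}
```

Called with two distinct objects, this is well defined and returns 1. If a caller instead passes two pointers to the same storage, obtained through a cast, the behaviour is undefined, and the value returned may differ between compilers and optimization levels.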

[2] is a Stack Overflow question that explains strict aliasing.

[1] http://patrakov.blogspot.co.uk/2009/03/dont-use-old-dtoac.html
[2] http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule