python / cpython

The Python programming language
https://www.python.org

Multithreaded XML parsing causes segfault #62170

Open dbc7c659-22eb-4ad4-94ed-4f3208dba875 opened 11 years ago

dbc7c659-22eb-4ad4-94ed-4f3208dba875 commented 11 years ago
BPO 17970
Nosy @amauryfa, @pitrou, @tiran, @alex, @iritkatriel
Files
  • pyexpat_crash_multithread.py: pyexpat test
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['expert-XML', '3.10', '3.9', 'type-crash', '3.11']
    title = 'Mutlithread XML parsing cause segfault'
    updated_at =
    user = 'https://bugs.python.org/mrDoctorWho0'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'iritkatriel'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['XML']
    creation =
    creator = 'mrDoctorWho0..'
    dependencies = []
    files = ['30250']
    hgrepos = []
    issue_num = 17970
    keywords = []
    message_count = 8.0
    messages = ['189131', '189159', '189161', '189162', '189225', '189232', '189533', '401257']
    nosy_count = 6.0
    nosy_names = ['amaury.forgeotdarc', 'pitrou', 'christian.heimes', 'alex', 'mrDoctorWho0..', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue17970'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']
    ```

    dbc7c659-22eb-4ad4-94ed-4f3208dba875 commented 11 years ago

    Linux i386, Python 2.7.4. Multithreaded XML parsing via pyexpat causes a segmentation fault.

    amauryfa commented 11 years ago

    Expat is not thread-safe at the object level: a single Parser cannot be used from multiple threads. Pyexpat could add locks to Parser objects.
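
Until pyexpat does something like this internally, the same idea can be approximated in user code. The following is only a sketch: `LockedParser` and `feed_chunks` are hypothetical names, not part of pyexpat, and the wrapper assumes every thread goes through `feed()` rather than touching the underlying parser directly.

```python
import threading
import xml.parsers.expat


class LockedParser:
    """Hypothetical wrapper that serializes all access to one expat parser."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        self._lock = threading.Lock()
        self.start_tags = []
        # Handlers are invoked from inside Parse(), so they run while
        # the lock is held by the thread that called feed().
        self._parser.StartElementHandler = self._on_start

    def _on_start(self, name, attrs):
        self.start_tags.append(name)

    def feed(self, data, final=False):
        # Only one thread at a time may enter the expat parser.
        with self._lock:
            return self._parser.Parse(data, final)


def feed_chunks(parser, chunks):
    """Feed a sequence of chunks to a shared LockedParser."""
    for chunk in chunks:
        parser.feed(chunk)
```

The lock only prevents concurrent entry into expat; it does not order the input. Threads sharing one parser still need to feed chunks that remain well-formed in any interleaving (for example, complete elements inside an already opened root).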

    alex commented 11 years ago

    It could also track thread IDs and raise an error if you attempt to use a parser from multiple threads.
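
A rough Python-level illustration of that check follows. `ThreadBoundParser` is a hypothetical name, and a real implementation would live in the C code of pyexpat rather than in a wrapper.

```python
import threading
import xml.parsers.expat


class ThreadBoundParser:
    """Hypothetical wrapper: only the creating thread may use the parser."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        # Remember which thread created the parser. Note that thread
        # identifiers may be recycled once a thread has exited.
        self._owner = threading.get_ident()

    def feed(self, data, final=False):
        if threading.get_ident() != self._owner:
            raise RuntimeError("parser used from a thread other than its creator")
        return self._parser.Parse(data, final)
```

With this wrapper, a call to `feed()` from any other thread raises `RuntimeError` instead of reaching expat.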

    amauryfa commented 11 years ago

    But this would break working code that already uses locks correctly (or some kind of pool of cached parsers).

    tiran commented 11 years ago

    In my opinion it's fine to document Python's XML parser as not thread-safe and leave locking to the user. Any fancy locking or tracking is going to make it slower for users. And it takes a lot of effort to implement the feature, too. lxml offers a faster XML parser with multi-threading support.
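
If the parser stays documented as not thread-safe, one user-side pattern that needs no locking at all is to never share a parser: each document gets its own parser, created inside the worker that parses it. This is a minimal sketch with a hypothetical helper, `count_elements`, and made-up sample documents.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.parsers.expat


def count_elements(document):
    """Parse one document with a parser private to this call."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    # The parser is created, used, and discarded within one thread.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return count


# Sample input, assumed for illustration: a root with n empty children.
documents = ["<root>" + "<item/>" * n + "</root>" for n in range(1, 6)]

with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(count_elements, documents))
```

Because no parser object is ever visible to more than one thread, the object-level thread-safety limitation of expat does not come into play here.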

    amauryfa commented 11 years ago

    In my opinion it's not fine to let Python crash. The implementation could be similar to the one in bufferedio.c, which is quite lightweight.

    pitrou commented 11 years ago

    I agree with Amaury: multi-threaded parsing should definitely not crash. Adding a lock should be quite easy. I wonder what the effect on performance would be if there is a lot of back-and-forth between expat and Python.

    iritkatriel commented 3 years ago

    I've reproduced the segfault on 3.11.