Open dbc7c659-22eb-4ad4-94ed-4f3208dba875 opened 11 years ago
Linux i386, Python 2.7.4. Multithreaded XML parsing via pyexpat causes a segmentation fault.
Expat is not thread-safe at the object level: a single Parser cannot be used from multiple threads. Pyexpat could add locks to Parser objects.
It could also track thread IDs and raise an error if you attempt to use a parser from multiple threads.
But that would break working code that already uses locks correctly (or some kind of pool of cached parsers).
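The thread-ID tracking idea can be sketched at the Python level. This is a minimal illustration, not pyexpat's API: `ThreadBoundParser` is a hypothetical wrapper that remembers which thread created the parser and rejects calls from any other thread.

```python
import threading
import xml.parsers.expat


class ThreadBoundParser:
    """Hypothetical wrapper, not part of pyexpat: binds a parser to one thread."""

    def __init__(self):
        self.parser = xml.parsers.expat.ParserCreate()
        # Remember which thread created the parser.
        self._tid = threading.get_ident()

    def Parse(self, data, isfinal=False):
        # Refuse any call coming from a thread other than the creator.
        if threading.get_ident() != self._tid:
            raise RuntimeError("parser used from a thread other than its creator")
        return self.parser.Parse(data, isfinal)
```

This also shows the drawback raised in the comment: a second thread is rejected even when the caller already serializes access with its own lock.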
In my opinion it's fine to document Python's XML parser as not thread-safe and leave locking to the user. Any fancy locking or tracking is going to make it slower for users. And it takes a lot of effort to implement the feature, too. lxml offers a faster XML parser with multi-threading support.
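When thread safety is left to the user, the simplest pattern is usually not to share a parser at all. A minimal sketch, assuming each document is parsed independently; `count_elements` and `count_all` are hypothetical helpers that create one expat parser per call, so no parser object ever crosses threads.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.parsers.expat


def count_elements(document):
    """Count start tags using a parser created, used and dropped in one call."""
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    # A fresh parser per call, so no parser object is shared between threads.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(document, True)
    return count


def count_all(documents, workers=4):
    """Parse many documents concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_elements, documents))


print(count_all(["<a><b/><c/></a>", "<r/>"]))  # [3, 1]
```

Creating a parser is cheap compared with parsing a real document, so a per-call parser avoids locks without much cost.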
In my opinion it's not fine to let Python crash. The implementation could be similar to the one in bufferedio.c, which is quite lightweight.
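A lock with owner tracking, loosely in the spirit of the one in bufferedio.c, can be sketched in Python. This is an illustration under assumptions, not the C patch: `LockedParser` is hypothetical, and it only serializes entry into one expat parser and turns a re-entrant call from a handler into a `RuntimeError` instead of a crash.

```python
import threading
import xml.parsers.expat


class LockedParser:
    """Hypothetical wrapper: serializes access to one expat parser."""

    def __init__(self, **handlers):
        self._parser = xml.parsers.expat.ParserCreate()
        for name, func in handlers.items():
            setattr(self._parser, name, func)  # e.g. StartElementHandler=...
        self._lock = threading.Lock()
        self._owner = None  # ident of the thread currently inside Parse

    def Parse(self, data, isfinal=False):
        me = threading.get_ident()
        # Only this thread ever stores its own ident, so reading _owner
        # without the lock cannot produce a false positive.
        if self._owner == me:
            # A handler running inside Parse tried to re-enter the parser.
            raise RuntimeError("reentrant call to parser")
        with self._lock:
            self._owner = me
            try:
                return self._parser.Parse(data, isfinal)
            finally:
                self._owner = None
```

The owner check matters because `threading.Lock` is not reentrant: without it, a handler calling `Parse` on the same parser would deadlock. The lock prevents concurrent entry into expat, but sharing one parser still only makes sense when threads feed consecutive chunks of the same document in the right order.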
I agree with Amaury: multi-threaded parsing should definitely not crash. Adding a lock should be quite easy. I wonder what the effect on performance would be when there are many back-and-forth calls between expat and Python.
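One rough way to gauge that cost from Python is to time a callback-heavy parse with and without an uncontended lock taken in every handler call. This is only a proxy: it measures Python-level `threading.Lock` overhead, not the C-level lock a pyexpat patch would use, and `DOC` and `parse` are illustrative names.

```python
import threading
import timeit
import xml.parsers.expat

# Synthetic document with many elements, so handler callbacks dominate.
DOC = "<root>" + "<item a='1'/>" * 20000 + "</root>"


def parse(doc, lock=None):
    """Count start tags; optionally take `lock` in every callback."""
    count = 0

    def start(name, attrs):
        nonlocal count
        if lock is None:
            count += 1
        else:
            with lock:  # uncontended acquire/release per Python callback
                count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(doc, True)
    return count


if __name__ == "__main__":
    lock = threading.Lock()
    plain = min(timeit.repeat(lambda: parse(DOC), number=5, repeat=3))
    locked = min(timeit.repeat(lambda: parse(DOC, lock), number=5, repeat=3))
    print(f"no lock: {plain:.3f}s  lock per callback: {locked:.3f}s")
```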
I've reproduced the segfault on 3.11.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['expert-XML', '3.10', '3.9', 'type-crash', '3.11']
title = 'Mutlithread XML parsing cause segfault'
updated_at =
user = 'https://bugs.python.org/mrDoctorWho0'
```
bugs.python.org fields:
```python
activity =
actor = 'iritkatriel'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['XML']
creation =
creator = 'mrDoctorWho0..'
dependencies = []
files = ['30250']
hgrepos = []
issue_num = 17970
keywords = []
message_count = 8.0
messages = ['189131', '189159', '189161', '189162', '189225', '189232', '189533', '401257']
nosy_count = 6.0
nosy_names = ['amaury.forgeotdarc', 'pitrou', 'christian.heimes', 'alex', 'mrDoctorWho0..', 'iritkatriel']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue17970'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']
```