Open pinkhamr-fb opened 1 year ago
@pitrou @gpshead Would you like to take a look at this issue?
While that is "surprising" behavior, the implementation of ShareableList does not appear to make strong guarantees. Both str and bytes are impacted. Trailing \x00 characters are valid in both.
Workaround: unconditionally append a single non-zero character or byte to any shared data when putting items in, and unconditionally ignore the final character (via truncation or a memoryview) on the consuming side.
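For illustration, a minimal sketch of that workaround follows. The helper names (add_sentinel, strip_sentinel, roundtrip_demo) and the choice of \x01 as the sentinel are hypothetical; any non-zero byte or character should serve the same purpose.

```python
from multiprocessing import shared_memory

# Hypothetical sentinel values: any non-zero byte/character would do.
SENTINEL_BYTE = b"\x01"
SENTINEL_CHAR = "\x01"


def add_sentinel(value):
    """Append one non-zero sentinel to str/bytes; pass other types through."""
    if isinstance(value, bytes):
        return value + SENTINEL_BYTE
    if isinstance(value, str):
        return value + SENTINEL_CHAR
    return value


def strip_sentinel(value):
    """Drop the final sentinel from str/bytes read back from the list."""
    if isinstance(value, (bytes, str)):
        return value[:-1]
    return value


def roundtrip_demo():
    """Store bytes with trailing zeros in a ShareableList and read them back."""
    original = b"\x03\x02\x01\x00\x00\x00"
    sl = shared_memory.ShareableList([add_sentinel(original)])
    try:
        # The sentinel is the last stored byte, so the trailing zeros
        # before it are no longer at the end of the stored value.
        return strip_sentinel(sl[0]) == original
    finally:
        sl.shm.close()
        sl.shm.unlink()
```

Calling roundtrip_demo() creates and then unlinks a small shared memory segment, so it needs a platform where multiprocessing.shared_memory is available.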
There are other constraints worth documenting as well. Those "int"s are struct-packed into a maximum of 8 bytes, without specifying whether they are signed. https://docs.python.org/3/library/multiprocessing.shared_memory.html#multiprocessing.shared_memory.ShareableList needs improvement.
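To make the integer constraint concrete, here is a small sketch that uses only the struct module. It assumes the ints are packed with the signed 8-byte "q" format; that is an assumption about the implementation, not something stated in this thread. The helper name fits_signed_8_bytes is hypothetical.

```python
import struct

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_signed_8_bytes(value: int) -> bool:
    """Return True if value can be packed with struct's signed 8-byte 'q' format.

    Assumption: this mirrors how ShareableList stores ints.
    """
    try:
        struct.pack("q", value)
    except (struct.error, OverflowError):
        return False
    return True
```

Under that assumption, any value outside the range INT64_MIN..INT64_MAX would be rejected at packing time rather than stored.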
FWIW, the workaround you proposed is what I ended up doing in my code to get around this.
I'm willing to work on a fix for this. Is implementing the workaround mentioned into ShareableList considered an acceptable solution, or are we looking for something more involved?
To me, it seems like the issue is that we're padding all str and bytes to an 8-byte alignment, but we're forgetting to save the actual data length. Adding a sentinel value to the end of the str or bytes (like the workaround does) seems like the most reasonable way to fix it without changing the underlying encoding to add the actual data length.
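The padding problem described above can be sketched with struct alone. This is a simplified model, not the CPython code: it assumes values are padded with zero bytes up to a multiple of 8 and that reading strips trailing zeros, and it contrasts that with a hypothetical layout that stores the length explicitly. All function names here are illustrative.

```python
import struct

ALIGNMENT = 8


def padded_size(n: int) -> int:
    """Round up to the next multiple of ALIGNMENT, always leaving some padding."""
    return ALIGNMENT * (n // ALIGNMENT + 1)


def store_padded(data: bytes) -> bytes:
    """Pack data into a zero-padded slot; the original length is not recorded."""
    return struct.pack(f"{padded_size(len(data))}s", data)


def load_by_stripping(slot: bytes) -> bytes:
    """Recover data by stripping zeros; cannot tell padding from real zeros."""
    return slot.rstrip(b"\x00")


def store_with_length(data: bytes) -> bytes:
    """Hypothetical alternative: prefix the slot with the true data length."""
    return struct.pack(f"<Q{padded_size(len(data))}s", len(data), data)


def load_with_length(slot: bytes) -> bytes:
    """Recover exactly the stored bytes using the length prefix."""
    (length,) = struct.unpack_from("<Q", slot)
    return slot[8:8 + length]
```

In this model, load_by_stripping loses any trailing zero bytes in the data, while the length-prefixed layout preserves them at the cost of changing the encoding.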
Bug report
tl;dr: See the Stack Overflow post
When copying a bytes object to a shareable list, the trailing zeros are stripped, causing data loss. This doesn't appear in the documentation as far as I can tell, and seems to be unexpected behavior related to the implementation.
Example code:
Output:
Offending portion of CPython code:
Linked PRs