Closed a7ef6b1c-b793-44a9-b579-512dc081db84 closed 4 years ago
As of https://github.com/python/cpython/pull/7076, it looks like at least the C implementation of pickle.load expects the file argument to implement readinto:
This is a change in behavior relative to previous versions of Python and I don't see it mentioned in PEP-574 or in the pull request so I'm not sure why it was changed.
This change breaks some PyTorch tests (see https://github.com/pytorch/pytorch/issues/32289) and, at least one PyTorch user, although I don't have full details there.
I can try to fix this on the PyTorch side but I first want to check that this was an intentional change on the Python side of things.
Right, I think at some point during the PEP-574 implementation review, we decided that the input object had to be a proper file implementation (for example, deriving from io.BufferedIOBase).
I think it should be easy for PyTorch to fix this internally (by deriving from the proper io base class), but we can also add a fix to pickle.
In this case the tests are explicitly testing that a file-like object that does not implement readinto works with torch.load (which is using pickles under the hood). See https://github.com/pytorch/pytorch/blob/master/test/test_serialization.py#L416-L429, in particular the usage of FilelikeMock with has_readinto set to False.
This goes back to https://github.com/pytorch/pytorch/pull/5466, however I'm not sure offhand why it was decided to explicitly test file-like objects that don't implement readinto. I don't know how many consumers there are for this API in pytorch, but there is at least one user who has reported an error traceback caused by this change. I'm hoping to hear more to see whether there are any libraries that depend on pytorch and are implicitly depending on this functionality.
Googling the error message that Python3.8 emits in this situation also leads me to this bug report in a different library: https://github.com/web2py/py4web/issues/77
So I *think* I've pieced together what caused the user crash that originated in the flair library. It turns out that pickle.load, via torch.load, is getting passed an mmap.mmap.
https://github.com/flairNLP/flair/blob/1d44abf35f1122d1e146c9a219b2cb5f98b41b40/flair/nn.py#L85-L86
Since mmap doesn't implement readinto, pickle.load objects as of Python 3.8. This is new behavior in Python3.8, it used to be possible to load a memory-mapped pickle file.
Short repro script:
import pickle
import mmap
data = "some data"
with open('my_data.pkl', 'wb') as f:
pickle.dump(data, f)
with open("my_data.pkl", "r+b") as f_in:
mm = mmap.mmap(f_in.fileno(), 0)
print(pickle.load(mm))
On Python3.8, this script prints an error, on Python3.7 it prints "some data".
Well, in the mmap case, this is trivially fixed by writing:
with open("my_data.pkl", "r+b") as f_in:
mm = mmap.mmap(f_in.fileno(), 0)
print(pickle.loads(memoryview(mm)))
It will also be faster this way, since the pickle will be read directly from memory instead of going through read() calls.
This makes me think that perhaps this issue isn't as bad: it forces people to abandon bad idioms :-)
New changeset 9f37872e307734666a7169f7be6e3370d3068282 by Antoine Pitrou in branch 'master': bpo-39681: Fix C pickle regression with minimal file-like objects (bpo-18592) https://github.com/python/cpython/commit/9f37872e307734666a7169f7be6e3370d3068282
New changeset b19f7ecfa3adc6ba1544225317b9473649815b38 by Miss Islington (bot) in branch '3.8': bpo-39681: Fix C pickle regression with minimal file-like objects (GH-18592) (bpo-18630) https://github.com/python/cpython/commit/b19f7ecfa3adc6ba1544225317b9473649815b38
The regression is fixed in git master and in the 3.8 branch. However, I would still advice against the idiom of handling a mmap object as a regular file when using pickle.
Thank you for the fix! Yes I'm planning to file an issue with flair about this and patch this use case in PyTorch itself.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/pitrou' closed_at =
created_at =
labels = ['3.8', 'type-bug', 'library', '3.9']
title = 'pickle.load expects an object that implements readinto'
updated_at =
user = 'https://bugs.python.org/NathanGoldbaum'
```
bugs.python.org fields:
```python
activity =
actor = 'Nathan.Goldbaum'
assignee = 'pitrou'
closed = True
closed_date =
closer = 'pitrou'
components = ['Library (Lib)']
creation =
creator = 'Nathan.Goldbaum'
dependencies = []
files = []
hgrepos = []
issue_num = 39681
keywords = ['patch']
message_count = 9.0
messages = ['362242', '362259', '362281', '362291', '362400', '362546', '362548', '362585', '362596']
nosy_count = 4.0
nosy_names = ['pitrou', 'lukasz.langa', 'Nathan.Goldbaum', 'miss-islington']
pr_nums = ['18592', '18630']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue39681'
versions = ['Python 3.8', 'Python 3.9']
```