python / cpython

The Python programming language
https://www.python.org/
Other
59.74k stars 28.95k forks source link

Security review of pickle/marshal docs #35339

Closed tim-one closed 20 years ago

tim-one commented 22 years ago
BPO 471893
Nosy @tim-one, @warsaw, @akuchling
Files
  • pickle-docs.patch: Patch to pickle and marshal docs.
  • pickle-docs.patch: Updated version of patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/akuchling' closed_at = created_at = labels = ['extension-modules'] title = 'Security review of pickle/marshal docs' updated_at = user = 'https://github.com/tim-one' ``` bugs.python.org fields: ```python activity = actor = 'akuchling' assignee = 'akuchling' closed = True closed_date = None closer = None components = ['Extension Modules'] creation = creator = 'tim.peters' dependencies = [] files = ['155', '156'] hgrepos = [] issue_num = 471893 keywords = [] message_count = 31.0 messages = ['6959', '6960', '6961', '6962', '6963', '6964', '6965', '6966', '6967', '6968', '6969', '6970', '6971', '6972', '6973', '6974', '6975', '6976', '6977', '6978', '6979', '6980', '6981', '6982', '6983', '6984', '6985', '6986', '6987', '6988', '6989'] nosy_count = 7.0 nosy_names = ['tim.peters', 'nobody', 'jhylton', 'barry', 'akuchling', 'dcjim', 'phr'] pr_nums = [] priority = 'high' resolution = 'accepted' stage = None status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue471893' versions = [] ```

    tim-one commented 22 years ago

    Paul Rubin points out that the security implications of using marshal and/or pickle aren't clear from the docs. Assigning to Jeremy as he's more sensitive to such issues than I am; maybe Barry would like to get paranoid too \<wink>.

    A specific example: the pickle docs say that pickle doesn't support code objects, and "at least this avoids the possibility of smuggling Trojan horses into a program". However,

    1) The marshal docs don't mention this vulnerability at all.

    while

    2) The pickle docs don't spell out possible dangers due to things pickle does that marshal doesn't (like importing modules, and running class constructors).

    91e69f45-91d9-4b12-87db-a02908296c81 commented 22 years ago

    Logged In: YES user_id=72053

    Certainly anyone unserializing potentially malicious data with pickle, marshal, or anything else, should check the results before doing anything dangerous with them (like executing code). However, unpickle can potentially do damage before it even returns, by creating loading modules and creating initialized class instances. So pickle.loads should never be used on untrusted strings, except possibly in a rexec wrapper (as proposed by Tim). Another possibility (also by Tim) is to override the load_inst method of the Pickler class, though I don't think you can do that for cPickle.

    A sample exploit for unpickle can be found at \http://www.nightsong.com/phr/python/pickletest.py\. Unpickling the test string runs penguin.__init contrary to the doc's saying no initialization unless there's a __getinitargs method in the class def.

    The "exploding penguin" class is artificial, but applications are vulnerable if there's an unsafe constructor anywhere in any class of the application or in the python library (example: the NNTP constructor opens an IP connection to an arbitrary address, so a malicious imported string can send a message through your firewall when you import it).

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    Irmen de Jong points out that the standard cookie module uses pickling for serial and smart cookies. The 2.1.1 cookie module docs explicitly say not to use those classes because of the security hole--that they're provided for backward compatibility only (but with what?! Any server that uses those classes on open ports needs to be changed right away).

    Irmen's library, http://pyro.sourceforge.net, also uses unpickle insecurely (he's aware of this now and is figuring out a fix).

    IMO this is really a code bug rather than a documentation bug, and should be fixed in the code rather than the docs. Documenting the bug rather than fixing it leaves a deficiency in the Python library: obvious uses of pickling, like Pyro and the cookie module, can't be implemented using cPickle and have to resort to a slower Python deserializer, or use marshal and have possible compatibility problems between versions (and not be able to serialize class instances).

    Paul

    03bde425-37ce-4291-88bd-d6cecc46a30e commented 22 years ago

    Logged In: YES user_id=31392

    What's the code bug? Your last message has a lot of gloom and doom \<wink>, but I'm not sure what specific problem you'd like to see fixed. Are you saying that something needs to be fixed in cPickle and not in pickle?

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    IMO it's a code bug that you can't unpickle strings from untrusted sources. Pyro and the cookie module are examples of programs that got bitten by this bug. Whether it's really a bug is a matter of opinion--I had a big email exchange with Guido and Tim about it, and they felt it was enough to fix the pickle documentation.

    Pickle has the same problem as cPickle, but with pickle you can subclass the pickler and override the method that unpickles class objects, and work around the (IMO) bug. The workaround doesn't help cPickle since cPickle can't be subclassed. See bug bpo-467384 for some related discussion.

    Paul

    03bde425-37ce-4291-88bd-d6cecc46a30e commented 22 years ago

    Logged In: YES user_id=31392

    I don't think of the issue you describe as a bug in the code. You're suggesting a new feature for pickle. As far as I can tell, the original design requirements for pickle did not include the ability to securely load a pickle from an untrusted source.

    It may be a legitimate feature request, but it's too late to make it into Python 2.2. I suggest we look at the design issues for Python 2.3 and decide if it's a feature we want to support. I imagine a PEP may be necessary to lay out the issues and the solution. Do you want to have a hand in that PEP?

    I still don't understand what it means that Pyro and cookie were bit by a bug. It sounds like they were using pickle in ways that pickle was not intended to support. A careful analysis of how those two applications use pickle would be helpful for generating requirements.

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    Well, Guido and Tim agree with you that it's not a pickle bug. I still feel it is one, because its docs currently make you think you can securely load untrusted pickles, and because it's a natural, non-obscure thing to want to do (see pyro and the cookie module), but whatever. If it's not a code bug then I feel it's a significant functionality shortcoming in the python library.

    Pyro uses pickle to serialize data for RPC calls over the internet. A malicious client could make a hostile pickle take over the server. The cookie module lets web applications store user session state in browser cookies. Its SerialCookie and SmartCookie classes let you put arbitrary Python objects into the user session, and serializes them when pickle. Again, a malicious client can make a hostile pickle, send it in a cookie header to the http server, and take over the server when the application unpickles the cookie.

    The current documentation for the pickle module makes it very clear to me that the doc writer thought it was safe to unpickle untrusted cookies. If pickle wasn't designed for that, then there was a communication failure between the designer and the doc writer.

    Yes, I'm willing to help with a PEP for fixing this situation.

    Paul

    warsaw commented 22 years ago

    Logged In: YES user_id=12800

    I'm going to agree with Paul that this is a problem needing fixing, however there are really several issues.

    1. Cookie module makes it too easy to code exploits. Cookie exports a class, also called Cookie, which is aliased to SmartCookie, so that a naive program will simply pass cookie data to Cookie.Cookie() and you're screwed. So, Cookie module's defaults should be for more security rather than less, and Cookie.Cookie should be aliased to SimpleCookie instead.

    2. There is no built-in safe mechanism for de-serializing data from untrusted sources. You can't use pickle without overloading a "magic" method. You can't use cPickle because you can't do the overloading trick. You can't use marshal because it isn't bulletproof against recursive datastructures. So how /do/ you do it?

    I think it just may be serious enough to deal with in Python 2.2, and I volunteer to address it (so I'll steal this bug report). Without looking at the code, or the level of effort necessary, I would make the following suggestions:

    1. Add to the public interface of pickle and cPickle, a flag that either disables the unpickling of instances altogether, or disables calling any code with unpickled data, e.g. constructors.

    2. Fix marshal to be bulletproof against recursive datastructures.

    3. Update the docs for both pickle/cPickle and marshal to explain how to safely write de-serializers of untrusted strings.

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    See bug bpo-467384 for discussion about marshal. Besides the recursion issue, marshal's format is explicitly undocumented and subject to change--you can't rely on it to interoperate between two different Python versions, so it's no good as an RPC serializer. The format has kludges (e.g. the representation of long ints) that make it undesirable to freeze and document it and force future versions to be backward compatible.

    Adding a pickle.loads flag to prevent instance unpickling isn't perfect but is probably the best alternative on your list. Perhaps the flag can have a value that allows unpickling the instances by restoring the instance attributes rather than calling the initializer. That's not always the right way to unpickle an instance (that's why the unpickler no longer works that way) but it's good enough a lot of the time.

    There's another issue with pickle/cPickle which is that they unpickle quoted strings by evaling them. This is scary. While I don't see an immediate exploit, I also haven't examined the 1000's of lines of code I'd need to examine to convince myself that there's NOT an exploit. I think the unpickler should be changed to never call eval but just parse the string as it needs to.

    Guido seemed to think pickle might have other possible exploits. I don't know what he had in mind but before declaring it safe for untrusted data I think it needs to be gone over with a fine toothed comb.

    Paul

    03bde425-37ce-4291-88bd-d6cecc46a30e commented 22 years ago

    Logged In: YES user_id=31392

    I don't think we should be doing anything about marshal.
    Maybe we should name in pyclib or something \<0.9 wink>. It works fine for .pyc files, but I don't see a reason for it to do anymore than is necessary for that purpose.

    I think the notion of an unpickler that only handles builtin datatypes is the most attractive option you offer, but Paul has a good point about eval for strings. (It currently has some hacks that address specific exploits, but I doubt they are sufficient.) I also wonder how hard it is to handle builtin types and avoid subclasses of builtin types.

    If there are any changes to pickle, I think we need to be careful about how it is described. If we claim that an unpickler is safe for untrusted pickles, we've made a fairly strong claim. I still think such a design change requires a PEP that includes some requirements and use cases and a thorough analysis of potential exploits.

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    I like marshal and think it's much cleaner than pickle. I'd like to see its format cleaned up and documented so it can be a portable transport format. Even the recursion issue isn't so compelling--there's only a danger on marshalling, not unmarshalling, and the application has control over what it marshals. So I think the idea of documenting marshal (bug bpo-467384) should be kept alive. However, it shouldn't be changed in 2.2.

    If the Cookie class is still really aliased by default to SmartCookie, that's a bad bug and should definitely be fixed ASAP. The Cookie docs clearly say not to use SmartCookie or SerialCookie. In fact I think SmartCookie and SerialCookie should be disabled altogether. Any applications using them on the open internet have a security but and should be fixed immediately.

    Here's a possible temporary approach to fixing SmartCookie and SerialCookie: append a cryptographic message authentication code (MAC) to each pickled cookie. That means the application has some secret string K as a configuration parameter (it should be 10 or more random 8-bit characters, or 20 or more random hex digits, etc). Then a smart cookie would be a pickle p, followed by SHA(p+K). The Cookie module would validate the MAC before calling unpickle, and raise an exception if the MAC wasn't valid.

    The security of this scheme rests on K being kept secret from attackers, which is generally not an easy thing to manage, but it's something to think about.

    Paul

    7aa9d209-ca02-4adc-8376-068152b3db61 commented 22 years ago

    Logged In: YES user_id=73023

    It sounds like there are some documentation bugs:

    tim-one commented 22 years ago

    Logged In: YES user_id=31435

    Why are people (Paul, Jeremy) concerned about eval'ing strings? cPickle and pickle both check that they're properly quoted, and this isn't sh or Perl: Python has no "dynamic" gimmicks buried in string literals. All eval'ing a string literal can do is produce a binary blob via interpeting simple escape sequences. They're like C strings this way -- maybe we'll run out of memory, but that's it.

    I would agree that Python should be refactored internally to supply a clean function for changing string literals into binary blobs, but that would be for clarity and efficiency, not security.

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    It's possible that the eval is safe, but validating that takes a line by line code review of all the paths that evaling a string can go through. It's also brittle in that maybe someone will change the evaluator (say to support Perl-like interpolation) in a future Python version and not remember to change the unpickler. Something like that has already happened with the Cookie module. My guess is that it happened with the pickle module--the whole point of that quoted string check is that the original pickle implementer didn't trust the input. The stuff about unpickling class instances was added later (maybe by another person) without remembering the security issue.

    Using eval this way is like storing a vat of cyanide in your child's bedroom. Sure, maybe if you check the seals and locks on the vat carefully enough, you can convince yourself that your child won't be able to get to the cyanide. But wouldn't you feel safer just not having the vat there at all? That's basic safety-consciousness. Security consciousness works the same way. Try to keep dangerous ingredients and attackers as far away from each other as possible.

    Paul

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    The find_global variable sounds encouraging and if it fixes this problem, that's great. I'd still feel better if the eval call goes away. Is removing it allowed under the 2.2 feature freeze? It doesn't affect anything visible, so no tests would have to change, etc.

    Paul

    warsaw commented 22 years ago

    Logged In: YES user_id=12800

    Assigning to Jeremy so he'll remember to forward me Jim's email. Jeremy can ping-pong it to Fred if he wants, and then reassign to me after forwarding the email.

    warsaw commented 22 years ago

    Logged In: YES user_id=12800

    I have rewritten the pickle documentation, and updated the marshal, cPickle, and copy_reg documentation. I believe all the documentation issues raised here have been fixed.

    Implementation issues will be pushed to Python 2.3, and the plan is to write a PEP to plan future work.

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    Barry, can you also do something about the Cookie module, or at least about its documentation? If Cookie is aliased to SmartCookie, I think that needs mention. --Paul

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    Are the new docs downloadable from somewhere? thanx --Paul

    3772858d-27d8-44b0-a664-d68674859f36 commented 22 years ago

    Logged In: NO

    Barry - the new docs just went up and they're a big improvement over the old. However the sentence "The issue here is usually not that a class's constructor will get called -- it won't by the unpickler -- but that the class's destructor (i.e. its __del__() method) might get called when the object is garbage collected." isn't correct.

    There's a flag in the pickle stream that tells the unpickler that the pickler found a __getinitargs method at pickling time. If the flag is set in the pickle, then the unpickler calls the class constructor, whether there's a __getinitargs method in the class or not.

    The sample exploit that I posted earlier on, \http://www.nightsong.com/phr/python/pickletest.py\, is an example of an artificially concocted pickle that makes a class constructor get called.

    warsaw commented 22 years ago

    Logged In: YES user_id=12800

    You're right of course. I meant to fix that and forgot. Will do so asap. Glad you like the rest of it, though! :)

    akuchling commented 21 years ago

    Logged In: YES user_id=11375

    Re-opening. With the changes to pickling in Python 2.3, the security material should be replaced with a warning to not unpickle untrusted data. Patch attached.

    warsaw commented 21 years ago

    Logged In: YES user_id=12800

    Karmically (no, not comically) reassigning to Tim who started this whole thing, and who happens to be dumping his pickles these days.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Andrew, didn't you go overboard in deleting text here? The __safe_for_unpickling__ check is gone in 2.3, but most of the rest of the text still applies. In particular, Cookie is still a problem for people inclined to worry about this, and overriding methods like find_global and load_global is still valuable stuff (the unpickler still never imports anything that doesn't come from an opcode triggering one of these methods).

    warsaw commented 21 years ago

    Logged In: YES user_id=12800

    I'll just mention that anybody using anything other than Cookie.SimpleCookie is insane, and even using the Cookie module at all to parse real-world cookies is mildly nuts because it's way too strict. I prefer m&m's in my cookies.

    akuchling commented 21 years ago

    Logged In: YES user_id=11375

    The Cookie classes that use pickle have DeprecationWarnings in 2.3, and should disappear in 2.4. If {find,load}_global are still going to be supported, though, then they should still be documented (though, given that you shouldn't be unpickling untrusted data, why would you ever bother overriding find_global?)

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    I think there are several reasons to override these methods.
    The one most relevant to this bug report is that, while Python has stopped pretending that pickles are secure by default, the choke points are still there, and motivated users can still expolit them.

    For example, search pickle.py for __import. The only occurrence of __import in the Unpickler class is in method find_class(), and that's by design. If a user overrides find_class(), the only imports the Unpickler *can* do are those the user explicitly performs in their own find_class() implementation. So if that's a notion of "security" a user is happy with, they can still have it. The docs trying to describe this are still valid. It's only the "by magic" safety checks that have gone away (and they were buggy anyway, so no loss).

    akuchling commented 21 years ago

    Logged In: YES user_id=11375

    OK; here's a reworked version that retitles the "Pickle security" section to "Subclassing Unpicklers", leaving the discussion of subclassing alone except for a few typo fixes.

    akuchling commented 20 years ago

    Logged In: YES user_id=11375

    Can I check this in?

    tim-one commented 20 years ago

    Logged In: YES user_id=31435

    Yes, please do! And thank you.

    akuchling commented 20 years ago

    Logged In: YES user_id=11375

    Checked in.