mitogen-hq / mitogen

Distributed self-replicating programs in Python
https://mitogen.networkgenomics.com/
BSD 3-Clause "New" or "Revised" License
2.34k stars 199 forks source link

core: Investigate pickle and alternatives #126

Open moreati opened 6 years ago

moreati commented 6 years ago

Mitogen currently uses cPickle (with some filtering) to serialize data passed between parents and children. Pickle is widely regarded as risky - deserializing an arbitrary pickle from an untrusted source can trivially result in code execution. Mitogen has some protection is in-place already

The pickler will instantiate only built-in types and one of 3 constructor functions, to support unpickling CallError, _DEAD, and Context. — Mitogen: How it Works

This is a medium to long term undertaking. To assess the feasibility of replacing Pickle I plan to

@dw I can document my progress in this issue, as changes to the docs (in a branch/PR), or perhaps as a Wiki page. Do you have a preference?

"Talk without pickle, and it won't exec the worm" — Paul Atreides, Weapon of Choice

dw commented 6 years ago

This is fantastic work! Docs or issue would be my preference :)

FWIW the bit that really swings me on cPickle needing to die is a python-dev mailing list thread from ~2002 era. I can't remember how I found that thread. I started by git blaming https://docs.python.org/2/library/pickle.html to find whoever wrote that warning at the top. If you follow the whole way back through the renames (and conversion from TeX to rst!!) you'll find a date and author. If you then visit the python-dev list archives around those months, there is a guru-level thread basically warning against even what we're doing today.

dw commented 6 years ago

I'm going to need your snail mail address soon to start delivering kegs of beer, if you keep up the Dune puns I'm going to be bankrupted haha

dw commented 6 years ago

Btw I saw Chopsticks has a super compact codec, I really liked the look of it, except that it invokes isinstance() constantly rather than only on a slow path.. ideally that dispatch could be constant time, with something like:

try:
    return converters[type(obj)](obj)
except KeyError:
    # slow path

But perhaps he discovered that repeat isinstance() is faster than generating a function call for every object in the graph

moreati commented 6 years ago

ideally that dispatch could be constant time, with something like try: ... except KeyError:

funny you should mention that ...

dw commented 6 years ago

Ah yes, one final thing to keep in mind: it'd be /so/ nice if any new solution gave us more latitude to make the library more user-friendly. I really wish we could just pickle the hell out of every user type and watch the world burn, but security concerns of course prevents that. Meanwhile the solution today is so annoyingly restrictive. I wish there was some middle ground that preserved safety and convenience. Zero clue what it would look like, though, but I'm certain the serializer sits front and center of any such improvement

moreati commented 6 years ago

All accounts suggest it cannot be used securely, however few of those accounts appear to be expert

I'm starting to get a sense of what you meant by this. Although

  1. Instead of "appear to be expert" I would say something like "cite current attacks"
  2. "I cannot hack this" != "this is safe to use"
moreati commented 6 years ago

I squeezed some bytes and CPU cycles out of Chopsticks pencode(). It's now closer to constant time dispatch https://github.com/lordmauve/chopsticks/pull/61

dw commented 6 years ago

Champion! How was the before/after benchmark when you introduced all those function calls? I may check it out and try running it later, I'm a sucker for little microbenchmarks like this

On 17 March 2018 at 03:36, Alex Willmer notifications@github.com wrote:

I squeezed some bytes and CPU cycles out of Chopsticks pencode(). It's now closer to constant time dispatch lordmauve/chopsticks#61 https://github.com/lordmauve/chopsticks/pull/61

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dw/mitogen/issues/126#issuecomment-373854818, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAJCzqlBdSVr6JbrExLmQkNEISu00IZks5tfDPmgaJpZM4SlX_K .

moreati commented 6 years ago

Each commit after benchmark_pencode.py has the time in the commit message, just shy of 50% reduction overall.

On 16 March 2018 at 21:55, dw notifications@github.com wrote:

Champion! How was the before/after benchmark when you introduced all those function calls? I may check it out and try running it later, I'm a sucker for little microbenchmarks like this

On 17 March 2018 at 03:36, Alex Willmer notifications@github.com wrote:

I squeezed some bytes and CPU cycles out of Chopsticks pencode(). It's now closer to constant time dispatch lordmauve/chopsticks#61 https://github.com/lordmauve/chopsticks/pull/61

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dw/mitogen/issues/126#issuecomment-373854818, or mute the thread https://github.com/notifications/unsubscribe-auth/ AAAJCzqlBdSVr6JbrExLmQkNEISu00IZks5tfDPmgaJpZM4SlX_K .

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/dw/mitogen/issues/126#issuecomment-373855552, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKpclHbJBZKYo3LiZ9DtVAYpc38mo8Yks5tfDTRgaJpZM4SlX_K .

-- Alex Willmer alex@moreati.org.uk

dw commented 6 years ago

funny you should mention that ...

Noting the date of the first commit, now this makes sense :P~

moreati commented 6 years ago

If you follow the whole way back through the renames (and conversion from TeX to rst!!) you'll find a date and author. If you then visit the python-dev list archives around those months, there is a guru-level thread basically warning against even what we're doing today.

2 commits appear to fit this bill

I've not found the thread yet.

PS: better pickle punage

dw commented 6 years ago

I might have just been thinking about the associated bug -- https://bugs.python.org/issue471893 , it's pretty guru level already. But I could've sworn I saw mailman archive headers flicking past. Still digging, will comment here if I find it.

On 17 March 2018 at 04:15, Alex Willmer notifications@github.com wrote:

If you follow the whole way back through the renames (and conversion from TeX to rst!!) you'll find a date and author. If you then visit the python-dev list archives around those months, there is a guru-level thread basically warning against even what we're doing today.

2 commits appear to fit this bill

I've not found the thread yet.

PS: better pickle punage https://mail.python.org/pipermail/python-dev/2003-February/033320.html

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dw/mitogen/issues/126#issuecomment-373862269, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAJC5Q0JMxMtljHlUcv7eaDXrxksf_2ks5tfD0EgaJpZM4SlX_K .

dw commented 6 years ago

https://www.python.org/dev/peps/pep-0307/#security-issues

On 17 March 2018 at 04:25, David Wilson dw@botanicus.net wrote:

I might have just been thinking about the associated bug -- https://bugs.python.org/issue471893 , it's pretty guru level already. But I could've sworn I saw mailman archive headers flicking past. Still digging, will comment here if I find it.

On 17 March 2018 at 04:15, Alex Willmer notifications@github.com wrote:

If you follow the whole way back through the renames (and conversion from TeX to rst!!) you'll find a date and author. If you then visit the python-dev list archives around those months, there is a guru-level thread basically warning against even what we're doing today.

2 commits appear to fit this bill

I've not found the thread yet.

PS: better pickle punage https://mail.python.org/pipermail/python-dev/2003-February/033320.html

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dw/mitogen/issues/126#issuecomment-373862269, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAJC5Q0JMxMtljHlUcv7eaDXrxksf_2ks5tfD0EgaJpZM4SlX_K .

dw commented 6 years ago

FWIW it's not the presence of GvR on that PEP that gives me the heeby-jeebies, it's Tim Peters. You find him 2 years earlier on that bug almost implying that find_global (like we use today) is good enough. And that guy is the definition of smart, if something changed his mind, I'm not going to question why

On 17 March 2018 at 04:31, David Wilson dw@botanicus.net wrote:

https://www.python.org/dev/peps/pep-0307/#security-issues

On 17 March 2018 at 04:25, David Wilson dw@botanicus.net wrote:

I might have just been thinking about the associated bug -- https://bugs.python.org/issue471893 , it's pretty guru level already. But I could've sworn I saw mailman archive headers flicking past. Still digging, will comment here if I find it.

On 17 March 2018 at 04:15, Alex Willmer notifications@github.com wrote:

If you follow the whole way back through the renames (and conversion from TeX to rst!!) you'll find a date and author. If you then visit the python-dev list archives around those months, there is a guru-level thread basically warning against even what we're doing today.

2 commits appear to fit this bill

I've not found the thread yet.

PS: better pickle punage https://mail.python.org/pipermail/python-dev/2003-February/033320.html

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dw/mitogen/issues/126#issuecomment-373862269, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAJC5Q0JMxMtljHlUcv7eaDXrxksf_2ks5tfD0EgaJpZM4SlX_K .

dw commented 6 years ago

I also investigating auditing cPickle (and only cPickle) directly. It's 5000 lines of C that interacts heavily with the runtime (e.g. copyreg). It could be done, but it's serious work

On 17 March 2018 at 04:33, David Wilson dw@botanicus.net wrote:

FWIW it's not the presence of GvR on that PEP that gives me the heeby-jeebies, it's Tim Peters. You find him 2 years earlier on that bug almost implying that find_global (like we use today) is good enough. And that guy is the definition of smart, if something changed his mind, I'm not going to question why

On 17 March 2018 at 04:31, David Wilson dw@botanicus.net wrote:

https://www.python.org/dev/peps/pep-0307/#security-issues

On 17 March 2018 at 04:25, David Wilson dw@botanicus.net wrote:

I might have just been thinking about the associated bug -- https://bugs.python.org/issue471893 , it's pretty guru level already. But I could've sworn I saw mailman archive headers flicking past. Still digging, will comment here if I find it.

On 17 March 2018 at 04:15, Alex Willmer notifications@github.com wrote:

If you follow the whole way back through the renames (and conversion from TeX to rst!!) you'll find a date and author. If you then visit the python-dev list archives around those months, there is a guru-level thread basically warning against even what we're doing today.

2 commits appear to fit this bill

I've not found the thread yet.

PS: better pickle punage https://mail.python.org/pipermail/python-dev/2003-February/033320.html

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dw/mitogen/issues/126#issuecomment-373862269, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAJC5Q0JMxMtljHlUcv7eaDXrxksf_2ks5tfD0EgaJpZM4SlX_K .

moreati commented 6 years ago

I might have just been thinking about the associated bug -- https://bugs.python.org/issue471893 , it's pretty guru level already

One of the concerns in issue 471893 is the use of eval() to parse strings. This was fixed by Changeset ab63af, in Python 2.3 onward neither pickle or cPickle invoke eval() directly.