trailofbits / fickling

A Python pickling decompiler and static analyzer
GNU Lesser General Public License v3.0
390 stars 44 forks source link

Fickling DoS #111

Open coldwaterq opened 1 month ago

coldwaterq commented 1 month ago

https://github.com/coldwaterq/pickle_injector/blob/main/globalLaughs.pt will take a considerable amount of time to process by Fickling. It could also utilize more memory than is likely intended. Pickle parses it fairly quickly though so if Fickling is used in certain pipelines this may be possible to add to a malicious pickle to cause the Fickling scanner to time out and the pickle to still be executed. That would still be a flaw in the pipeline, however I think these pickles should be flagged as maliciuos.

A similar pickle https://github.com/coldwaterq/pickle_injector/blob/main/billionLaughsAlt.pkl has been submitted to Pytorch as a DoS against them, even when using the weights_only flag, so detecting these expansion attacks would be cool IMO. And if you check for the expansion before trying to parse, you could prevent this attack against Fickling as well.

The PyTorch security advisory that has been sitting as a draft since April is https://github.com/pytorch/pytorch/security/advisories/GHSA-g53j-hmj6-fqvw, but the solution is just to tell users of Pytorch that when loading untrusted pytorch saves, even with weights_only, a pickle is still parsed and so pure pickle DoS attacks like this are still possible.

Boyan-MILANOV commented 1 month ago

Hey @coldwaterq, thanks for reporting this!

These are nice Pickle tricks that you're sharing. From what I see the files use different methods for DoSing. We're currently building more detection capabilities into Fickling so this comes at the perfect time for us to put on our radar :) We'll start by finding out why Fickling times out on these files and see if that is fixable. In case it's not, then we can work on a method to detect expansions early and avoid timeout / flag the file accordingly.

coldwaterq commented 1 month ago

Yep, they both do create large lists by duplicating a string. One uses dup, the other puts the string in the memo, and then repeatedly gets it. I did it that way because pytorch's weights_only disables dup.

For Fickling I added in the OrderedDict resolution because Fickling didn't seem to resolve the list unless it was being used in a reduce call.