ruby / psych

A libyaml wrapper for Ruby
MIT License
565 stars 204 forks source link

Add safe_load #119

Closed postmodern closed 11 years ago

postmodern commented 11 years ago

In lieu of the recent Rails YAML RCE vulnerability, Psych should provide a safe_load equivalent method, that only loads Ruby primitives.

tenderlove commented 11 years ago

Can you please be more specific about the desired behavior? Is it acceptable to load Symbols? Are subclasses of Ruby primitives considered "primitive"? If someone sets an instance variable on a string, then dumps / loads that string, should the instance variable survive?

Can you please demonstrate how to use YAML dump / load to execute arbitrary code? AFAIK it depends on objects other than YAML.

postmodern commented 11 years ago

Can you please demonstrate how to use YAML dump / load to execute arbitrary code?

rails_rce.rb

AFAIK it depends on objects other than YAML.

Which is why I believe there should be a safe_load method or a configuration mechanism that restricts YAML to only loading primitives or explicitly allowed Classes.

tenderlove commented 11 years ago

@postmodern does the rails_rce.rb script depend on code other than YAML? I'm confused.

tenderlove commented 11 years ago
  • Symbols, sure they are primitives. In order to prevent a DoS via Symbols, perhaps there should be a configuration > mechanism to specify exactly what types/tags should be allowed.
  • Subclasses of primitives are no longer primitives. In safe mode the custom class could be ignored and the primitive > class used instead.
  • Instance variables in primitives should not survive.

Given these points, why not just use JSON?

postmodern commented 11 years ago

JSON does not support Symbols, does not support non-String-keyed Hashes and does not have special formatting such as:

summary: |
  bla bla bla bla bla bla
  bla bla bla bla

    indented stuff

  bla bla bla bla

Is there any reason YAML should not have a secure mode?

postmodern commented 11 years ago

@tenderlove rails_rce.rb uses !ruby/hash:MyClass and ActionController::Routing::RouteSet::NamedRouteCollection.

@charliesome's PoC used ActiveSupport::Deprecation::DeprecatedInstanceVariableProxy, ERB, Gem::Requirement and Rack::Session::Cookie::Base64::Marshal.

These exploits would not work if the YAML parser was configured to only allow primitives (aka safe_load).

haileys commented 11 years ago

+1. Only primitives that YAML can deserialize without using ! directives should be allowed.

No symbols IMO. I can't see a use case where you would want to allow symbols but still restrict other deserialization.

envygeeks commented 11 years ago

:+1: on what @postmodern and @charliesome are saying. I really wish it supported a way for it to skip over everything and raise what isn't in the specs by default, and certainly like what @charliesome said about skipping over symbols too... the symbols part is kind of important to me because it would make it easier to keep a consistent configuration file to API without having to run to_s on all the keys myself before I create an indifferent Hash.

postmodern commented 11 years ago

Instead of having a big case/when statement of all the tags in to_ruby.rb, Psych could have a table of parser methods/procs. This way Psych could be configured to run in "safe mode". When an unknown tag is encountered, it could skip over it (and it's children) or raise an exception.

envygeeks commented 11 years ago

There is no other option other than raise an exception so it :bomb: and somebody knows they did bad.

tenderlove commented 11 years ago

@postmodern so YAML didn't execute the code, but Rails did?

Is there any reason YAML should not have a secure mode?

I don't particularly care whether it does or not. We just have to be specific about what it means. Also, we need someone to write the patch. :-)

postmodern commented 11 years ago

@tenderlove YAML just happened to call #[]= on an instance of the Class we specified. ActionController::Routing::RouteSet::NamedRouteCollection just so happens to pass the key value from #[]= down to module_eval. We could have used other non-Rails Classes that somehow pass our data into instance_eval or send. CVE-2013-0156 is not really specific to Rails, other applications that accept arbitrary YAML are also vulnerable; such as chef-server which uses extlib.

Please see the Rails PoC write up for a full walk through.

tenderlove commented 11 years ago

Ah, so it's not specific to YAML, but anything that will feed strings to eval. Makes sense.

Aaron Patterson http://tenderlovemaking.com/ I'm on an iPhone so I apologize for top posting.

On Jan 13, 2013, at 7:26 PM, Postmodern notifications@github.com wrote:

@tenderlove YAML just happened to call #[]= on an instance of the Class we specified. ActionController::Routing::RouteSet::NamedRouteCollection just so happens to pass the key value from #[]= down to module_eval. We could have used other non-Rails Classes that somehow pass our data into instance_eval or send. CVE-2013-0156 is not really specific to Rails, other applications that accept arbitrary YAML are also vulnerable; such as chef-server which uses extlib.

Please refer to the Rails PoC write up for a full walk through.

\ Reply to this email directly or view it on GitHub.

benmmurphy commented 11 years ago

Marshal.load has this problem as well and so would any serialization mechanism that does the following with arbitrary Foo classes and arbitrary field values.

obj = Foo.allocate
obj.instance_variable_set(field, value)
envygeeks commented 11 years ago

@benmmurphy it's in Marshal's nature to be like this by simple definition of what it is, YAML however, it's not.

trans commented 11 years ago

I think safe load can have a very simple definition: It simply loads the YAML as if no ! tags are present. The only exceptions would be the YAML standard types which are not the implicit types (e.g. omap).

Psych can add a yaml_tag method so a program can see what the tag is. This would allow a program to look at the tag and decide what to do with it if need be, or it can just ignore the tag if there is no need. Having the tag available is important though. In fact, a fundamental design principle of YAML is the ability to "round-trip" --a YAML document that is loaded and then re-emitted should be substantively the same. So we don't want to loose that information even if it is being ignored in a safe load.

envygeeks commented 11 years ago

:+1: on @trans tag callback idea.

postmodern commented 11 years ago

:+1: @trans. The definition is easy to grok.

NathanZook commented 11 years ago

I have been toying with my own ideas for a framework to do rpc/serialize objects for a while. I really don't know if this is the correct approach. Attribute assignment safety is not, and, in a language such as ruby, cannot ever be global. Indeed, some of the high-level classes (drb, tk, and the like) are the sorts of things that would make me nervous unless I examined them. (I count 82 instances of []= in /usr/local/lib/ruby/2.0.0.) On the other hand, there are a huge number of user classes for which []= is perfectly safe.

My approach is to whitelist. You can examine the existing ruby base and stdlib classes and determine if there is a problem with any of them. There are several ways to proceed from there, but most likely, the data should be held in a data structure inside Psyche. All classes not registered as safe then are unsafe. A facility would be provided to register a class as safe. (HashWithIndifferentAccess comes to mind) Presumably, Psyche might probe a previously unknown class to see if it lists itself as safe.

The next issue is attribute assignment generally. Again, if a whitelist approach is taken, (and exact matches required), it become fairly simple to proceed. Indeed, both of these facilities might well specify that some inputs are safe but not others.

With this approach, you have the full ability to do attribute assignment where it is known to be safe. And if a maintainer does not want to bother, then their classes are simply not safe.

A global switch that says, "while processing the following string, consider everything as safe" would bring back the existing behaviour--which is exactly what would be needed in some cases.

envygeeks commented 11 years ago

@student Protecting setters is your problem, not Pysch's problem. Psych is not your object-guardian. The goal is to block unserializing attributes. Example: If people don't want symbols then they come out as ":value" rather than :value, as far as []= and "protecting it"... that is your job to protect. The rest of your argument, I'm not even gonna touch, I'll leave that to other folk to debate.

NathanZook commented 11 years ago

The question to me seems to be how much pain a user must endure in order to make use of a tool. If YAML is to compete with JSON, then YAML cannot require the user to audit their entire object library for []= weirdness. What's more, I doubt that I'm the only one that did not know that YAML was calling []= until last week--people won't even know about the threat. You can sit back at say nasty things about people who do weird stuff with []= or you can make a library that doesn't regularly get implicated in security problems.

Again, what I'm saying is that safety requires that unsafe actions be screened out by default. To me, that suggests an audit of Ruby's base classes and a facility for classes to register themselves as safe to whatever degree.

haileys commented 11 years ago

@student: Why does this have to be so complicated? @trans's approach is the way to go - deserializing the YAML as if the ! directives didn't exist ensures that only 'primitive' Ruby types are created.

JakubOboza commented 11 years ago

Do we really needs this ? I have a strange feeling that making safe_load will be really hard and in the end nobody will use it because it will be so limited. Lets just not use YAML as something we accept from requests/external sources and just parse. If someone wants to do something like this he should explicitly say he wants to expose himself to problems that it carries with.

In my opinion we should use only/mostly JSON and Ruby. For config files ruby is so expressive that we can have instead of databas.yml like things something that will just load .rb file. aka

MyBaddAssWebAPP::Application.database do
  production do
    host: "localhost" 
     ....
  end
end

Or something similar to this. Do we need YAML at all ?

envygeeks commented 11 years ago

@JakubOboza How is it going to be limited? The fact of the matter is it already has an unserializer to move non-primates into their Objects, at that point it's a matter of having that Object tell you what kind of non-primative the Object wants to be and then you deciding if you want that non-primative, or giving you the ability to block non-primatives (like symbols) right off the bat. Since it has to have a decision engine for them to begin with (see lib/psych/visitors/to_ruby.rb) the logic for rejection is not really all that hard to add in. The callbacks might be hard, but only in the sense of how do you do it properly, not how do you implement it.

JakubOboza commented 11 years ago

Yeah right. So first of all do symbols get ever garbage collected ? No. So basically someone can makes series of reuqests with dozens of symbols and make your VM grow to insane size and die. Performing this DoS attack is not so hard.

So making load_safe or safe_load don't solve a problem but probably creates new ones :)

trans commented 11 years ago

@JakubOboza Using Ruby is the exact opposite of the solution to this issue. It would create even more security issues --insurmountable ones.

@student Is your whitelist solution more complicated than we need? I wonder if whitelisting prior to loading is necessary when we can just safe-load plain YAML and then instantiate any nodes we need proactively. In other words, we can simply apply our own whitelist solution after loading given the plain YAML. That way Psych doesn't need to handle it and all the additional code and api it would entail.

However, I could see a nice general-purpose method for traversing arrays and hashes to do these conversions would be helpful to make that dead simple.

JakubOboza commented 11 years ago

Why ? if you use ruby in config files and never autoparse yaml in requests ? I gave the example just to show that we don't need yaml.

Everything you need from serialization format is list of objects thingy [ ], hash/map/object { } and string "" and i know working is not perfect but thats all and thats the stuff that you have in JSON.

envygeeks commented 11 years ago

@JakubOboza I think you should take a second, recollect your thoughts and come back when you are more rational and not spewing trash. Psych is what makes the non-primitives. So what you are saying is broken and flawed logic because safe_load would prevent the symbols from ever being created and would keep them as primitive strings.

NathanZook commented 11 years ago

First, an apology. I forgot that in general YAML is creating instances of objects, then making calls on these instances or setting attributes on them directly. Some of my first comments were coloured by that error. Clearly, it is safe to set attributes on a newly created instance, so there is no concern there.

However, if a class uses a singleton object, this is no longer safe, as global state is being overwritten. (I observe no occurrences of "ingleton" in my ruby 1.9.3p327 source.)

[]=, is another matter entirely, however. []= is no more a setter than a method ending in ? is a boolean probe. In fact, I would argue that it is less so. There is nothing at all wrong with having some sort of generic class that implements []= as sending messages to other classes. It just breaks some assumptions.

Now as I mentioned before, someone who wants to use gem X (which depends on gem Y) should not have to do research to figure out if it is safe to do so with YAML parsing external input. And just because I need to use a gem that needs a gem that implements an obscure class that is not safe, why should I be prevented from using YAML in the normal way for my entire application?

I audited the rails lib classes yesterday, and as of 2.0.0.preview2, they are clean. It is easy enough to create a structure to hold this data. When a unknown classes is encountered, it can be queried to determine if it has a yaml_safe method. If it returns false, then not even attributes will be set. If it returns true, then []= will be called with abandon. If it returns a symbol, that method will be called instead of []=. If it returns something that responds to call.... Otherwise, the class is registered as unknown, and attributes will be set but []= will not be called.

Of course, the global settings can also be used, and the end user can explicitly register whatever sort of bizarre rules make sense for them. Let the user decide!

With such a facility, the community has an easy way to register the external-input safety of their classes, and 99.9% of the people can proceed in ignorant bliss.

I REALLY want to be able to use YAML for parsing external input. JSON is entirely too lowest-common-denominator. XML is stilted at best. Marshall practically requires both ends to be running the same everything. DrB is for internal use only.

envygeeks commented 11 years ago

This shit is bananas. Seriously though, this thread turned from useful into a bag full of "come again say what?"

trans commented 11 years ago

@student You are still making this more confusing then it is. More importantly, having developers add a yaml_safe? method to their classes does nothing to ensure they are actually safe.

Getting back to your whitelist notion however, I've been thinking about this and it might have some merit. Not exactly in the way you described (which again I think is overly complex) but just as an option passed into the safe load method. e.g. something like:

    YAML.load_file(file, :safe=>true, :whitelist=>{'!foo'=>Foo})

It's not strictly necessary, as I mention in my previous post. But it would be very convenient.

trans commented 11 years ago

I looked at the code a bit. Am I right in concluding the heart of this matter is Visitors::ToRuby#deserialize (here)? If so the main of the solution is to create a #deserialize_safe method that removes the unwanted tag resolutions of #deserialize (some can remain like !binary and !float).

The hardest part looks like it's just getting the safe flag down to the deserialize call so it can be conditioned.

envygeeks commented 11 years ago

Float and Binary are primitives so shouldn't they be excluded by default without prejudice?

_Edit: My 2 pence worth is that maybe there should be a SafeDeserializers and UnsafeDeserializers, where we hold ones that are considered always safe and anything else gets put into unsafe, if you run deserialize_safe then it will exclude all UnsafeDeserializers but if you pass in say :method then it will deserialize anything that applies._

module Deserializers
  module Safe
    module_function
    def my_unsafe_deserializer
      # => Do Work
    end
  end
end

YAML.load(data).safe_deserialize # => String
YAML.load(data).safe_deserialize(:my_unsafe_deserializer) # => Object

Or maybe so the logic makes more sense:

YAML.load(data).deserialize_all # => Objects
YAML.load(data) # => String
YAML.load(data).deserialize(:my_unsafe_deserializer) # => Object

I absolutely like the idea of having to be forced to choose to deserialize everything. It makes people think about what they are doing, where we have to be honest, most programmers now days don't really think twice about what they are doing... so I always like situations that make me think twice.

trans commented 11 years ago

@envygeeks String, Hash and Array are primitives too, but we can't exclude them. But maybe I am misunderstanding?

As a general rule of thumb, I am thinking it is okay to accept any class for which Ruby supports a literal constructor. Not sure about !range, but we do need to support all YAML standard types and that includes Float. As for Binary, I think (correct me if I am wrong) it's really just a String, so it should be okay. I know Symbol has been mentioned above. Is the only security risk with symbols that of a memory overrun? If so, YAML's probably not the right place to be concerned with that. Rather incoming file size should be limited. (Is there a way to set a size limit on the stream parser?)

envygeeks commented 11 years ago

@trans I'm saying that Floats should be allowed to be parsed into their Objects by default.

The problem with the stream size limit is that even if you limit it, if this YAML comes over the public or public but private wire... and don't ask me why it would in the case of non-internal communication but at that point I would probably Marshal the object into a memory storage but we won't even get into that. Anyways, the point is that if it's coming from the internet it's only a matter of changing the symbol each time and a new symbol is created so the stream limit is nearly useless and it will always come down to being able to either white list objects that can be unmarshaled/deserialized or blacklist (the bad option IMO).

Ranges can be easy to safely handle:

str = "1..2"
str = Range.new($1.to_i, $3.to_i, $2.length == 3) if str =~ /^(\d+)(\.{2,3})(\d+)$/

While ghetto built it works.

NathanZook commented 11 years ago

Symbols are a really ugly case. The only way to prevent DOS is to check to see if a symbols is already defined before deserializing it. I have no idea how that might be done. Easier would be to have a list of symbols that are allowed for the call--but very ugly. Furthermore, in practice, if I were wanting to fling arguments for ActiveRecordFind, with its nested conditions, the list of acceptable values changes as we navigate the tree. And if you thought the list idea was ugly before, now you KNOW it is.

There is an option that I have often wanted for Array.flatten, and it strikes me as useful here as well--the ability to specify a depth limit--only deserialize one level, or two, or five.

envygeeks commented 11 years ago

@student Occam's razor. You either accept all symbols or no symbols.

postmodern commented 11 years ago

To highlight the need for this feature, arbitrary YAML deserialization strikes Rails again.

homakov commented 11 years ago

turning YAML into JSON? Utopia

envygeeks commented 11 years ago

I actually prefer it when peeps turn my JSON into YAML. :wink:

postmodern commented 11 years ago

@tenderlove could you possibly look at safe_yaml and maybe talk to @dtao about merging it's behaviour into Psych?

dtao commented 11 years ago

@tenderlove and @postmodern: FWIW, my aim w/ the safe_yaml gem was/is pretty consistent with what @trans suggested: to provide a method which "loads the YAML as if no ! tags are present." In its current form the gem actually clobbers YAML.load, but I'd be totally fine w/ switching back to having a separate YAML.safe_load method.

envygeeks commented 11 years ago

@dtao Having two functions creates a cleaner API with expected results but having load accept a second argument for safe_load as well allows for pure backwards compatibility. /cc @tenderlove @postmodern @trans

homakov commented 11 years ago

+1 for separate function. OR load('---..', safe: true)

lawrencepit commented 11 years ago

+1

More vulnerabilities will be reported by various gems the coming days and weeks. Basically anything that uses the YAML gem to load arbitrary user input is at risk. There are several gems out there that do this unfortunately.

Given rails security strategy that it should be secure by default and the multitude of (breaking) changes that were made in the 3.x series to achieve this, I would logically think YAML.load should not be any different and be secure by default (if it followed the same strategy).

But I'm against breaking working apps. Would the following perhaps work? :

  1. YAML.load works default in unsafe mode.
  2. Calling YAML.safe! will make all subsequent YAML.load calls safe by default.
  3. YAML.load can receive an option to override the global safety setting.

The downside is that any gem might call YAML.safe! for no valid reason. They should instead use YAML.load("---..", :safe => true)

Meanwhile I would advise all ruby developers to use the safe_yaml gem from @dtao, run all your tests, and if nothing breaks, keep on using safe_yaml until this issue is resolved.

haileys commented 11 years ago

@lawrencepit I'm -1 on anything like YAML.safe! that introduces global state. I think YAML.safe_load would be best as it makes grepping easier.

postmodern commented 11 years ago

@lawrencepit Yeah, changing YAML.load's default behaviour would break pretty much everything (esp RubyGems). I'm against introducing complex state, it should be an explicit method-call or option which indicates that a specific YAML blob should be parsed in safe-mode. Explicit > implicit.

dtao commented 11 years ago

@envygeeks @postmodern and @charliesome: I'm kind of torn on this one. I agree in general with avoiding global state and maintaining backwards compatibility; but I actually like @lawrencepit's suggestion a lot from an ease of use standpoint. App developers can change YAML.load to YAML.safe_load everywhere in their own application; but what if their apps depend on gems that also use YAML.load? The benefit of a permanent toggle like YAML.safe! (or I was thinking: YAML.disable_arbitrary_object_deserialization!) is that it can be called once—e.g., in a Rails initializer—and render the rest of an app secure against this particular exploit by default.

Yes, it may prove to be a breaking change in certain cases; but I'd argue those cases are likely to be more rare than the default case where user input needs to be deserialized safely. In these cases app developers could explicitly call YAML.orig_load or YAML.load("--- ...", :unsafe => true), or whatever, when they need to deserialize trusted input to arbitrary objects.

I'm thinking I probably will provide this option in safe_yaml in the next version. (This is probably strictly better than the gem's default behavior now, which is to clobber YAML.load no matter what. In retrospect I see that's probably not the best approach.)

Of course, we all know that the long-term solution is not a standalone gem but a safe option provided through Psych itself, hence this thread.

envygeeks commented 11 years ago

I would make it so that the initiator of the global state could fix the problem on behalf of the software he or she breaks, in that if Gemcutter absolutely needs to deserialize an Object, we can whitelist it in safe_load!. Something like YAML.safe_load!(Object1, Object2) At the same time I can't disagree with the dislike of it, it introduces the need to create extra unit's and it introduces new code complexity, it also introduces the ability for one lib to break another lib by abusing safe_load! by not knowing that it's for the implementer of the lib, not the creator of the lib. That will happen eventually and it's gonna create some huge friction.

postmodern commented 11 years ago

The problem with a global switch, is that other software (RubyGems) may need to deserialize arbitrary YAML from trusted sources after safe_load! is called. Also there is no control on when safe_load! can be called. Developers should first determine whether the YAML input is from a trusted source (ex: not an arbitrary user). I sort of wish we tainted all user-input.

Whitelisting certain Classes is acceptable; perhaps maybe have a global whitelist?