ruby / psych

A libyaml wrapper for Ruby

Class of alias nodes differs from class of anchor nodes #67

Open ejames opened 12 years ago

ejames commented 12 years ago

Example gist: https://gist.github.com/3034085

This is inconvenient by itself, but can also cause unexpected problems when moving code between environments that use Syck and Psych.

When the Syck parser loads YAML to a Ruby hash, it will emit objects with the same class for an alias node as it does for an anchor node.

Psych, however, returns the value at the alias node as a String. The gist above shows the reproduction for Time objects, but the same thing happens for Fixnums, etc.

This can cause problems for code that depends on the class of objects deserialized from YAML.
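A minimal sketch of the check, for anyone who wants to test their own setup. The YAML document here is an assumed stand-in for the gist's example.yaml (one anchored timestamp and one alias to it), and the `unsafe_load` fallback is only there so the snippet also runs on newer Psych releases, where plain `YAML.load` is restricted.

```ruby
require 'yaml'

# Assumed stand-in for example.yaml: "&t" anchors the timestamp and
# "*t" is an alias that points back at it.
EXAMPLE_YAML = <<~DOC
  ---
  :original: &t 2012-04-14 11:00:00 Z
  :alias: *t
DOC

# Newer Psych versions restrict YAML.load, so use unsafe_load when it is
# available and fall back to load otherwise. Trusted input only.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
data = YAML.public_send(loader, EXAMPLE_YAML)

# Print the class of the anchored value and of the aliased value.
puts data[:original].class
puts data[:alias].class
```

If the two classes printed differ, the parser in use has the problem described here.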

The Syck parser will automatically create alias nodes if it detects that the same in-memory object is used in multiple places in an object it is converting to YAML, so Syck will create YAML that has a different Ruby representation when loaded by Psych later.
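To see how aliases end up in generated YAML without anyone asking for them, here is a rough sketch. It uses Psych as a stand-in, since Syck is not available on current Rubies; Psych also emits an anchor/alias pair when the same object appears twice. The hash keys are made up for illustration.

```ruby
require 'yaml'

# One Time object referenced from two places in the same structure.
shared = Time.utc(2012, 4, 14, 11, 0, 0)
schedule = { 'start' => shared, 'first_occurrence' => shared }

# The emitter notices the repeated object and writes an anchor (&...)
# for the first occurrence and an alias (*...) for the second.
yaml = YAML.dump(schedule)
puts yaml

# Load it back (trusted input only) and inspect the resulting classes.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
restored = YAML.public_send(loader, yaml)
puts restored.values.map(&:class).inspect
```

Note that passing two separate but equal Time objects would not produce an alias; it is object identity that triggers it, which is why this only shows up "sometimes".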

An example bug: I use IceCube (seejohnrun/ice_cube) for handling recurring events. The recurrence information is serialized to YAML and often contains the same Time object referenced in multiple places. As a result, YAML serialized in an environment that uses Syck (such as Heroku) causes the IceCube gem to choke when deserialized in an environment that uses Psych. Syck creates alias nodes for the repeated Time object, Psych returns those aliases as String objects, and IceCube does not expect Strings in places where the library "knows" it serialized a Time object.

tenderlove commented 12 years ago

Does this break on 1.9.3-p194?

ejames commented 12 years ago

Just checked, and it works as hoped-for in 1.9.3-p194:

1.9.3-p194 :005 > require 'psych'
 => false 
1.9.3-p194 :006 > YAML::ENGINE.yamler = 'psych'
 => "psych" 
1.9.3-p194 :007 > from_psych = YAML.load_file("example.yaml")
 => {:original=>2012-04-14 11:00:00 UTC, :alias=>2012-04-14 11:00:00 UTC} 
1.9.3-p194 :008 > from_psych[:original].class
 => Time 
1.9.3-p194 :009 > from_psych[:alias].class
 => Time

I opened the issue partly so it would show up in Google for anyone else having the same problem on 1.9.2. Nobody else is in the comments here, though, so I guess it's just me =)

In practice the issue will probably affect developers who generate YAML from a Heroku app but read the YAML outside of the Heroku stack. Heroku configures Syck as the engine for the 1.9.2 stack, but some libraries will use Psych on 1.9.2 given the opportunity. This means you can have an environment outside of Heroku that differs from your Heroku stack. Of course, you don't expect that, because Bundler and RVM are supposed to protect you from that kind of problem, but neither captures the Syck/Psych configuration. Normally you wouldn't have your own code cross a boundary that changes which YAML engine you're using unless your code itself chooses the YAML engine.

Because it's a type error in a dynamically-typed language, any resulting failure can surface in code arbitrarily distant from the actual parsing operation. Syck produces YAML with alias nodes under the covers without giving any indication to client code, and only does so under certain circumstances, so from the developer's perspective, "sometimes" generating YAML creates problems "later on" in the system.
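One way to make the failure show up next to the parse instead of somewhere far away is to check types right after loading. This is only a sketch: `ensure_time` is a hypothetical helper of my own, not part of IceCube, Syck, or Psych, and the String branch assumes the aliased value is in a format `Time.parse` understands.

```ruby
require 'time'

# Hypothetical helper: returns a Time, converting a String if needed,
# and raises immediately for any other class.
def ensure_time(value, label = 'value')
  case value
  when Time
    value
  when String
    # Time.parse raises ArgumentError if the string is not a timestamp.
    Time.parse(value)
  else
    raise TypeError, "#{label}: expected Time, got #{value.class}"
  end
end

# Example: normalize a value that came back as a String from an alias node.
start_time = ensure_time('2012-04-14 11:00:00 UTC', 'start_time')
puts start_time.class
```

Running every Time-typed field through a check like this at the load boundary turns a confusing error "later on" into one that points at the YAML.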

So it's a tough problem to debug if it happens to you, but it's also rare and easily fixed once found: always use Syck to deserialize YAML that was originally generated by Syck.

I've fixed my config and the issue doesn't reproduce in later versions, so you can close this if you'd like. Even if backporting a fix from 1.9.3 were possible, I don't know if it would be worth the time.