Open GoogleCodeExporter opened 9 years ago
Additional information on my motivation for requesting this:
I don't always control the source of the YAML. There are cases where I receive
the YAML document from
another producer, often in another language or using a different library. This
document may contain tags
specific to the producer application that my application doesn't understand or
cannot (in the case of some
output from other languages). However, the partial representation is still
useful to my application.
Currently the only way to consume this document with SnakeYAML is to manually code
a case for each of these
unknown tags. Again, this assumes somewhat that I have control over the
producing application, e.g. I know
when a new tag is introduced. Again, where the producing application is out of
my control, the introduction
of a new tag breaks my consuming application until I add code to handle the new
tag.
Not supporting partial representation severely hampers the utility of SnakeYAML
for doing interop with other
systems and languages.
Original comment by toolbea...@gmail.com
on 8 Dec 2009 at 1:34
Excuse me, since I do not see the difference with issue 31, I will just repeat
the
arguments.
The spec says:
A complete representation is required in order to construct native data
structures
It clearly indicates that Java objects (native data structures) cannot be
created for
partial representation.
Please note that partial representation is supported in SnakeYAML - use low
level API
to produce nodes and then create Java objects on your own.
The logic "unknown tag" -> "str" is application specific. If you wish to
implement it
you can have a look at BaseConstructor.getConstructor(Node) to find a solution
which
works for you.
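
As an illustration of "produce nodes and then create Java objects on your own", here is a minimal sketch. It assumes a simplified, hypothetical node model (`SketchNode`, `PlainObjectBuilder`) rather than SnakeYAML's real node classes, and it builds values from each node's kind only, ignoring the tag entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a composed node; this is NOT a SnakeYAML class.
final class SketchNode {
    final String tag;      // kept for reference, deliberately ignored below
    final Object content;  // String (scalar), List<SketchNode> (sequence) or Map<SketchNode, SketchNode> (mapping)

    SketchNode(String tag, Object content) {
        this.tag = tag;
        this.content = content;
    }
}

final class PlainObjectBuilder {
    /** Converts a node graph to List, Map and String values, based only on each node's kind. */
    static Object toPlain(SketchNode node) {
        if (node.content instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node.content) {
                out.add(toPlain((SketchNode) item));
            }
            return out;
        }
        if (node.content instanceof Map) {
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node.content).entrySet()) {
                out.put(toPlain((SketchNode) e.getKey()), toPlain((SketchNode) e.getValue()));
            }
            return out;
        }
        return String.valueOf(node.content); // scalar: keep the raw text
    }
}
```

With this approach every scalar stays a String, so a node tagged as an integer still comes back as text. Whether that is acceptable is exactly the application-specific decision mentioned above.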
If you do not have control over the incoming documents, what happens when there
is a global tag with a class name and you do not have that class on your classpath?
Original comment by aso...@gmail.com
on 8 Dec 2009 at 12:26
@asomov, where I work, it is considered impolite to reopen closed issues, hence
I opened this issue. If it is
more appropriate to ask that issue 31 be reopened, please let me know and I
will move my discussion there.
"It clearly indicates that Java objects (native data structures) cannot be
created for
partial representation."
By this do you mean Java objects other than List, Map, and String?
"The logic "unknown tag" -> "str" is application specific."
Are you interpreting the spec as saying all unknown tags must be assumed to be
"str"? If so, I believe that is
incorrect. The spec says, "In such a case, the YAML processor may compose a
partial representation, based
on each node’s kind and allowing for non-specific tags." I interpret that to
mean that SnakeYAML can
compose a representation consisting of List, Map, and String objects. Though
not as useful as a complete
representation, such a representation still has utility.
"Please note that partial representation is supported in SnakeYAML - use low
level API to produce nodes and
then create Java objects on your own...If you do not have control over the
incoming documents what happens
when there is a global tag with a class name you do not have this class in your
classpath ?"
I may have misunderstood the goals of SnakeYAML. Is it primarily for the
serialization of native Java object
graphs from and back into Java?
I assumed one goal was to be a general purpose YAML processor. For such a
processor, it should be
convenient to consume YAML produced by other systems and programming languages.
I made this
enhancement request after having some difficulty consuming YAML produced by
Perl's YAML::Syck which
emits YAML containing tags for native perl objects, as in this example:
perl -MYAML::Syck -e 'print YAML::Syck::Dump(bless { a => 1 }, "My::Perl::Class")'
--- !!perl/hash:My::Perl::Class
a: 1
In my case, consuming the YAML produced by Perl as a simple data structure of
sequences, maps, and scalars
is of sufficient utility that I have no need to implement equivalent model
objects in Java as exist in my Perl
application.
I stand by this enhancement request. Having to resort to the low level API is a
shortcoming for a general
purpose YAML processor.
That said, for my current work I will look to the low level API as I should be
able to work around the issue I'm
currently facing. Thank you for that advice.
Original comment by toolbea...@gmail.com
on 8 Dec 2009 at 2:32
I do not mind to open a new issue each time.
"By this do you mean Java objects other than List, Map, and String?"
I mean anything which extends java.lang.Object
"Are you interpreting the spec as saying all unknown tags must be assumed to be
"str"?"
No, "unknown tag" -> "str" was given as an example.
"it should be convenient to consume YAML produced by other systems and
programming
languages"
I completely agree with this statement.
Look, when you give this YAML document:
--- !!perl/hash:My::Perl::Class
a: 1
...
to Python, Ruby or VisualBasic do you expect it to work or to fail ?
(I am sure it will fail !)
If Python would fail, why do you suddenly expect Java to work?
I propose the following solution:
1) the PERL parser shall not emit language-specific tags. Instead it should emit
--- !MyClass
a: 1
...
Then it is easier to consume the document by other parsers
2) take a look here, it might help:
http://code.google.com/p/snakeyaml/source/browse/src/test/java/org/yaml/snakeyaml/ruby/RubyTest.java
3) since this is a second request I have added an example to ignore tags:
http://code.google.com/p/snakeyaml/source/browse/src/test/java/examples/IgnoreTagsExampleTest.java
Please take a look and let me know whether it is close to what you want to
achieve.
Original comment by py4fun@gmail.com
on 8 Dec 2009 at 5:21
Original comment by py4fun@gmail.com
on 10 Dec 2009 at 9:47
So here's one workaround I came up with (Groovy JUnit test attached). It
amounts to this custom Constructor:
class TagIgnoringConstructor extends Constructor {
    protected Object callConstructor(Node node) {
        switch (node.getNodeId()) {
            case "scalar":
                node.tag = "tag:yaml.org,2002:str"
                break;
            case "sequence":
                node.tag = "tag:yaml.org,2002:seq"
                break;
            case "mapping":
                node.tag = "tag:yaml.org,2002:map"
                break;
        }
        return super.callConstructor(node)
    }
}
...
Yaml yaml = new Yaml(new Loader(new TagIgnoringConstructor()))
yaml.load(...)
Is there a more polymorphic way to do this? I'd like to apply the "replace
switch with polymorphism"
refactoring, but didn't find anything in the JavaDoc to facilitate that.
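
One way the "replace switch with polymorphism" idea could look is sketched below, in plain Java. The types `SketchNodeKind` and `SketchTaggedNode` are hypothetical and stand in for the library's own node and node-id types; nothing here is claimed to exist in SnakeYAML. The point is only that each kind can carry its own default tag, so the caller no longer branches.

```java
// Hypothetical enum: each node kind knows the core tag it falls back to.
enum SketchNodeKind {
    SCALAR("tag:yaml.org,2002:str"),
    SEQUENCE("tag:yaml.org,2002:seq"),
    MAPPING("tag:yaml.org,2002:map");

    private final String defaultTag;

    SketchNodeKind(String defaultTag) {
        this.defaultTag = defaultTag;
    }

    String defaultTag() {
        return defaultTag;
    }
}

// Hypothetical node type used only for this illustration.
final class SketchTaggedNode {
    final SketchNodeKind kind;
    String tag;

    SketchTaggedNode(SketchNodeKind kind, String tag) {
        this.kind = kind;
        this.tag = tag;
    }

    /** Replaces whatever tag the producer emitted with the default tag for this node's kind. */
    void resetToDefaultTag() {
        tag = kind.defaultTag();
    }
}
```

A custom constructor could then call `node.resetToDefaultTag()` before delegating, provided the library's node type exposed an equivalent hook, which is an assumption.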
Original comment by tim.tay...@eprize.com
on 15 Dec 2009 at 11:38
"So here's one workaround I came up with (Groovy JUnit test attached)..."
Which I now see is similar to the example referenced in comment #4. In both
cases, the switch statement is a
code smell. Is it possible to replace that with polymorphism?
Original comment by tim.tay...@eprize.com
on 15 Dec 2009 at 11:44
I consider the example from comment 4 and my similar one in comment 6 to be
workarounds for a limitation of
SnakeYAML. It should be easier to use SnakeYAML as a general purpose YAML
processor. I shouldn't have to
subclass a new Constructor to coerce it to load a YAML document as a simple
data structure of sequences,
maps, and scalars.
YAML tags are similar to XML's xsi:type attribute. Any XML parser can parse the
following even if it doesn't
understand what type "foo" is:
<a xsi:type="foo" value="1"/>
Higher level tools, such as XStream, build on top of general purpose XML
parsers to provide Java object
serialization.
Based on the current implementation, SnakeYAML is more like the YAML equivalent
to XStream. It cannot be
easily used as a general purpose YAML processor. Here is what I would consider
easy:
Yaml yaml = new Yaml();
yaml.setIgnoreTags(true);
yaml.load(...);
One line of additional code, instead of several, would allow me to use
SnakeYAML as a general purpose YAML
processor the same way I can easily use any XML library in a general purpose
way.
"I propose the following solution:
1) the PERL parser shall not emit language-specific tags. Instead it should emit
--- !MyClass
a: 1
...
Then it is easier to consume the document by other parsers"
You assume that I control the code producing the YAML. In my case, I do. But
it's legitimate to want to use
SnakeYAML to parse YAML that you have no control over.
I believe this is a valid enhancement request. Using SnakeYAML for general
purpose YAML processing
shouldn't take a back seat to using it for native Java object
serialization/deserialization. General purpose
processing should at least have equal weight.
Original comment by tim.tay...@eprize.com
on 16 Dec 2009 at 12:25
I think I should blog on this topic if it causes so much misunderstanding.
1)
Can you please give a definition of a "general purpose YAML processor" ?
What I see now is that "general purpose YAML processor" is SnakeYAML with
'yaml.setIgnoreTags(true)' implemented.
2)
>I shouldn't have to subclass a new Constructor to coerce it to load a YAML
document
>as a simple data structure of sequences, maps, and scalars.
You do not have to !!! You only need to do it when the input is inconsistent or
you
do not know how to make it consistent.
3)
>YAML tags are similar to XML's xsi:type attribute. Any XML parser can parse the
>following even if it doesn't understand what type "foo" is <a xsi:type="foo"
>value="1"/>
SnakeYAML can also parse a valid YAML document. Please check the low level
'parse()'
method.
4)
>You assume that I control the code producing the YAML
I do not. I simply state that YAML producers should be aware that the content
they
generate can be consumed by different parties
5)
>I believe this is a valid enhancement request
I completely agree. Provided that we understand the request, its implementation
and
its consequences
6)
>Using SnakeYAML for general purpose YAML processing shouldn't take a back
seat...
If "general purpose YAML processing" for you is like XML parser please use low
level
parsing. Like XML parsing it simply provides naked Strings or Lists.
7)
>Here is what I would consider easy:
>Yaml yaml = new Yaml();
>yaml.setIgnoreTags(true);
>yaml.load(...);
I consider this completely unclear.
Which tags do you propose to ignore ? If a tag is perfectly valid should it be
ignored ? Should we also ignore implicit types (123 -> int)?
What happens when we got this (should it be a String or Integer ?):
---
!!int 123
...
Should we raise an error in this case:
---
!!map [1, 2, 3]
...
What are the criteria for a tag to be ignored if this method is introduced?
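
To make these questions concrete, here is a small sketch of two candidate policies and how each would treat the two documents above. Everything in it (`TagPolicySketch`, the `Policy` values, both methods) is hypothetical and only illustrates the ambiguity; it does not describe what SnakeYAML does.

```java
final class TagPolicySketch {
    // Two hypothetical readings of "ignore tags".
    enum Policy { IGNORE_ALL_TAGS, IGNORE_UNAVAILABLE_TAGS }

    static final String INT_TAG = "tag:yaml.org,2002:int";
    static final String MAP_TAG = "tag:yaml.org,2002:map";

    /** Resolves a scalar such as "!!int 123" under the given policy. */
    static Object resolveScalar(String tag, String text, Policy policy) {
        if (policy == Policy.IGNORE_ALL_TAGS) {
            return text; // even a perfectly valid !!int is dropped
        }
        if (INT_TAG.equals(tag)) {
            return Integer.valueOf(text); // known, available tag is honoured
        }
        return text; // unavailable tag: fall back to the raw text
    }

    /** Rejects a known tag applied to the wrong node kind, e.g. "!!map [1, 2, 3]". */
    static void requireKindMatches(String tag, String kind) {
        if (MAP_TAG.equals(tag) && !"mapping".equals(kind)) {
            throw new IllegalArgumentException("Tag " + tag + " cannot be applied to a " + kind);
        }
    }
}
```

Under `IGNORE_ALL_TAGS` the value of `!!int 123` is the String "123"; under `IGNORE_UNAVAILABLE_TAGS` it is the Integer 123. Whether the kind check should run at all when tags are ignored is one of the open questions.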
8)
Let us see an example with XStream.
<person>
<firstname>Joe</firstname>
<nonsenseString>aaa</nonsenseString>
<nonsenseInt>123</nonsenseInt>
</person>
The 'Person' JavaBean does not have nonsenseString and nonsenseInt properties.
Please note that I do not want to simply parse the XML (which is no problem of
course) but I wish to create a stateful 'Person' instance. Can XStream create
such
an instance ?
Do you consider XStream as a "general purpose XML processor" ?
I am afraid you need to explain the request taking into account all the
consequences.
Now I see the following: I wish to drop any trash to SnakeYAML and it must be
able to
create a valid Java instance anyway.
Original comment by aso...@gmail.com
on 16 Dec 2009 at 9:44
Are you talking about local tags ? If only local tags are ignored does it solve
the
problem ? (global tags and implicit types work as usual)
Original comment by py4fun@gmail.com
on 17 Dec 2009 at 8:22
I'm not explaining myself well or clearly. I will try to summarize my position
a different way. Then I'll respond
to comment 9 and comment 10 above.
Summary
=======
I consider these three use cases for YAML of near equal value:
a) Dump native data to YAML. Load YAML back to equivalent native data
structure.
b) Dump native data to YAML. Load YAML to structured representation with as
many native types as possible,
but not necessarily all native types.
c) Dump native data to YAML. Load YAML to structured representation made up of
List, Map, and String.
SnakeYAML succeeds at making all three use cases possible. However, only (a)
can be done with the high level
API. I consider that a shortcoming.
According to the spec, only (a) is "complete success". Use cases (b) and (c)
are "failure modes". However, these
failure modes still result in a representation that can be useful to many
applications. I contend that use
case (b) is as prevalent as (a) if not more prevalent. I can be convinced that
use case (c) is less prevalent. I see
less utility for use case (c) when use case (b) is available via the high-level
API.
Example: a complete, but non-native representation
--------------------------------------------------
Alice's application has the "Autos" and "Currency" libraries. Her application
dumps the following:
!!autos.Car
plate: 12-XP-F4
value: !!Money {amount: 8113.00, currency: USD }
Bob receives the above YAML. His application only has the "Currency" library.
According to the spec, the YAML
processor can create a complete representation, but not a native one because
it's lacking some native types
(autos.Car). SnakeYAML should have a straightforward way through the high level
API to load this. An easy
implementation would be use case (c) and ignore all tag information and do no
implicit typing. A somewhat
more useful implementation would be use case (b), to construct those native
types that are available (value →
Money) and to also do implicit typing (value.amount → float).
This is essentially what I understand from this part of the spec:
In a given processing environment, there need not be an available native type corresponding to a given tag.
If a node’s tag is unavailable, a YAML processor will not be able to
construct a native data structure for it. In
this case, a complete representation may still be composed, and an application
may wish to use this
representation directly.
Again, I understand the low-level API makes this possible (comment 4). It
should be possible with the high-
level API.
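
A rough sketch of use case (b) for the Alice and Bob example follows. It assumes a hypothetical registry (`PartialLoaderSketch`) keyed by tag and a hypothetical `MoneySketch` type; neither is part of SnakeYAML. A mapping whose tag has a registered factory becomes a native object, and any other mapping is returned as a plain Map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical native type from Bob's "Currency" library.
final class MoneySketch {
    final double amount;
    final String currency;

    MoneySketch(double amount, String currency) {
        this.amount = amount;
        this.currency = currency;
    }
}

// Hypothetical loader: constructs native types where available, plain Maps otherwise.
final class PartialLoaderSketch {
    private final Map<String, Function<Map<String, Object>, Object>> factories = new HashMap<>();

    /** Registers a factory for a tag this application can construct natively. */
    void register(String tag, Function<Map<String, Object>, Object> factory) {
        factories.put(tag, factory);
    }

    /** Returns a native object when the tag is available, otherwise the fields as-is. */
    Object constructMapping(String tag, Map<String, Object> fields) {
        Function<Map<String, Object>, Object> factory = factories.get(tag);
        return factory == null ? fields : factory.apply(fields);
    }
}
```

In Bob's application only the Money tag would be registered, so `value` becomes a `MoneySketch` while the `autos.Car` node remains a Map with `plate` and `value` entries.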
Response to comment 9
=====================
1) "General purpose YAML processor": the high-level API of SnakeYAML is suited
only to use case (a). I believe
this is the narrowest possible interpretation of a YAML processor from the
spec. A general purpose processor
would support (b) and (c) as conveniently as it supports (a).
2)
> > I shouldn't have to subclass a new Constructor to coerce it to load a YAML
document
> > as a simple data structure of sequences, maps, and scalars.
>
> You do not have to !!! You only need to do it when the input is inconsistent
or you
> do not know how to make it consistent.
Having to subclass a new Constructor to achieve use cases (b) and (c) through
the low-level API is a
shortcoming.
3)
> SnakeYAML can also parse a valid YAML document. Please check the low level
'parse()'
> method.
Having to use low-level parse() for use cases (b) and (c) instead of high-level
load() is a shortcoming.
4)
> > You assume that I control the code producing the YAML
>
> I do not. I simply state that YAML producers should be aware that the content
they
> generate can be consumed by different parties
I agree with you that they *should*. But the reality is that (some, many,
most?) won't. The downstream effect
is that my application must pay the price in terms of complexity to handle this
YAML when I use SnakeYAML.
Of course, I do the smart thing and create a wrapper around SnakeYAML so the
complexity occurs once and is
hidden from the rest of my application (or applications). But that still means
every individual or organization
has to write this same wrapper. Based on my contention that use case (b) is
prevalent, that's a lot of repeat
effort. It should be easier to do.
7)
> > Here is what I would consider easy:
> >
> > Yaml yaml = new Yaml();
> > yaml.setIgnoreTags(true);
> > yaml.load(...);
>
> I consider this completely unclear.
Agreed. That was a bad proposal on my part. What I intended was more something
like
`yaml.setIgnoreUnavailableTags(true)` or `setFullyNative(false)`.
But my point wasn't to propose a specific method name, or that it should even
be a method on Yaml. Instead I
was trying to contrast several lines of code subclassing Constructor with a
one-liner.
8)
> Let us see an example with XStream.
>
> <person>
> <firstname>Joe</firstname>
> <nonsenseString>aaa</nonsenseString>
> <nonsenseInt>123</nonsenseInt>
> </person>
>
> The 'Person' JavaBean does not have nonsenseString and nonsenseInt
properties.
> Please note that I do not want to simply parse the XML (which is no problem of
> course) but I wish to create a stateful 'Person' instance. Can XStream
create such
> an instance ?
I agree that should fail and the YAML/SnakeYAML equivalent should also fail.
Extending my example up top:
!!autos.Car
plate: 12-XP-F4
value: !!Money {amount: 8113.00, currency: USD, garbage: "boom" }
Bob's application, which has the "Currency" library, would fail to construct a
Money instance for `value` when
it attempted to set the property `garbage`.
However, if Bob removed all dependencies on the Currency library from his
application, and then removed the
currency library, then something similar to `new
Yaml().setIgnoreUnavailableTags(true).load(...)` would work.
Instead of a Money instance, `value` would just be a Map.
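
The failure case with `garbage` could be sketched as follows. `StrictMoneySketch` is a hypothetical class written for this illustration; it rejects any field that does not correspond to a known property, which mirrors the behaviour I would expect when the Money type is available.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical strict native type: unknown properties are an error, not silently dropped.
final class StrictMoneySketch {
    private static final Set<String> PROPERTIES = Set.of("amount", "currency");

    final double amount;
    final String currency;

    private StrictMoneySketch(double amount, String currency) {
        this.amount = amount;
        this.currency = currency;
    }

    /** Builds an instance from mapping fields, failing on any property the type does not have. */
    static StrictMoneySketch fromFields(Map<String, Object> fields) {
        for (String key : fields.keySet()) {
            if (!PROPERTIES.contains(key)) {
                throw new IllegalArgumentException("Unknown property for Money: " + key);
            }
        }
        double amount = ((Number) fields.get("amount")).doubleValue();
        return new StrictMoneySketch(amount, (String) fields.get("currency"));
    }
}
```

If the type is not available at all, no such check runs and the fields, including `garbage`, simply stay in the Map.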
> Do you consider XStream as a "general purpose XML processor" ?
No. I consider it a narrow tool that does use case (a) (except for XML, not
YAML). But now I think we're getting
to the crux of our disagreement.
I don't consider XStream's singular focus on use case (a) to be a shortcoming.
Why then do I judge SnakeYAML
differently? Because unlike with XML, there isn't an abundance of competing,
complete, quality
implementations of YAML processors the way there is for XML parsers; there's
SnakeYAML and then
there's...SnakeYAML. Yours is the only one that meets those criteria (for Java)
that's actively maintained.
I believe I understand your position now. High-level SnakeYAML is equivalent to
XStream. Low-level
SnakeYAML API is equivalent to an XML parser.
> I am afraid you need to explain the request taking into account all the
consequences.
I think I misunderstand you. Otherwise, that's an unfair burden to place on
someone contributing feedback.
I *have* spent a good amount of time reading (and re-reading) the YAML spec to
make sure my position is
reasonable and valid. You're saying I must anticipate all of the consequences
before sharing my idea?
> Now I see the following: I wish to drop any trash to SnakeYAML and it must be
able to
> create a valid Java instance anyway.
Per above, that's not what I'm asking for.
Response to comment 10
======================
> Are you talking about local tags ? If only local tags are ignored does it
solve the
> problem ? (global tags and implicit types work as usual)
For use case (b), implicit tags as well as recognized and available native
types would work as usual. My
`setIgnoreTags(true)` in comment 8 was a badly named proposal. Per above, I
meant something akin to
`setIgnoreUnavailableTags(true)`.
But no, I don't think a distinction between local and global tags would do it.
It's possible to have a global tag
reference an unavailable native type, correct?
Original comment by toolbea...@gmail.com
on 13 Jan 2010 at 7:54
I believe I've made my position clear. If not, I lack the stamina to try and
explain myself again. What remains should be differences of opinion on which
use cases the
high-level API should support. If you disagree with my position, then go ahead
and close/cancel this enhancement request.
Original comment by toolbea...@gmail.com
on 13 Jan 2010 at 6:37
First of all - thank you very much for your time. I think this issue gives a
good
overview for anyone who wants to manage "strange" tags coming from another
parser.
In general I agree with your proposal to be more flexible. I just do not get
how to
resolve minor issues (which become big when you try to implement them).
I see your position better now and I do not want to reply to every statement (I
will
save some disk space for Google :).
Since I do not see a real business case for myself I cannot really implement
your
requirement.
I think we can proceed as follows. (And you get my full support for it.)
- create a Mercurial clone (http://code.google.com/p/snakeyaml/source/clones)
- write a test case. No problem if it fails, at least we can see what we want to
achieve
at the end
- try to implement the feature. You are free to change _anything_. Just keep in
mind
that the existing tests must succeed.
- once we see the new code we can discuss the required changes and the
consequences
P.S.
I was trying to introduce an interface which is called when the tag is unknown.
Similar to what is done for error handler in SAX when parsing XML.
Unfortunately it
became more complicated than I expected and I dropped it.
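
For readers wondering what such an interface might look like, here is a minimal sketch in the spirit of a SAX error handler. The names `UnknownTagHandlerSketch` and `HandlerDemoSketch` are hypothetical; this is not the code that was tried and dropped, and no such interface is claimed to exist in SnakeYAML.

```java
import java.util.Set;

// Hypothetical callback invoked when a tag has no known constructor.
interface UnknownTagHandlerSketch {
    /** Receives the unknown tag and the plain value built from the node's kind. */
    Object onUnknownTag(String tag, Object plainValue);
}

final class HandlerDemoSketch {
    // Strict behaviour: an unknown tag is an error.
    static final UnknownTagHandlerSketch FAIL = (tag, value) -> {
        throw new IllegalStateException("Unknown tag: " + tag);
    };

    // Lenient behaviour: keep the plain List, Map or String value.
    static final UnknownTagHandlerSketch KEEP_PLAIN = (tag, value) -> value;

    /** Returns the value unchanged for known tags and defers to the handler otherwise. */
    static Object construct(String tag, Object plainValue, Set<String> knownTags,
                            UnknownTagHandlerSketch handler) {
        return knownTags.contains(tag) ? plainValue : handler.onUnknownTag(tag, plainValue);
    }
}
```

The application chooses the handler, so the decision about unknown tags stays application-specific while the loading code stays the same.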
Original comment by py4fun@gmail.com
on 14 Jan 2010 at 10:21
"Each comment triggers notification emails. So, please do not post "+1 Me too!".
Instead, click the star icon."
Notification emails are exactly what I want, though I starred it, too.
I also struggled with it for a while and had to change methods to get anywhere.
A generic mode would be an excellent addition IMO.
Original comment by fred.co...@gmail.com
on 30 Sep 2012 at 4:54
Original issue reported on code.google.com by
toolbea...@gmail.com
on 8 Dec 2009 at 1:22