stleary / JSON-java

A reference implementation of a JSON package in Java.
http://stleary.github.io/JSON-java/index.html

JSON API version: 20180813 converts empty xml elements to empty string in place of empty JSON Object #445

Open nayakbharat opened 6 years ago

nayakbharat commented 6 years ago

Recently, for a requirement, I had to move to version 20180813 of the JSON API. Earlier I was using version 20090211. For an empty XML element, version 20090211 returns an empty JSON object ({}), but version 20180813 returns an empty string (""). This change has broken my working application.

Here is the chunk of code: [screenshot: org-json]

Output with version 20090211: [screenshot: version-20090211]

Output with version 20180813: [screenshot: version-20180813]

stleary commented 6 years ago

Sorry for not responding sooner. Can you identify where this change was introduced? If you want to propose a change, please check the FAQ.

pmolchanov2002 commented 6 years ago

The following code at line 354 of org.json.XML generates empty strings instead of empty objects:

                    // Empty tag <.../>
                    if (x.nextToken() != GT) {
                        throw x.syntaxError("Misshaped tag");
                    }
                    if (jsonobject.length() > 0) {
                        context.accumulate(tagName, jsonobject);
                    } else {
                        context.accumulate(tagName, ""); // <--- An empty string is generated instead of an empty object when the object length is 0.
                    }
                    return false;

pmolchanov2002 commented 6 years ago

Is it valid to return an empty string instead of an empty object? Based on the syntax and description at http://json.org/, an object is not a string:

An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).

A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.

johnjaylward commented 6 years ago

Looking at the git blame for that block of code, it was last changed 8 years ago by the original author, Douglas Crockford.

Given the age of the change, I'm not sure I want to switch it back at this point and possibly cause a breaking change for people that have been using newer versions of the library.

@stleary thoughts? Blame output here: https://github.com/stleary/JSON-java/blame/1a811f1ada29098210cb6ec9e733d2648721ba57/XML.java#L356

pmolchanov2002 commented 6 years ago

The change from an empty object to an empty string was introduced in the 20131018 version, without any comment explaining why it was made.

johnjaylward commented 6 years ago

@pmolchanov2002 I'm guessing that the reason for the change is that our XML parser doesn't use any context, so it doesn't know what type any particular empty tag should be. Should that empty tag be an empty string, null, an empty array, or an empty object? There are many choices, none of which are very good for a context-free XML parser.

pmolchanov2002 commented 6 years ago

@johnjaylward We faced the problem with this implementation.

Say, we have an empty result set object. It may not have any results and it will be returned in the XML as <resultSet/>.

However, for another query, result set may have results and it can be returned in XML as <resultSet><result></result></resultSet>.

The client expects a result set with embedded objects, something like: resultSet: { result: {}}.

However, for the empty result set it gets empty string instead of the object, like: resultSet:"".

And the client application needs to handle this case with if/else conditions.
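A minimal sketch of the kind of client-side check described above, written against plain java.util types so it does not depend on any particular org.json version. The class and method names (EmptyElementWorkaround, asObject) are hypothetical and are not part of the library; the sketch assumes the client wants "" treated as an empty object for keys that should always hold an object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of org.json: normalizes the value a client
// reads for a key that is expected to hold an object.
final class EmptyElementWorkaround {

    // Returns the value as a map, treating "" (what an empty XML element
    // currently converts to) and null as an empty object.
    @SuppressWarnings("unchecked")
    static Map<String, Object> asObject(Object value) {
        if (value instanceof Map) {
            return (Map<String, Object>) value;
        }
        if (value == null || "".equals(value)) {
            return new LinkedHashMap<>();
        }
        throw new IllegalArgumentException(
                "Expected an object or an empty string, got: " + value);
    }
}
```

With this in place, a client could read resultSet through asObject and get an empty map for both <resultSet/> and a missing key, at the cost of one extra call at every such access.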

johnjaylward commented 6 years ago

Yeah, I'm not disagreeing that it would be best for your application to handle it that way. The problem is that it wouldn't be best for every application. For some applications an empty string may be the right choice. For others, maybe null would be the right choice. Others still may have had an empty array [] as the best choice.

pmolchanov2002 commented 6 years ago

Does it break the definition of the object?

An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).

Even if the object is empty.

By the way, if an empty object has attributes, for example <resultSet name="test"/>, it's serialized as an object, not as an empty string: "resultSet":{"name":"test"}.

johnjaylward commented 6 years ago

You are confusing 2 different concepts here. XML does not have objects. It is a document structure. An XML Element can represent any number of things. It is not an object.

For your example of <result name = "test" />, that is not an empty XML Element. It has an attribute named name with a value of test. If it was serialized to JSON as an empty string that would be a problem.

However, the XML Element <result></result> or short-form <result /> is an "empty element". It is still not an empty object. A correct value of the "result" element could be a number (<result>5</result>) or a string (<result>this is a string</result>) or an array of other values:

<results>
<result>1</result>
<result>2</result>
<result>this is another result</result>
</results>

All of those are valid XML. When our XML parser sees just an empty Element like <result /> we have no idea what the data type is. The JSON definition of an object is irrelevant.

pmolchanov2002 commented 6 years ago

From the spec (https://www.w3.org/TR/xml/#dt-eetag):

[Definition: Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications.] Each attribute specification has a name and a value.

The important part here is: Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications.

Actually, in my example <result name = "test" /> is an empty element with a single attribute name. Empty elements can have attributes by definition.

From the spec:

[Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag.

An empty element is still an element, it's not a string or null or anything else.

pmolchanov2002 commented 6 years ago

But I perfectly understand that there may be no reason to change the existing implementation, as that could easily break dependent applications. Oh....

stleary commented 6 years ago

@johnjaylward thanks for tracking down the history. Given the risk of breaking existing applications, I think it is better not to make a change at this time.

alavrentik commented 8 months ago

Can we add a new configuration property to XMLParserConfiguration, like the one we have for "keepStrings"?

stleary commented 8 months ago

@alavrentik No objections if someone wants to try adding an opt-in XMLParserConfiguration flag.
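One way such an opt-in setting might look, as a standalone sketch only. Every name here (EmptyElementSketch, EmptyElementPolicy, valueForEmptyTag) is hypothetical and does not exist in org.json or in XMLParserConfiguration; java.util maps and lists stand in for JSONObject and JSONArray, and Java null stands in for JSON null. The default keeps the current empty-string behavior, so existing applications would be unaffected.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in org.json.
final class EmptyElementSketch {

    // Candidate representations for an element with no attributes and no
    // content, as listed in the discussion above.
    enum EmptyElementPolicy { EMPTY_STRING, EMPTY_OBJECT, EMPTY_ARRAY, NULL }

    // Mirrors the branch quoted earlier in the thread: an element that has
    // attributes keeps them as an object; otherwise the policy decides.
    static Object valueForEmptyTag(Map<String, Object> attributes,
                                   EmptyElementPolicy policy) {
        if (!attributes.isEmpty()) {
            return attributes;
        }
        switch (policy) {
            case EMPTY_OBJECT:
                return new LinkedHashMap<String, Object>();
            case EMPTY_ARRAY:
                return new ArrayList<Object>();
            case NULL:
                return null;
            case EMPTY_STRING:
            default:
                return ""; // current behavior, kept as the default
        }
    }
}
```

Making the policy an enum rather than a boolean leaves room for the null and empty-array choices mentioned earlier, though a single boolean flag in the style of keepStrings would be a smaller change.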