Open ilmagowalter opened 2 years ago
@ilmagowalter, is there any reason why we need to retain leading/trailing spaces?
I'm facing this use case:
I receive an XML file with a field like
<tagA> : </tagA>
This tag is defined in the XSD schema like
<xs:element name="tagA" minOccurs="0">
<xs:annotation>
<xs:appinfo>
<RicSDO:exampleValues>
<RicSDO:example value="18:29"/>
</RicSDO:exampleValues>
</xs:appinfo>
</xs:annotation>
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:length value="5"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
So the parse is successful; I work with this file, transform it to JSON, and store it in a NoSQL database.
One of the possibilities is to export the data to XML again; so if I store in JSON
{"tagA": ":" }
I will export
<tagA>:</tagA>
and with this element the validation against the XSD fails; I have to store
{"tagA": " : " }
@ilmagowalter Thanks for the details.
@stleary Can I add a new method to retain spaces in the string while parsing? Something like new JSONObject().parse(String input, boolean retainSpace)
Maybe the parameter has to be added to this method:
public static JSONObject toJSONObject(String string, boolean keepStrings)
The easiest fix is to wrap the " : " content in a CDATA section.
If that is not possible, you can try adding a flag to XMLParserConfiguration. This class is the preferred mechanism for special cases in XML parsing. The code might then look something like this:
XMLParserConfiguration config =
new XMLParserConfiguration().withKeepTrimmedSpaces(true);
JSONObject jsonObject = XML.toJSONObject(xmlStr, config);
Then you will also need to update XMLTokener.nextContent() and pass in the config object or at least the flag.
JSONML uses this method too, so take care not to break that code.
This approach means that none of the content data in the XML doc will have spaces trimmed, which may not be what you want.
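As a rough, self-contained sketch of that idea (not the JSON-java code): SketchXmlConfig and SketchContentReader below are hypothetical stand-ins for XMLParserConfiguration and XMLTokener, and withKeepTrimmedSpaces is the flag name proposed above rather than an existing API. Entity handling is left out.

```java
/** Hypothetical stand-in for a parser configuration; immutable, with a "with" style setter. */
final class SketchXmlConfig {
    private final boolean keepTrimmedSpaces;

    SketchXmlConfig() {
        this(false); // default: trim, matching the current behavior described in this issue
    }

    private SketchXmlConfig(boolean keepTrimmedSpaces) {
        this.keepTrimmedSpaces = keepTrimmedSpaces;
    }

    /** Returns a copy with the flag changed, leaving this instance untouched. */
    SketchXmlConfig withKeepTrimmedSpaces(boolean keep) {
        return new SketchXmlConfig(keep);
    }

    boolean isKeepTrimmedSpaces() {
        return keepTrimmedSpaces;
    }
}

/** Hypothetical minimal reader: returns text content up to the next '<'. */
final class SketchContentReader {
    private final String source;
    private final SketchXmlConfig config;
    private int pos;

    SketchContentReader(String source, SketchXmlConfig config) {
        this.source = source;
        this.config = config;
    }

    /** Returns the text before the next '<' (trimmed unless the flag is set), or null at end of input. */
    String nextContent() {
        if (pos >= source.length()) {
            return null;
        }
        int start = pos;
        while (pos < source.length() && source.charAt(pos) != '<') {
            pos++;
        }
        String raw = source.substring(start, pos);
        return config.isKeepTrimmedSpaces() ? raw : raw.trim();
    }
}
```

With the flag set, reading " : </tagA>" yields " : "; with the default configuration it yields ":". Passing the whole config object rather than a bare boolean leaves room for further options later without changing the method signature again.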
To test only the nextContent() method, I made this change:
public Object nextContent() throws JSONException {
    char c;
    StringBuilder sb;
    // do {
    c = next();
    // } while (Character.isWhitespace(c));
    if (c == 0) {
        return null;
    }
    if (c == '<') {
        return XML.LT;
    }
    sb = new StringBuilder();
    for (;;) {
        if (c == 0) {
            // return sb.toString().trim();
            return sb.toString();
        }
        if (c == '<') {
            back();
            // return sb.toString().trim();
            return sb.toString();
        }
        if (c == '&') {
            sb.append(nextEntity(c));
        } else {
            sb.append(c);
        }
        c = next();
    }
}
This seems to work, but in my project (not written by me), before calling XML.toJSONObject, I have a transformer (this.transformer = TransformerFactory.newInstance().newTransformer();) that transforms a Node to XML like
<?xml version="1.0" encoding="UTF-8" standalone="no"?><CodiceRegione>
190</CodiceRegione>
The output is
{"CodiceRegione":"\r\n 190"}
Unfortunately, the real original (XML origin...various transformations to Node and then ) tag was
Character.isWhitespace() filters more than just the space char.
See https://www.geeksforgeeks.org/character-iswhitespace-method-in-java-with-examples/
You could try retaining the call to isWhitespace() while allowing space chars that are contiguous with the content.
For example, this string contains 8 whitespace chars at the beginning and end: " \r\n : \r\n "
But the parsed string should only contain 2 whitespace chars: " : ".
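One possible reading of that suggestion, as a minimal sketch: strip all leading and trailing whitespace as Character.isWhitespace() defines it, then restore only the plain space characters that directly touch the content. The class and method names are invented for this example and are not part of JSON-java.

```java
/** Hypothetical helper; illustrates the idea only. */
final class ContiguousSpaceTrimmer {

    /**
     * Trims leading/trailing whitespace but keeps the run of ' ' characters
     * immediately adjacent to the first and last non-whitespace characters.
     */
    static String trimKeepingAdjacentSpaces(String raw) {
        int start = 0;
        int end = raw.length();
        // Find the bounds of the non-whitespace content.
        while (start < end && Character.isWhitespace(raw.charAt(start))) {
            start++;
        }
        while (end > start && Character.isWhitespace(raw.charAt(end - 1))) {
            end--;
        }
        if (start == end) {
            return ""; // whitespace-only input
        }
        // Grow the bounds back over plain spaces only, stopping at \r, \n, \t, etc.
        while (start > 0 && raw.charAt(start - 1) == ' ') {
            start--;
        }
        while (end < raw.length() && raw.charAt(end) == ' ') {
            end++;
        }
        return raw.substring(start, end);
    }

    public static void main(String[] args) {
        String parsed = trimKeepingAdjacentSpaces(" \r\n : \r\n ");
        System.out.println("[" + parsed + "]"); // prints "[ : ]"
    }
}
```

This works on a complete string, whereas nextContent() reads one character at a time, so a real change would have to apply the same rule while scanning.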
Closing due to lack of activity. If you think it should be reopened, please post here.
@stleary I have a similar use-case as raised in this issue where I need to retain any existing whitespace between the XML and JSON so they are as close as possible for audit purposes.
Would you accept a pull-request to XMLParserConfiguration to add a new flag if I added it and wired it through to nextContent()?
@Brian-McG Sure, this would be allowed. Please ensure in your implementation:
XMLParserConfiguration property
XMLParserConfiguration constructors.
I have been working on the above with @Brian-McG. A change was implemented in nextContent() which skips the trimming of the string when configuration.shouldTrimWhiteSpace() returns false, like so:
...
do {
    c = next();
} while (Character.isWhitespace(c) && configuration.shouldTrimWhiteSpace());
if (c == 0) {
    return null;
}
if (c == '<') {
    return XML.LT;
}
sb = new StringBuilder();
for (;;) {
    if (c == 0) {
        return sb.toString().trim();
    }
    if (c == '<') {
        back();
        if (configuration.shouldTrimWhiteSpace()) {
            return sb.toString().trim();
        } else {
            return sb.toString();
        }
    }
...
We ran into an issue where whitespace in between tags is no longer being trimmed and ends up inside of the returned JSON object. An example of such an input:
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"+
"<addresses>\n"+
" <address>\n"+
" <name> Sherlock Holmes </name>\n"+
" </address>\n"+
"</addresses>";
And the result is
{"addresses":{"address":{"name":" Sherlock Holmes ","content":["\n ","\n "]},"content":["\n ","\n"]}}
To address this, a method has been added which executes before jsonObject is accumulated onto context in the parse() method. This method removes any entry where the key is the string returned by config.getcDataTagName() and the value is only whitespace. I have forked this repo and pushed my change to a branch; the diff can be viewed here: https://github.com/keatontaylor10/JSON-java/commit/218d00ecf0e331796b2aafb22172b0243e4e1c44.
All tests are passing successfully and some more have been added. I wanted to get some feedback on whether or not this implementation will work before adding some more test cases and creating a pull request.
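For readers who do not want to open the diff, the cleanup step described above could look roughly like the following sketch. It uses plain java.util maps and lists instead of JSONObject and JSONArray so that it stands alone, assumes "content" as the key (the value seen in the output above), and recurses into children, which is a simplification of hooking into parse(). It is not the code from the linked commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; a Map/List based stand-in for the JSONObject cleanup. */
final class WhitespaceContentCleaner {

    /** Removes contentKey entries whose value is only whitespace, in this node and all nested maps. */
    @SuppressWarnings("unchecked")
    static void removeWhitespaceOnlyContent(Map<String, Object> node, String contentKey) {
        if (isOnlyWhitespace(node.get(contentKey))) {
            node.remove(contentKey);
        }
        for (Object child : node.values()) {
            if (child instanceof Map) {
                removeWhitespaceOnlyContent((Map<String, Object>) child, contentKey);
            }
        }
    }

    /** True for a blank string, or a non-empty list consisting solely of blank strings. */
    static boolean isOnlyWhitespace(Object value) {
        if (value instanceof String) {
            return ((String) value).trim().isEmpty();
        }
        if (value instanceof List) {
            List<?> items = (List<?>) value;
            if (items.isEmpty()) {
                return false;
            }
            for (Object item : items) {
                if (!(item instanceof String) || !((String) item).trim().isEmpty()) {
                    return false;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("name", " Sherlock Holmes ");
        address.put("content", new ArrayList<>(Arrays.asList("\n ", "\n ")));
        Map<String, Object> addresses = new LinkedHashMap<>();
        addresses.put("address", address);
        addresses.put("content", new ArrayList<>(Arrays.asList("\n ", "\n")));

        removeWhitespaceOnlyContent(addresses, "content");
        System.out.println(addresses); // only "address" with its "name" remains
    }
}
```

Note that a content list mixing real text with whitespace-only strings is left untouched in this sketch; whether such whitespace should be kept is a separate design decision.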
@keatontaylor10 The parser code can be tough to update without including unintended side effects, so running into problems like this should be expected. Not sure of the best approach to get the behavior you want without the side effects.
I have created a PR to add this feature https://github.com/stleary/JSON-java/pull/832
When creating a JSONObject from an XML string like
<?xml version="1.0" encoding="utf-8"?><tagA> : </tagA>
the spaces at the beginning and at the end of the string are trimmed
output
Is it possible to add a parameter to avoid trimming?
I think the method involved is nextContent() in XMLTokener.java.