Version 0.14.0 takes surrounding white space and new lines with the parsing

martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON

MIT License

5.52k stars 462 forks source link

Version 0.14.0 takes surrounding white space and new lines with the parsing #361

Closed thebetar closed 1 month ago

thebetar commented 1 month ago

I was running a pipeline this morning and tests were failing all of a sudden. After doing some debugging we found out it was because xmltodict was updated and was causing errors in our unit tests. The error was caused by our previous expected output file expecting trimmed strings like

[
  "a",
  "b",
  "c"
]

which first succeeded but after this update the new result from the parsing xml to a dictionary was

[
  "            a           \n",
  "\n            b           \n",
  "\n            c           ",
]

I will see if I have time this weekend to try and create a PR for this issue.

Meallia commented 1 month ago

We also ran into this issue, git bisect pointed us to this commit: https://github.com/martinblech/xmltodict/commit/c9f1143694c52666818715e865a56ffc46d9232e

c9f1143694c52666818715e865a56ffc46d9232e is the first bad commit
commit c9f1143694c52666818715e865a56ffc46d9232e
Author: Trey Franklin <trey.franklin@adtran.com>
Date:   Mon Feb 8 11:39:45 2021 -0600

    Ensure significant whitespace is not trimmed

 tests/test_xmltodict.py | 4 ++--
 xmltodict.py            | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

anlambert commented 1 month ago

Same issue here, some of our tests start to fail since xmltodict v0.14.0 due to the issue mentioned above.

mbyrne00 commented 1 month ago

Same issue here. For context this update seems to be tied to:

In our case when we had an empty xml element with only whitespace we expected it to be trimmed to None and it no longer is. It's a breaking change in behaviour that existing codebases have had to work with in previous versions (rightly or wrongly). In our case we just had to change our expectation, but would be nice if there were still an option to ignore empty xml elements that only contain whitespace.

thebetar commented 1 month ago

I have made a PR with a suggestion to resolve this issue while still keeping the change of the original PR https://github.com/martinblech/xmltodict/pull/362

phil-nelson-bt commented 1 month ago

This is a breaking change for us as well. I also agree that the choice to retain whitespace or not is a context specific question and should be a parameter. I'd also suggest that with a version change giving no clues, .13 to .14, to retain the old behavior and enable preservation of whitespace via the option.

martinblech commented 1 month ago

Reverted the changed and pushed v0.14.2

thebetar commented 1 month ago

@martinblech so is the original PR which changed the whitespace logic not the way it was intended to be?

phil-nelson-bt commented 1 month ago

Thanks for doing this, I'd agree that the change was a regression. As for whitespace, we just need to think back to the days when XML was really adopted. Apologies to those that already know all this! It's main reason for existing came out of the document world and when we wanted more structured documents. XHTML was a huge part of the design considerations and preserving whitespace is essential for those kinds of uses. XML's use for plain data was part of the designs and goals too and stripping whitespace really helps with that. Today, document oriented XML is probably a minor footnote, but not gone. Taking the PR to add an option to retain whitespace seems like a great next step. And would be backwards compatible.

martinblech commented 1 month ago

@thebetar it was a regression because it changed the default behavior into something that most users don't want.