openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Removes invalid characters 0xfffe and 0xffff when cleaning XML #878

Closed by karenhanson 4 months ago

karenhanson commented 11 months ago

This removes two additional forbidden characters (0xFFFE and 0xFFFF) when cleaning XML that the XMLHandler outputs, in Utils.encodeContent() and Utils.encodeValue(). If these characters are left in the XML, validating JHOVE output produces an invalid character message.
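For readers unfamiliar with the rule involved, here is a minimal sketch of this kind of cleaning, assuming the goal is to keep only code points permitted by the XML 1.0 `Char` production. It is not the code in this PR: the class `XmlCleanSketch` and its methods are hypothetical names for illustration, and the actual logic in `Utils.encodeContent()` and `Utils.encodeValue()` may differ (for example, it may escape characters rather than drop them).

```java
// Hypothetical illustration only; not the JHOVE implementation.
final class XmlCleanSketch {
    private XmlCleanSketch() {}

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)      // excludes 0xFFFE and 0xFFFF
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point that XML 1.0 forbids, including 0xFFFE and 0xFFFF. */
    static String stripInvalidXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Iterate by code point so supplementary characters stay intact;
            // an unpaired surrogate is returned as-is and rejected by the check.
            int cp = input.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

In this sketch, `XmlCleanSketch.stripInvalidXmlChars("a\uFFFEb\uFFFFc")` yields `"abc"`, while tabs, line breaks, and supplementary characters pass through unchanged.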

Relates to https://github.com/openpreserve/jhove/issues/877

codecov[bot] commented 11 months ago

Codecov Report

Patch coverage has no change and project coverage change: +16.85% :tada:

Comparison is base (8a4d1ce) 30.15% compared to head (a3f3c23) 47.00%. Report is 3 commits behind head on integration.

Additional details and impacted files

```diff
@@               Coverage Diff                @@
##           integration     #878       +/-   ##
==================================================
+ Coverage        30.15%   47.00%   +16.85%
- Complexity         651     1100      +449
==================================================
  Files               57       57
  Lines             9079     9079
  Branches          1622     1622
==================================================
+ Hits              2738     4268     +1530
+ Misses            5965     4280     -1685
- Partials           376      531      +155
```

| [Files Changed](https://app.codecov.io/gh/openpreserve/jhove/pull/878?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openpreserve) | Coverage Δ | |
|---|---|---|
| [...src/main/java/edu/harvard/hul/ois/jhove/Utils.java](https://app.codecov.io/gh/openpreserve/jhove/pull/878?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openpreserve#diff-amhvdmUtY29yZS9zcmMvbWFpbi9qYXZhL2VkdS9oYXJ2YXJkL2h1bC9vaXMvamhvdmUvVXRpbHMuamF2YQ==) | `62.16% <ø> (+41.44%)` | :arrow_up: |

... and [33 files with indirect coverage changes](https://app.codecov.io/gh/openpreserve/jhove/pull/878/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openpreserve)


karenhanson commented 10 months ago

This appears to address #880, which has a sample file attached that can be used for testing.

Update Oct 30, 2023: per the discussion on #880, this fixes the error message reported on that ticket but does not address the underlying issue, which appears to be related to how the bits are read in. The file attached to that ticket can still be used for testing.