Open carlwilson opened 2 years ago
Observation, if you're going to use an automated code formatter, it would make sense to run it on the entire project to create a baseline. This would allow for cleaner PRs and ease the burden of resolving spurious merge conflicts.
I've spent a fair bit of time looking at this function the past couple of days. Here is the modified implementation that I arrived at: https://github.com/UW-Madison-Library/jhove/blob/48d083b2394fc8b9426d9567744eff519284b830/jhove-modules/xml-hul/src/main/java/edu/harvard/hul/ois/jhove/module/xml/XmlModuleHandler.java#L332
Things to note:
ent
, instead it only set ent
if the request encountered a redirect. However, this turned out not to be a bug. The sole purpose of the code was actually to resolve the redirect. If resolveEntity()
returns null
, the calling code will attempt to resolve the systemId
itself, which it is able to do so long as there are no redirects.resolveEntity()
should set the public id and system id on the returned InputSource as these values are used to resolve relative child entities.Hi Peter, thanks for your super helpful review. This all sounds a little familiar, since I did a review of the XML module code a year or two ago, but can barely remember the details now. I fixed a few bugs, and did a lot of house keeping. The PR (#634) is still awaiting integration, but you might find it of interest, if only to prevent duplicating efforts in places.
A fix for this issue which still retains the old redirection logic was included in commit d223a28 of that PR. I'm not sure if it addresses all of your concerns, since it's been too long for me to remember, but let me know if it doesn't, and maybe I can amend that older PR.
@david-russo Yeah, it looks like your old PR does address the same issue as is addressed in this PR. It does still have the bug when resolving relative entities but that's a different problem.
I haven't looked at your PR in detail, but it looks like there's some good stuff in there, including a few changes that I have also made. It would be really nice if PRs didn't languish here for years, forcing us to duplicate each other's work and creating an ever growing merge problem for PR authors if/when PRs ever start to be merged again.
Cheers. Does this comment relate to the relative entity resolution issue you're describing? If so, from what I remember/wrote, it sounds like there's a hacky solution in the module at the moment, but a proper fix may require a change to what information JHOVE passes its modules.
Yes, but it can be addressed in some (perhaps not all) cases, simply by setting the system id on the returned InputSource. See here
I'll create a ticket for it with a demonstration of the problem later today.
[Edit] That is for entities that are relative a parent. This fix has no affect on relative entities without parents.
Fixes #700 output is now a more informative: