openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

JSON leads to ClassCastException #837

Closed. WhenSkiesAbove closed this issue 4 months ago

WhenSkiesAbove commented 1 year ago

Hello,

Apologies if this is referred to elsewhere - I was having difficulty parsing through some of the language used in other issues, so it's certainly possible I missed it. If so, feel free to redirect me.

I'm having an issue where the CLI version (on Ubuntu 22.04.1 LTS jammy) throws an error when JSON output is selected for "large" file folders:

    Mar 15, 2023 1:52:54 PM edu.harvard.hul.ois.jhove.Jhove main
    SEVERE: java.lang.Long cannot be cast to java.lang.Integer
    java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
        at edu.harvard.hul.ois.jhove.handler.JsonHandler.showScalarProperty(JsonHandler.java:699)
        at edu.harvard.hul.ois.jhove.handler.JsonHandler.showProperty(JsonHandler.java:658)
        at edu.harvard.hul.ois.jhove.handler.JsonHandler.show(JsonHandler.java:439)
        at edu.harvard.hul.ois.jhove.RepInfo.show(RepInfo.java:598)
        at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:631)
        at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:558)
        at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:558)
        at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:558)
        at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:461)
        at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:265)

The folder in question contains 3070 files totalling 10.4 GB. Outputting to text and XML seems to work fine, and the JSON output works when the folder contains far fewer files and much less data. I have tried both OpenJDK 8 and 11 with the same result. The command I'm using is:

    ./jhove -h JSON -o /path/to/new/file.json /path/to/big/folder

I'm not particularly adept at parsing Java error messages, so any assistance or information would be much appreciated.

Thanks!

carlwilson commented 1 year ago

Quick summary of the problem: this is the code in question. Line 699 is the third line down and is the "naughty" one.

switch (propType) {
    case INTEGER:
        Integer i = (Integer) property.getValue(); // Firing ClassCastException
        propBuilder.add(property.getName(), i.intValue());
        break;
    case LONG:
        Long l = (Long) property.getValue();
        propBuilder.add(property.getName(), l.longValue());
        break;

I'd hoped the fix might be as simple as adding a long type to the report, but it's trickier. The cast is the bracketed Integer part here: (Integer) property.getValue();. It takes the object returned by property.getValue() and tries to cast it to an Integer. This should be safe, because the property's propType is checked first and the property declares itself to be of INTEGER type. Unfortunately the value it actually holds at runtime is a Long, so the cast fails.

To fix it I'd need to know the property name and then debug to find out why the type was misassigned. I'm currently putting together a v1.28 release candidate for tomorrow, so this won't make it into that. I will try to reproduce this when work on the next release starts, which won't be too far away.
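For readers less familiar with Java casts, here is a minimal, self-contained sketch of the failure described above. It is not JHOVE code and not the fix from any pull request: the class CastSketch, the enum PropType and both methods are hypothetical names made up for illustration. It shows a value that is declared INTEGER but actually holds a Long, and one defensive way to read it by going through java.lang.Number.

```java
public class CastSketch {

    /** Hypothetical stand-in for a declared property type. */
    public enum PropType { INTEGER, LONG }

    /** Trusts the declared type, as the snippet above does. */
    public static long uncheckedValue(PropType declared, Object value) {
        switch (declared) {
            case INTEGER:
                // Throws ClassCastException if value is really a Long.
                return ((Integer) value).intValue();
            case LONG:
                return ((Long) value).longValue();
            default:
                throw new IllegalArgumentException("Unhandled type: " + declared);
        }
    }

    /** Defensive variant: accepts any Number, whatever type was declared. */
    public static long tolerantValue(Object value) {
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        throw new IllegalArgumentException("Not a numeric value: " + value);
    }

    public static void main(String[] args) {
        // Declared INTEGER elsewhere, but the object is a Long.
        Object mislabelled = Long.valueOf(5_000_000_000L);
        try {
            uncheckedValue(PropType.INTEGER, mislabelled);
        } catch (ClassCastException e) {
            System.out.println("Unchecked cast failed: " + e.getMessage());
        }
        System.out.println("Tolerant read: " + tolerantValue(mislabelled));
    }
}
```

Reading through Number only hides the symptom in the output handler. The mismatch between the declared type and the stored value would still exist in whichever module created the property, which is why the property name is needed to track down the real cause.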

carlwilson commented 1 year ago

This PR may help with this issue. It looks to cover your multiple-file case and was merged fairly recently:

https://github.com/openpreserve/jhove/pull/728

prettybits commented 1 year ago

@carlwilson I opened a pull request that I think should fix this issue, assuming the stacktrace above is the only problem.

I'm not sure why this would affect only the JSON output handler, though. Were the different handlers tested on the same set of files?

carlwilson commented 1 year ago

Thanks for this @prettybits, unfortunately it's just too late for today's RC. Nice work, and I'll look it over properly when I return from a short break.

WhenSkiesAbove commented 1 year ago

Thanks everyone for all the responses. I'm new to GitHub, so I only understand about an eighth of what's being said, but I did test the other handlers on the same set of records and they work fine - it's only JSON that gives me an error (still present in the latest release).

Thanks again everyone for looking into this - it's much appreciated.