kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0
63 stars 62 forks source link

NPE while accessing non-existing CREATEDATE attribute #4231

Closed henning-gerhardt closed 3 years ago

henning-gerhardt commented 3 years ago

Following null pointer exception is thrown

Exception in thread "Indexing 0 of type PROCESS" java.lang.NullPointerException
        at org.kitodo.dataformat.access.MetsXmlElementAccess.<init>(MetsXmlElementAccess.java:110)
        at org.kitodo.dataformat.access.MetsXmlElementAccess.read(MetsXmlElementAccess.java:194)
        at org.kitodo.production.services.dataformat.MetsService.loadWorkpiece(MetsService.java:102)
        at org.kitodo.production.services.dataformat.MetsService.getBaseType(MetsService.java:81)
        at org.kitodo.production.services.data.ProcessService.getBaseType(ProcessService.java:1643)
        at org.kitodo.production.services.data.ProcessService.addAllObjectsToIndex(ProcessService.java:272)
        at org.kitodo.production.helper.IndexWorker.indexObjects(IndexWorker.java:116)
        at org.kitodo.production.helper.IndexWorker.indexChunks(IndexWorker.java:110)
        at org.kitodo.production.helper.IndexWorker.run(IndexWorker.java:78)
        at java.lang.Thread.run(Thread.java:748)

if the meta data file did not contain a CREATEDATE attribut in /mets:mets/mets:metsHdr.

This was happening after migration 2.x meta data files into the 3.x format on running indexing all data.

The thrown exception was not visible in the frontend nor logged to the Kitodo.Production log. Logging was going to the Tomcat standard log aka catalina.out.

matthias-ronge commented 3 years ago

Unchecked exceptions shouldn't be caught because they shouldn't actually occur, says @Kathrin-Huber. As a result, such errors cannot be logged and are very difficult to detect.

Goal: During metadata migration, the created date should be set in the METS file if it does not exist.

matthias-ronge commented 3 years ago
if (Objects.isNull(workpiece.getCreationDate())) {
    GregorianCalendar gregorianCalendar = new GregorianCalendar();
    gregorianCalendar.setTime(process.getTasks().get(0).getProcessingBegin());
    workpiece.setCreationDate(gregorianCalendar);
}
solth commented 3 years ago
if (Objects.isNull(workpiece.getCreationDate())) {
    GregorianCalendar gregorianCalendar = new GregorianCalendar();
    gregorianCalendar.setTime(process.getTasks().get(0).getProcessingBegin());
    workpiece.setCreationDate(gregorianCalendar);
}

Why is this in a comment instead of a Pull Request?

matthias-ronge commented 3 years ago

I don't have an environment to test this case right now, so I didn't want to make a pull request. Therefore, first of all, only the comment on what the solution should look like.

matthias-ronge commented 3 years ago

^^ you can try if this works for you

henning-gerhardt commented 3 years ago

What I did not understand: there are already some meta.xml files created in tests or used from repo. Why not add or create a new file where this CREATEDATE attribute is missing in /mets:mets/mets:metsHdr and test if this attribute is added with the correct value if your change is working or not. Maybe only by comparing with a pre-generated meta data file.

matthias-ronge commented 3 years ago

When I implemented and tested the migration, with documents both from our server and as part of the test migration, we had no problems with indexing. Therefore, the question is, what do 2.x documents look like in which the date is missing, and what does the associated database look like, and does the indexing work after migrating with this change? Of course I can make up a fictional METS file that will make this work, but I can't tell if the problem went away after that. Perhaps other side effects also play a role. I cannot check that at this point.

henning-gerhardt commented 3 years ago

If there would be more issues with this file(s) I would opened an other issue, put more information into this issue or I did fix them myself somehow. As this CREATEDATE attribute and even the whole metsHdr element is optional in METS but this software or at least the migration part relying on them as required elements I see the software making assumptions which are wrong. In my opinion the software need be fixed independent of what real data is existing in database, index or somewhere else and in my opinion you did not create a whole real world example to test this kind of change.

matthias-ronge commented 3 years ago

The purpose of the migration is to migrate internal files from Production 2 to internal files for Production 3. The fact that these accidentally correspond to the METS standard is of secondary importance at this point, and the migration was never written to prepare random files that are valid METS in such a way that they can be used as internal files for Production 3. That would be a completely different task.

(By the way, I always found and still find it bad that the internal formats of Production are saved as METS, because that has often led people to believe that they can do with this METS whatever the METS standard allowed. But that's not the case. It is just a representation variant of the internal data model, and anything that deviates from it does not work. Because of this, I have always advocated for an independent and more flexible internal data model and file format, namely linked data (to support IIIF) to express this difference, but both were rejected by the steering committee at the time. So be it.)

henning-gerhardt commented 3 years ago

This as not randomly generated files. The error happening on created meta data files from Kitodo.Production 2.x. I did not know why this attribute is missing as I did not look on every generated or modified file. Maybe there was an error in 2.x which nobody noticed or did not happen many times until I found this error on migration.