veraPDF / verapdf-webapp-server

Backend service for the veraPDF web application
GNU General Public License v3.0

Location of checks is always empty #140

Open BennyAlex opened 1 year ago

BennyAlex commented 1 year ago

Hello,

When receiving the check information, the location property is always null.

It would be nice to have it available again.

bdoubrov commented 1 year ago

The error contains two properties that are used to identify the object causing the problem:

Typically location is null for Machine checks of PDF/UA-1, and is not null for additional human WCAG checks.

It is assumed that the PDF viewer would be able to compute the location bbox from the context, if the location property is null. A sample implementation of such logic is available in the PDF viewer module based on pdf.js: https://github.com/veraPDF/verapdf-js-viewer

BennyAlex commented 1 year ago

@bdoubrov Yes, thank you.

Unfortunately, the js-viewer is causing many problems, since it relies on a beta version of react-pdf. That version is not compatible with Vite, which we use for our project. I tried upgrading react-pdf, but the js-viewer uses an internal prop, _pdfInfo.structureTree, which is no longer available in newer versions of react-pdf. So I am stuck, and there is no way for me to use the js-viewer.

So I thought that, with the bbox information available, it would be easy to display the boxes myself.

BennyAlex commented 1 year ago

@bdoubrov Another problem I am facing is the inconsistent context paths. Sometimes it's something like this:

"root/doc[0]/StructTreeRoot[0]/children[0](173 0 obj Sect Sect)/children[11](1331 0 obj P P)"

other times it's like: "root/document[0]/pages[0](74 0 obj PDPage)/contentStream[0]/content[0]/contentItem[0]"

In the second example, it is easy to get the page number and the index of an item.

In my opinion, the report should always output the same unified format.
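To illustrate the point about the page-based path, here is a minimal sketch of how the page index and content item index could be read from such a context string. The helper name `parsePageContext` and the `PageContentRef` type are hypothetical and not part of veraPDF; the sketch assumes the segment names `pages` and `contentItem` exactly as they appear in the example above, and it only reads the first `contentItem` segment it finds.

```typescript
// Hypothetical result type: indices exactly as they appear in the path.
interface PageContentRef {
  pageIndex: number;
  contentItemIndex: number | null;
}

// Hypothetical helper (not a veraPDF API): reads the bracketed indices of the
// "pages" and "contentItem" segments from a context path.
function parsePageContext(context: string): PageContentRef | null {
  const page = /\/pages\[(\d+)\]/.exec(context);
  if (page === null) {
    // No page segment, e.g. a StructTreeRoot-based path.
    return null;
  }
  const item = /\/contentItem\[(\d+)\]/.exec(context);
  return {
    pageIndex: Number(page[1]),
    contentItemIndex: item === null ? null : Number(item[1]),
  };
}
```

For the StructTreeRoot-based path this sketch returns null, which is exactly the case where the page cannot be read from the path alone.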

bdoubrov commented 1 year ago

If this helps, we plan to upgrade the viewer to the latest stable version of react-pdf (6.2).

The context info is consistent within the veraPDF validation model, that is it shows the path to the object in question in the graph of all veraPDF objects created by PDF parser.
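Since the context is a path through the object graph, a client can treat both example paths from this thread uniformly by splitting them into segments. The following is a rough sketch under that assumption; the `ContextSegment` type and `parseContextPath` function are hypothetical names, and the segment grammar (name, optional `[index]`, optional parenthesised description) is inferred only from the two example paths quoted above, not from a veraPDF specification.

```typescript
// Hypothetical segment shape inferred from the example paths in this thread.
interface ContextSegment {
  name: string;          // e.g. "children" or "pages"
  index: number | null;  // the number in [..], if present
  detail: string | null; // the text in (..), e.g. "173 0 obj Sect Sect"
}

// Hypothetical helper (not a veraPDF API): splits a context path into segments.
function parseContextPath(context: string): ContextSegment[] {
  // name, then optional [index], then optional (detail)
  const re = /([^\/\[(]+)(?:\[(\d+)\])?(?:\(([^)]*)\))?/g;
  const segments: ContextSegment[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(context)) !== null) {
    segments.push({
      name: m[1],
      index: m[2] === undefined ? null : Number(m[2]),
      detail: m[3] === undefined ? null : m[3],
    });
  }
  return segments;
}
```

With the two examples above, the third segment is `StructTreeRoot` in one case and `pages` in the other, so a client could branch on that name to decide how to locate the object.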

For example, the two different formats in your example come from two different parent objects in the model:

The server-side calculation of the bounding boxes for PDF/UA-1 objects is not implemented yet, but it is on our radar for future development.