veraPDF / verapdf-js-viewer

PDF preview based on pdf.js
Mozilla Public License 2.0

[QUESTION]- Highlighting of validation pdf/ua errors #76

Open Samyssmile opened 1 month ago

Samyssmile commented 1 month ago

Let's say I use VeraPDF for PDF/UA validation and receive a list of validation results from VeraPDF.

Is it possible to highlight the relevant areas in the PDF? For example, if there is an error indicating that an image in the PDF does not have ALT text, is it possible to use information from VeraPDF to make VeraPDF-js-viewer highlight the image? This would allow us to quickly and easily identify which image is causing the problem.

I would appreciate an example or a pointer to documentation.

I also asked for this on StackOverflow: https://stackoverflow.com/questions/78826710/how-to-highlight-sections-in-a-pdf-with-pdf-js-where-verapdf-pdf-ua-validator-ha

KateOrient commented 1 month ago

Thank you for reaching out!

The bboxes with the location information from the veraPDF validation results can be used to highlight these problematic areas. To do this, pass an array of bboxes to verapdf-js-viewer. Each bbox has two required fields, page and location. For example, this bbox highlights a figure on the first page:

[{
  location: 'root/document[0]/StructTreeRoot[0]/K[0](7 0 obj Document)/K[0](14 0 obj Sect)/K[1](12 0 obj Figure)',
  page: 1
}]

The bbox location should be retrieved from the context or locationContext field of the veraPDF report. The context field typically contains the path to an item in the PDF structure (such as the structure tree, a content stream, or annotations). The locationContext field usually contains a bounding box position (e.g. {"bbox":[{"p":1,"rect":[x,y,x2,y2]}]}) that can be passed as the bbox location as is. If both fields are present, locationContext takes precedence over context.

The bbox page should be retrieved either from the p field in locationContext or from the PDF structure tree using the context path. You can access the structure tree in the onLoadSuccess method in the following way:

onLoadSuccess={(document) => {
  const structureTree = document._pdfInfo.structureTree;
  ...
}}

Note: PDF.js counts pages starting from 1, while veraPDF counts pages starting from 0, so you should always add 1 to the p field before using it as the bbox page.
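The mapping described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the viewer or from veraPDF: the function name toViewerBboxes, the shape of a report entry (an object with optional context and locationContext fields), and the resolvePage callback are all hypothetical. It also assumes locationContext arrives either as a JSON string or as an already parsed object.

```javascript
// Sketch only: the entry shape ({ context, locationContext }) and the
// resolvePage callback are assumptions, not part of any published API.
function toViewerBboxes(entries, resolvePage) {
  const bboxes = [];
  for (const entry of entries) {
    if (entry.locationContext) {
      // locationContext takes precedence over context when both are present.
      const raw = entry.locationContext;
      const parsed = typeof raw === 'string' ? JSON.parse(raw) : raw;
      const first = parsed && parsed.bbox && parsed.bbox[0];
      if (!first || typeof first.p !== 'number') continue; // no usable page
      bboxes.push({
        // Passed through unchanged (serialized if it was an object).
        location: typeof raw === 'string' ? raw : JSON.stringify(raw),
        // veraPDF pages are 0-based, the viewer expects 1-based pages.
        page: first.p + 1,
      });
    } else if (entry.context) {
      // The page has to come from the structure tree; the caller supplies
      // a (hypothetical) lookup that returns a 1-based page or undefined.
      const page = resolvePage ? resolvePage(entry.context) : undefined;
      if (page !== undefined) {
        bboxes.push({ location: entry.context, page });
      }
    }
  }
  return bboxes;
}
```

Entries whose page cannot be determined are skipped rather than guessed. The resolvePage callback would typically be built from the structure tree obtained in onLoadSuccess, and the returned array is what gets passed to the viewer's bboxes input parameter.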

For more information on the supported types of bbox location, see the description of the bboxes input parameter in our README: https://github.com/veraPDF/verapdf-js-viewer?tab=readme-ov-file#input-parameters