veraPDF / verapdf-webapp-server

Backend service for the veraPDF web application
GNU General Public License v3.0

Takes a long time to create a Report #135

Closed. NickHarnau closed this 1 year ago

NickHarnau commented 1 year ago

Hey :) The following PDF took nearly 10 hours to produce a report in Docker: https://www.heilbronn.de/fileadmin/user_upload/DV_Dienstleistungen-Amt62_2021.pdf

Is this just a very complicated PDF, or are there other issues? When I upload this file on duallab, it also takes a very long time, and on pdfchecker.nl I even get an error when uploading the file.

bdoubrov commented 1 year ago

@NickHarnau thanks for reporting this issue. It turned out to be related to the table recognition algorithm, which, among other things, analyses all vertical and horizontal lines on the page. This particular document contains a lot of line art, which degrades performance. We have now fixed this by ignoring all lines located inside a Figure structure element.
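To illustrate the idea behind the fix, here is a minimal Python sketch of filtering out lines that lie inside a Figure before handing them to table detection. This is not veraPDF's actual code: the `Box` and `Line` types and the `lines_for_table_detection` function are hypothetical names, and it assumes each Figure structure element can be reduced to an axis-aligned bounding box.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical stand-in for a Figure's area)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Box") -> bool:
        # True only if `other` lies entirely within this box.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


@dataclass(frozen=True)
class Line:
    """A straight line segment drawn on the page."""
    x0: float
    y0: float
    x1: float
    y1: float

    def bbox(self) -> Box:
        return Box(min(self.x0, self.x1), min(self.y0, self.y1),
                   max(self.x0, self.x1), max(self.y0, self.y1))


def lines_for_table_detection(lines: List[Line],
                              figure_boxes: List[Box]) -> List[Line]:
    """Drop every line that is fully inside any Figure bounding box."""
    return [ln for ln in lines
            if not any(fig.contains(ln.bbox()) for fig in figure_boxes)]


figure = Box(0, 0, 100, 100)
inside = Line(10, 10, 90, 10)      # line art inside the figure: ignored
outside = Line(0, 200, 300, 200)   # possible table border: kept
crossing = Line(50, 50, 150, 50)   # only partly inside the figure: kept
candidates = lines_for_table_detection([inside, outside, crossing], [figure])
```

With line-art-heavy figures, a filter like this shrinks the set of lines the table recognizer has to analyse, which is the effect the fix aims for.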

The latest dev version of veraPDF (1.23.149), also available at https://verapdf.duallab.com, already includes this fix.

NickHarnau commented 1 year ago

Thanks for taking care of it! :) How can I get the latest dev version? The docker-compose file pulls the images from e.g. ghcr.io/verapdf/IMAGE. I tried running it locally in Docker, but it still hits the issue. I assume the images will be updated in the next few days? :)

irinamavrina commented 1 year ago

@NickHarnau Hey, you can download the latest images from here: worker, file-storage and job-service