rffrasca / PDFKeeper

Open Source PDF Document Management
https://www.pdfkeeper.org/
GNU General Public License v3.0

ArgumentException: The parameter is incorrect. #13

Closed Sohail2949000 closed 12 months ago

Sohail2949000 commented 1 year ago

[Screenshot: error pdf keeper]

Dear Sir,

Thank you for the PDFKeeper software; I love it. However, I am getting an error that I don't understand.

Kindly advise us how we can solve this.

Thanks.

rffrasca commented 1 year ago

Hello, Thank you for the kind words. I'm glad you like PDFKeeper.

Before I can look into this issue, I will need PDFKeeper.log to see where the exception occurred. If the error happened during an upload, you would need to move the PDF being uploaded out of C:\Users\your_username\AppData\Roaming\Robert F. Frasca\PDFKeeper\UploadStaging to prevent PDFKeeper from trying to upload it again.

Thanks, Robert

Sohail2949000 commented 1 year ago

Hello Sir,

Kindly find attached PDFKeeper.log for your reference.

The same thing happened a few days ago. I thought maybe I had deleted or moved something while backing up the SQLite DB, so I uninstalled PDFKeeper and installed it again, but after a couple of files the error appeared again.

  1. The issue mentioned above is preventing me from uploading new PDFs; the ones in the screenshot are old PDFs.
  2. Second, the upload takes a lot of time. On average it takes 5 to 6 minutes to upload a 10 MB PDF.

Will we see improvements in future updates, or will it stay the same? And if possible, can you tell us how many PDF files and how much data it can handle smoothly? I have around 10k PDFs, about 150 to 200 GB in total, pending on my PC that I need to add.

I loved PDFKeeper because it is very easy to organize and manage pdf files in it.

Thanks. PDFKeeper.log

rffrasca commented 1 year ago

After reviewing the log, the unhandled exception is occurring while extracting text using OCR.

During the upload, the PDF is split into separate PDFs, one per page. Each page is processed as follows: if the page is an image, OCR is used; if it is text, iText is used to extract the text, and if iText is unable to extract any text, OCR is used instead. If the page contains both image and text and you have chosen to OCR pages with both text and image data, OCR is used; otherwise, iText extracts only the text and the images are ignored.
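The per-page decision described above can be sketched roughly as follows. This is an illustrative Python sketch only, not PDFKeeper's actual code: `choose_method`, `extract_page_text`, and the `itext_extract` / `ocr_extract` callables are hypothetical names, and applying the OCR fallback to every page where iText returns no text is an assumption.

```python
from enum import Enum
from typing import Callable


class Method(Enum):
    OCR = "ocr"
    ITEXT = "itext"


def choose_method(has_text: bool, has_images: bool, ocr_mixed_pages: bool) -> Method:
    """Pick the first extraction method to try for a single page."""
    if has_images and not has_text:
        return Method.OCR    # image-only page
    if has_images and has_text and ocr_mixed_pages:
        return Method.OCR    # mixed page, and the user opted in to OCR for mixed pages
    return Method.ITEXT      # text page, or mixed page where images are ignored


def extract_page_text(
    has_text: bool,
    has_images: bool,
    ocr_mixed_pages: bool,
    itext_extract: Callable[[], str],  # hypothetical stand-in for the iText call
    ocr_extract: Callable[[], str],    # hypothetical stand-in for the OCR call
) -> str:
    """Extract text from one page, falling back to OCR when iText finds nothing."""
    if choose_method(has_text, has_images, ocr_mixed_pages) is Method.ITEXT:
        text = itext_extract()
        if text.strip():
            return text
        # Assumption: an empty iText result always triggers the OCR fallback.
    return ocr_extract()
```

Keeping the decision in a small pure function makes it easy to see that OCR, the expensive path, runs only for image-only pages, opted-in mixed pages, and pages where iText finds no text.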

This strategy gives the most complete text extraction. Keep in mind, OCR is a "very expensive" operation that uses a lot of system resources (memory, CPU, and disk). 5-6 minutes to process a 10MB PDF containing all images and then insert into a local database is what I would expect to see.

I know you have some additional questions. The best place for that is in discussions. Feel free to start a discussion and we can continue the conversation there. I would like to focus on the reported issue here.

The exception contains the message: "Image dimensions are too large! Check MaxImageDimension for maximum allowed image dimensions". I'm thinking that one or more pages in the PDF contain an image whose dimensions exceed MaxImageDimension, which cannot be modified because it is set by the Windows OCR engine. This is something I have not seen before.

Is it possible to provide a sample PDF that can cause the error but does not contain any sensitive data?

Sohail2949000 commented 1 year ago

Hello

Sure, kindly find the attached PDF for your reference.

Regards. 212457-Golden Cup.pdf

rffrasca commented 1 year ago

Thank you for providing the PDF. With it, I was able to reproduce the issue. The exception is occurring when trying to OCR page 3. I was able to list the page sizes from the PDF using pdfinfo.exe:

Title:          Scanned Image
Subject:        Scanned Image
Keywords:
Author:         NAPS2
Creator:        NAPS2
Producer:       PDFsharp 1.50.4000-netstandard (https://github.com/ststeiger/PdfSharpCore)
CreationDate:   Fri Nov 10 18:07:53 2023
ModDate:        Fri Nov 10 18:07:53 2023
Tagged:         no
Form:           none
Pages:          3
Encrypted:      no
Page 1 size:    612.24 x 797.28 pts (rotated 0 degrees)
Page 2 size:    596.88 x 851.28 pts (rotated 0 degrees)
Page 3 size:    2304 x 3072 pts (rotated 0 degrees)
File size:      619232 bytes
Optimized:      no
PDF version:    1.4
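For anyone who wants to check their own PDFs the same way, here is a small Python sketch that pulls the per-page sizes out of pdfinfo output and converts points to pixels. It is only an illustration: `parse_page_sizes` and `points_to_pixels` are hypothetical helper names, and the 300 DPI used in the example is an assumed rendering resolution, not necessarily what PDFKeeper uses.

```python
import math
import re

# Matches lines such as "Page 3 size: 2304 x 3072 pts (rotated 0 degrees)".
PAGE_SIZE_RE = re.compile(r"Page\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)\s+pts")


def parse_page_sizes(pdfinfo_output: str) -> list[tuple[int, float, float]]:
    """Return (page number, width in points, height in points) for each page line."""
    return [
        (int(page), float(width), float(height))
        for page, width, height in PAGE_SIZE_RE.findall(pdfinfo_output)
    ]


def points_to_pixels(points: float, dpi: int) -> int:
    """Convert PDF points (1/72 inch) to pixels at the given DPI, rounding up."""
    return math.ceil(points * dpi / 72)


SAMPLE = """\
Page 1 size: 612.24 x 797.28 pts (rotated 0 degrees)
Page 2 size: 596.88 x 851.28 pts (rotated 0 degrees)
Page 3 size: 2304 x 3072 pts (rotated 0 degrees)
"""

if __name__ == "__main__":
    ASSUMED_DPI = 300  # hypothetical rendering resolution
    for page, width_pts, height_pts in parse_page_sizes(SAMPLE):
        width_px = points_to_pixels(width_pts, ASSUMED_DPI)
        height_px = points_to_pixels(height_pts, ASSUMED_DPI)
        print(f"Page {page}: {width_px} x {height_px} px at {ASSUMED_DPI} DPI")
```

At the assumed 300 DPI, page 3 works out to 9600 x 12800 pixels, several times larger in each dimension than the first two pages.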

Is this a PDF that you generate?

Sohail2949000 commented 1 year ago

Yes, Sir. These PDFs are our invoices, along with pictures for proof and reference.

Is the issue solved? I have been waiting for your answer so that I can start using PDFKeeper again.

Thanks again for your time and effort in this issue.

rffrasca commented 1 year ago

The issue is with the page in the PDF that contains the photo. The page size exceeds the maximum set by the Windows OCR engine. The best I can do is add a check in the next version that skips any page exceeding the dimensions set by the Windows OCR engine. I am currently focusing on version 9.0.0, which is not going to be released until the end of 2023 or early 2024.
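As a rough idea of what such a check could look like, here is a minimal Python sketch. It is not the planned implementation: `split_pages_for_ocr` is a hypothetical name, the limit is passed in as a parameter rather than read from the Windows OCR engine, and treating a page as too large only when its longer side is strictly greater than the limit is an assumption.

```python
def split_pages_for_ocr(
    page_pixel_sizes: dict[int, tuple[int, int]],
    max_dimension: int,
) -> tuple[list[int], list[int]]:
    """Split page numbers into (pages safe to OCR, pages to skip).

    page_pixel_sizes maps page number -> (width, height) in pixels.
    max_dimension stands in for the limit reported by the OCR engine.
    """
    ocr_pages: list[int] = []
    skipped_pages: list[int] = []
    for page, (width, height) in sorted(page_pixel_sizes.items()):
        if max(width, height) > max_dimension:
            skipped_pages.append(page)  # too large: skip instead of raising
        else:
            ocr_pages.append(page)
    return ocr_pages, skipped_pages


if __name__ == "__main__":
    # Illustrative values only; neither the sizes nor the limit come from PDFKeeper.
    sizes = {1: (2551, 3322), 2: (2487, 3547), 3: (9600, 12800)}
    ocr_pages, skipped_pages = split_pages_for_ocr(sizes, max_dimension=10000)
    print("OCR:", ocr_pages, "skipped:", skipped_pages)
```

Skipping the oversized page lets the rest of the document upload, at the cost of that page having no OCR text.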

Sohail2949000 commented 1 year ago

Noted.

We will be waiting for the new version.

Thanks, Sohail

rffrasca commented 12 months ago

PDFKeeper 9.0.0 was released today.