Closed Redsandro closed 2 years ago
Please share the images by email - I have not seen this before. You can reach me on my first name (merlijn) on the internet archive website (archive.org)
If you could, please also share the output PDF that you get, for good measure.
> Please share the images by email - I have not seen this before. You can reach me on (...)

I have sent you an email. (Please download the attachment before the link expires, even if you don't have time to look at it yet.) I removed the irrelevant pages with text, because the problem happens with this page.
Got it, thanks. It looks like the jpx files in the PDF are CMYK somehow, that's probably related, will let you know.
Removing the transparency layer from the file makes it work, let me see where it goes wrong in archive-pdf-tools then.
I think you've actually hit a pretty significant issue, one that has basically never been hit in the archive.org path because of the materials we deal with. It does explain why I had some trouble recoding some existing digital PDFs when I was toying with a tool for OCRmyPDF compression.
In any case, the commit above should fix it for your case, while I try to think of a way to maybe support transparency in MRC? I don't think there is a way.
Thank you! This works. :+1:
I had no idea that alpha was introduced somewhere in my pipeline, or that it would cause a problem.
I think some editors, scanners, or pipes either add alpha intermittently, or the alpha is only intermittently a problem, because I do multiple things in the same workflow and it doesn't always go wrong.

> I try to think of a way to maybe support transparency in MRC? I don't think there is a way.
I don't think you should waste your time on that. I see no reason why alpha would need to be supported in a document archival tool. Alpha channels in scans, if any, will be 100% opaque near 100% of the time and can safely be discarded.
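To illustrate what "safely discarded" could look like before the images reach the tool, here is a minimal pure-Python sketch. It is an assumption-laden toy, not how archive-pdf-tools handles it: it works on plain lists of `(r, g, b, a)` tuples instead of a real image library, the names `is_fully_opaque` and `drop_alpha` are hypothetical, and the white background for non-opaque pixels is an arbitrary choice.

```python
from typing import Iterable, List, Tuple

RGBA = Tuple[int, int, int, int]
RGB = Tuple[int, int, int]


def is_fully_opaque(pixels: Iterable[RGBA]) -> bool:
    """Return True when every pixel has alpha == 255."""
    return all(a == 255 for _, _, _, a in pixels)


def drop_alpha(pixels: List[RGBA], background: RGB = (255, 255, 255)) -> List[RGB]:
    """Convert RGBA pixels to RGB (hypothetical helper, for illustration only).

    Fully opaque input simply loses its fourth channel. Anything else is
    blended onto `background` so that no transparency is left.
    """
    if is_fully_opaque(pixels):
        return [(r, g, b) for r, g, b, _ in pixels]

    flattened = []
    for r, g, b, a in pixels:
        # Integer "over" blend with rounding: (c*a + bg*(255-a)) / 255
        flattened.append(tuple(
            (c * a + bg * (255 - a) + 127) // 255
            for c, bg in zip((r, g, b), background)
        ))
    return flattened
```

For a typical scan the first branch is the one that runs, so the pixel values are unchanged and only the unused channel disappears.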
I've noticed it two times before, and I thought it was a computer issue because I scanned too large at 600 dpi. But now I encounter this for a third time, this time while scanning a small card at 300 dpi. I'm beginning to think this might be a bug.
Original: Left. recode_pdf: Right.![image](https://user-images.githubusercontent.com/1702193/167227982-5bede951-d066-4872-87eb-e40a8a0426c0.png)
My normal workflow:
Is this a known issue? Is there a known workaround? I did a quick search, but it didn't turn up anything. I'm not sure I can share the full-resolution card openly because it is copyrighted, but if this issue has never been seen before, I am willing to email the full-resolution file for testing purposes.