Closed paulwalk closed 2 years ago
thanks for this! i appreciate the example work, which will hopefully help reproduce.
credit should go to @asahiko for providing the example
I can't reproduce this when uploading this PDF file on nurax-dev, which is Hyrax 3.0.2. The derived thumbnail is only showing the first page once, not 3 times.
We have the same problem with the 3 image pdf thumbnails. I tried a couple of our problem files on nurax and couldn't generate the error either. Does nurax use stock Hyrax imagemagick settings? I'm also wondering what version of imagemagick it uses?
If it's useful we're using ImageMagick 6.9.10-68 Q16 x86_64
FWIW we're using ImageMagick 6.9.10-23 Q16 x86_64
When stepping through this in a console I see a warning which might help:
**** Error: ICCbased space /N value does not match the ICC profile.
Using the number of channels from the profile.
Output may be incorrect.
From what I can tell, the order of operations in hydra-derivatives for a pdf is:
xfrm = MiniMagick::Image.open(source_path)
xfrm = xfrm.layers[0]
xfrm.flatten
xfrm.resize('338x493')
xfrm.format('jpg')
xfrm.write(output)
When I run this in a console with a problem pdf I am able to reproduce this behavior including the error message above. This appears to be resolved if I change the order of the format and resize commands making the order be:
xfrm = MiniMagick::Image.open(source_path)
xfrm = xfrm.layers[0]
xfrm.flatten
xfrm.format('jpg')
xfrm.resize('338x493')
xfrm.write(output)
(FWIW if I put the format before the resize and the flatten then it behaves differently and doesn't look correct.)
Wouldn't this same problem show on nurax-dev if there is something out of order in the hydra-derivatives code that is causing this?
ImageMagick 7.0.7-39 Q16 x86_64
is in use for nurax-dev right now.
Locally I installed ImageMagick 7.0.11-13 and used the same order of commands above and got the same results. Maybe it has to do with the version of ghostscript instead of ImageMagick? Or maybe it is safe to make this change in hydra-derivatives
? I can make a PR and then we can test it thoroughly.
@jlhardes https://github.com/samvera/hydra-derivatives/pull/227 was working for me when I manually tested commands. I haven't tested this in a real hyrax yet.
Following up on this, I tested this as a monkey-patch in my local app and it works!
I found a case where my proposed fix isn't working but I haven't been able to isolate why.
I am also observing this with MS Word (.docx) files, with Hyrax 2.8.0 and Imagemagick version 6.9.10-23.
The patch above didn't fix the problem. The only effect it had was eliminating the black square below the 3 mini-thumbnails.
When I was working at Data Curation Experts we encountered this problem on a Hyrax system we built for Emory University. The workaround for us was to build imagemagick with specific versions of image libraries, which were documented here: https://github.com/curationexperts/ansible-samvera/blob/main/roles/imagemagick/tasks/main.yml
Thank you @bess! Is that something you can try, @cjcolvar, for the case that still isn't working for you? I wonder if your patch in combo with these recommended versions will help here.
Thanks @bess! We will be discussing this tomorrow at our fortnightly development meeting.
We are seeing this behavior at UNC-Chapel Hill.
Examples:
We need some follow-up to see if the hydra-derivatives
patch along with specific versions of ImageMagick and/or ghostscript are the solution for this issue. It doesn't appear to be testable on nurax-dev.
I believe @bess fix might work is because an old version of Ghostscript being used. See https://bugs.ghostscript.com/show_bug.cgi?id=697223 which I think talks more specifically about what is happening here. Note that the timing lines up with the fact that GS 9.19 was released in early 2016 (see https://www.ghostscript.com/doc/9.55.0/History9.htm). I don't think the Ghostscript maintainers are going to change course here and I don't thinking using a version of Ghostscript from 2016 is a viable long term solution.
I think that @cjcolvar is probably fixing a common case (hydra-derivatives) when it comes to Samvera apps, but there are other tools that can create the same "broken" pdfs (specifically in my research the library img2pdf seems to). I can attest that Chris's fix solves the problem if you start with a non-broken PDF. If you start with a PDF that is grey scale with RGB color profile (aka broken in this way) then there appears to be no saving you... but that is certainly better than "all true greyscale PDFs" having this problem.
I would propose getting Chris's fix in to the mainline of Hyrax (and maybe backport to 2.x) simply because it fixes the majority case and non-ancient Ghostscript seems to be more and more common.
@jlhardes I was able to fix the hydra-derivatives build and get my change in. Testing could start now or wait until a new release of hydra-derivatives is made. Github automatically closed this ticket when I merged my PR so feel free to reopen this issue.
Reopening to see if we can test and confirm this issue is fixed. Thanks @cjcolvar!
New issue was observed that derivatives of PDF/PNG (probably include other image types) were not generated in works. Following discussion indicated that hydra-derivative 3.6.1 was deployed along with the new PR. Link to the work: https://nurax-dev.curationexperts.com/concern/generic_works/gm80hv60r?locale=en
I am unable to replicate this issue on my local build of Hyrax 4.0.0.beta1. I deposited the three examples linked in the ticket and each thumbnail generated correctly:
This is also fixed on nurax-dev. Yay!
Descriptive summary
This phenomenon is found on an instance of Hyrax v2.9
With works containing PDFs, the thumbnail of each work is supposed to be the first page of the PDF. In our instance, when the first page of the PDF is monochrome, we find that the first page is repeated three times on the top edge of the generated thumbnail image. Each of the three mini-thumbnails gets a different colour tint. The example thumbnail image below is from this published work:
This only seems to apply to situations where the first page of the PDF is monochrome. Works with colour have their thumbnails rendered correctly.
Rationale
Provide the rationale or user story that describes "why" this issue should be addressed. Especially if this is a new feature or significant change to the existing implementation.
Expected behavior
The thumbnail should be rendered as a single image of the first page of the PDF
Actual behavior
As described in the Descriptive summary above
Steps to reproduce the behavior
Publish a work containing a PDF where the first page of the PDF is monochrome.
Related work
n/a