samvera / hyrax

Hyrax is a Ruby on Rails Engine built by the Samvera community. Hyrax provides a foundation for creating many different digital repository applications.
http://hyrax.samvera.org/
Apache License 2.0
184 stars 124 forks source link

Thumbnail rendered incorrectly with 3 mini thumbnails with different colour tints #4971

Closed paulwalk closed 2 years ago

paulwalk commented 3 years ago

Descriptive summary

This phenomenon is found on an instance of Hyrax v2.9

With works containing PDFs, the thumbnail of each work is supposed to be the first page of the PDF. In our instance, when the first page of the PDF is monochrome, we find that the first page is repeated three times on the top edge of the generated thumbnail image. Each of the three mini-thumbnails gets a different colour tint. The example thumbnail image below is from this published work:

This only seems to apply to situations where the first page of the PDF is monochrome. Works with colour have their thumbnails rendered correctly.

Rationale

Provide the rationale or user story that describes "why" this issue should be addressed. Especially if this is a new feature or significant change to the existing implementation.

Expected behavior

The thumbnail should be rendered as a single image of the first page of the PDF

Actual behavior

As described in the Descriptive summary above

Steps to reproduce the behavior

Publish a work containing a PDF where the first page of the PDF is monochrome.

Related work

n/a

no-reply commented 3 years ago

thanks for this! i appreciate the example work, which will hopefully help reproduce.

paulwalk commented 3 years ago

credit should go to @asahiko for providing the example

jlhardes commented 3 years ago

I can't reproduce this when uploading this PDF file on nurax-dev, which is Hyrax 3.0.2. The derived thumbnail is only showing the first page once, not 3 times.

lfarrell commented 3 years ago

We have the same problem with the 3 image pdf thumbnails. I tried a couple of our problem files on nurax and couldn't generate the error either. Does nurax use stock Hyrax imagemagick settings? I'm also wondering what version of imagemagick it uses?

If it's useful we're using ImageMagick 6.9.10-68 Q16 x86_64

cjcolvar commented 3 years ago

FWIW we're using ImageMagick 6.9.10-23 Q16 x86_64

When stepping through this in a console I see a warning which might help:

   **** Error: ICCbased space /N value does not match the ICC profile.
                 Using the number of channels from the profile.
                 Output may be incorrect.
cjcolvar commented 3 years ago

From what I can tell, the order of operations in hydra-derivatives for a pdf is:

xfrm = MiniMagick::Image.open(source_path)
xfrm = xfrm.layers[0]
xfrm.flatten
xfrm.resize('338x493')
xfrm.format('jpg')
xfrm.write(output)

When I run this in a console with a problem pdf I am able to reproduce this behavior including the error message above. This appears to be resolved if I change the order of the format and resize commands making the order be:

xfrm = MiniMagick::Image.open(source_path)
xfrm = xfrm.layers[0]
xfrm.flatten
xfrm.format('jpg')
xfrm.resize('338x493')
xfrm.write(output)

(FWIW if I put the format before the resize and the flatten then it behaves differently and doesn't look correct.)

jlhardes commented 3 years ago

Wouldn't this same problem show on nurax-dev if there is something out of order in the hydra-derivatives code that is causing this?

jlhardes commented 3 years ago

ImageMagick 7.0.7-39 Q16 x86_64 is in use for nurax-dev right now.

cjcolvar commented 3 years ago

Locally I installed ImageMagick 7.0.11-13 and used the same order of commands above and got the same results. Maybe it has to do with the version of ghostscript instead of ImageMagick? Or maybe it is safe to make this change in hydra-derivatives? I can make a PR and then we can test it thoroughly.

cjcolvar commented 3 years ago

@jlhardes https://github.com/samvera/hydra-derivatives/pull/227 was working for me when I manually tested commands. I haven't tested this in a real hyrax yet.

cjcolvar commented 3 years ago

Following up on this, I tested this as a monkey-patch in my local app and it works!

cjcolvar commented 3 years ago

I found a case where my proposed fix isn't working but I haven't been able to isolate why.

kerchner commented 3 years ago

I am also observing this with MS Word (.docx) files, with Hyrax 2.8.0 and Imagemagick version 6.9.10-23.

The patch above didn't fix the problem. The only effect it had was eliminating the black square below the 3 mini-thumbnails.

bess commented 3 years ago

When I was working at Data Curation Experts we encountered this problem on a Hyrax system we built for Emory University. The workaround for us was to build imagemagick with specific versions of image libraries, which were documented here: https://github.com/curationexperts/ansible-samvera/blob/main/roles/imagemagick/tasks/main.yml

jlhardes commented 3 years ago

Thank you @bess! Is that something you can try, @cjcolvar, for the case that still isn't working for you? I wonder if your patch in combo with these recommended versions will help here.

paulwalk commented 3 years ago

Thanks @bess! We will be discussing this tomorrow at our fortnightly development meeting.

rjkati commented 3 years ago

We are seeing this behavior at UNC-Chapel Hill.

Examples:

bad_thumbnail_2 bad_thumbnail_1

jlhardes commented 2 years ago

We need some follow-up to see if the hydra-derivatives patch along with specific versions of ImageMagick and/or ghostscript are the solution for this issue. It doesn't appear to be testable on nurax-dev.

orangewolf commented 2 years ago

I believe @bess fix might work is because an old version of Ghostscript being used. See https://bugs.ghostscript.com/show_bug.cgi?id=697223 which I think talks more specifically about what is happening here. Note that the timing lines up with the fact that GS 9.19 was released in early 2016 (see https://www.ghostscript.com/doc/9.55.0/History9.htm). I don't think the Ghostscript maintainers are going to change course here and I don't thinking using a version of Ghostscript from 2016 is a viable long term solution.

I think that @cjcolvar is probably fixing a common case (hydra-derivatives) when it comes to Samvera apps, but there are other tools that can create the same "broken" pdfs (specifically in my research the library img2pdf seems to). I can attest that Chris's fix solves the problem if you start with a non-broken PDF. If you start with a PDF that is grey scale with RGB color profile (aka broken in this way) then there appears to be no saving you... but that is certainly better than "all true greyscale PDFs" having this problem.

I would propose getting Chris's fix in to the mainline of Hyrax (and maybe backport to 2.x) simply because it fixes the majority case and non-ancient Ghostscript seems to be more and more common.

cjcolvar commented 2 years ago

@jlhardes I was able to fix the hydra-derivatives build and get my change in. Testing could start now or wait until a new release of hydra-derivatives is made. Github automatically closed this ticket when I merged my PR so feel free to reopen this issue.

jlhardes commented 2 years ago

Reopening to see if we can test and confirm this issue is fixed. Thanks @cjcolvar!

MPLSFedResearchTZ commented 2 years ago

New issue was observed that derivatives of PDF/PNG (probably include other image types) were not generated in works. Following discussion indicated that hydra-derivative 3.6.1 was deployed along with the new PR. Link to the work: https://nurax-dev.curationexperts.com/concern/generic_works/gm80hv60r?locale=en

rjkati commented 2 years ago

I am unable to replicate this issue on my local build of Hyrax 4.0.0.beta1. I deposited the three examples linked in the ticket and each thumbnail generated correctly: tripletest1 tripletest2 tripletest3

jlhardes commented 1 year ago

This is also fixed on nurax-dev. Yay!