samvera / hydra-works

A ruby gem implementation of the PCDM Works domain model based on the Samvera software stack
Other
24 stars 14 forks source link

Largest width / height should be used #327

Closed hackartisan closed 7 years ago

hackartisan commented 7 years ago

Currently if a multi-layer tiff is characterized, fits returns multiple imageHeight and imageWidth entries with status="CONFLICT", e.g.

      <imageWidth toolname="Jhove" toolversion="1.5" status="CONFLICT">2226</imageWidth>
      <imageWidth toolname="Tika" toolversion="1.3" status="CONFLICT">160</imageWidth>
      <imageHeight toolname="Jhove" toolversion="1.5" status="CONFLICT">1650</imageHeight>
      <imageHeight toolname="Tika" toolversion="1.3" status="CONFLICT">119</imageHeight>

Current behavior is to just save both on the model.

 :width=>["2226", "160"],
 :height=>["1650", "119"],

Downstream apps don't appear to be accounting for this, which is their problem, sure. e.g. Sufia's file page reports my object as having dimensions:

Height: 1650
Width: 160

But does it make any sense to have multiple heights / multiple widths, especially when they aren't paired appropriately? My preference would be to choose the largest of each before storing on the model https://github.com/samvera/hydra-works/blob/master/lib/hydra/works/services/characterization_service.rb#L31. Once it's on the model it's accessible in our apps as linked data and it's confusing.

hackartisan commented 7 years ago

@dchandekstark, @jpstroop suggested you might have pyramid tiffs (http://iipimage.sourceforge.net/documentation/images/) and that might be relevant to this discussion. If you know how fits handles these that would be great to know. This issue will be discussed on today's tech call if you can make it.

dchandekstark commented 7 years ago

@hackmastera We do have pyramid TIFFs that are generated from master images, but we don't characterize them with FITS.

hackartisan commented 7 years ago

@dchandekstark any idea whether this proposal would affect you?

dchandekstark commented 7 years ago

@hackmastera I don't understand the significance of the problem. We are just getting started with Hyrax, but I can check with @coblej and others.

coblej commented 7 years ago

@hackmastera The role that pyramid TIFFs currently play in our applications (display derivatives) is such that we don't characterize them. As @dchandekstark notes, we are just beginning our investigation of Hyrax and its dependencies (like hydra-works), though that may not change the role that pyramid TIFFs play. So, at the present time, the proposal would not affect us and probably won't in the future, except to the extent that it seems wise to store whatever data is stored in a clear and unambiguous fashion.

hackartisan commented 7 years ago

Another option: turn off tika??

hackartisan commented 7 years ago

@cam156 mentions a similar special case was made for mime type, to take the first. will look around for that.

hackartisan commented 7 years ago

tech call yielded general agreement to this approach; will confirm with tech list.

carolyncole commented 7 years ago

@hackmastera I think this is it: https://github.com/samvera/hydra-works/blob/f948eb0fa59221fef23a9564d6c07153015c5bea/lib/hydra/works/services/characterization_service.rb#L104

It is quite obscure, but I think it is taking the last value not the first now.

hackartisan commented 7 years ago

@cam156 oh, that's great it's exactly where I was making my change. Thank you for finding it! I actually did notice that special case but you're right it was difficult to decipher the intent; I see now that it has the effect you stated. I'll add a comment when I PR for this issue.