samvera-deprecated / sufia

[DEPRECATED] Sufia: a fully featured, flexible Samvera repository front-end.
http://sufia.io/
Other
111 stars 78 forks source link

Place technical metadata on the File #1894

Closed awead closed 8 years ago

awead commented 8 years ago

Descriptive summary

Technical metadata, according to PCDM, should reside on the pcdm:File. Currently, Sufia puts technical metadata on the FileSet, which is a pcdm:Object. This work has been ticketed in Hydra::Works: https://github.com/projecthydra-labs/hydra-works/issues/290 Once that has been merged and released, we'll need to get CC working atop that change, and once that's done we should do the same in Sufia.

barmintor commented 8 years ago

This kind of gets into the issues around original_file, but pcdm:File doesn't contain anything, so... what to do? The extracted triples could/should go on the RDFSource that iana:describes the pcdm:File, but the extracted document is more like a derivative member of the FileSet, right?

awead commented 8 years ago

@barmintor not sure I follow. What do you mean by "extracted document" ? We're not saving the FITS xml anymore like we used to, so we can put the technical metadata on original_file.

jcoyne commented 8 years ago

@awead it's possible that we have the fits metadata before the file is ingested. Where does it go then?

awead commented 8 years ago

@jcoyne good question, and perhaps why we've implemented it this way?

grosscol commented 8 years ago

In the case where the FITS xml exists before the bitstream I think that would have to be handled by creating a pcdm:file with nil contents and assigning properties to that. Then when the actual bitstream becomes available, the content could be directly assigned. If the bitstream shows up as a second pcdm:file, the properties could be copied over to it and the first one removed.

The edge case above is when the bitstream never shows up. Then you have an empty pcdm:file with technical metadata that states otherwise. That can probably be caught by something like size > 0 && content.nil?

Making this change would mean that the FileSet needs to aggregate the technical metadata of the pcdm:files it contains. Specifically, size comes to mind as that is then aggregated by Works and Collections. Separable, but related issue.

mjgiarlo commented 8 years ago

Should this ticket be in hydra-works?

awead commented 8 years ago

@mjgiarlo dunno... does this happen in CurationConcerns too? @escowles?

escowles commented 8 years ago

Yes, I think this is a hydra-works ticket, because that's where the characterization code is that's putting the tech metadata on the FileSet is.

awead commented 8 years ago

I've opened a ticket in Hydra::Works. It's pretty clear from the spec test of Hydra::Works::CharacterizationService that it's going on the file set, but we may have to deal with the situation @jcoyne described above.

mjgiarlo commented 8 years ago

Thanks, @awead. Can we close this ticket now that we have one in HW?

awead commented 8 years ago

@mjgiarlo it's up to you. But, I think it's important that we communicate this issue out as Sufia now != PCDM.

mjgiarlo commented 8 years ago

K. I'll re-frame this ticket and keep it around. Thanks!

awead commented 8 years ago

Cool. Thanks! 👏

mjgiarlo commented 8 years ago

@awead Can you confirm that what's in Sufia master works as expected, with technical metadata expressed on the File's metadata node with the FileSet using a characterization proxy to access this information? If so, let's close this.

awead commented 8 years ago

@mjgiarlo do we want any kind of test? If so, how would you recommend doing that?

mjgiarlo commented 8 years ago

@awead Meh, I don't know if Sufia should test that. I'd be happy to hear you say, "CC and HW are doing what they're supposed to do and the test suite is green."

awead commented 8 years ago

Update: Running the test app in Sufia, CharacterizationJob is getting called and it looks like some characterization terms are getting set, such as hasSize, hasMimeType. However expected terms like height and width of images are not present. Also, Sufia's FileSet view reports "not yet characterized" so some presenter work probably needs to get done there.

jcoyne commented 8 years ago

@awead I'm seeing the same. I'm not seeing height and width recorded..

jcoyne commented 8 years ago

This appears to be a bug in ActiveFedora. Hydra::Works::CharacterizationService.run(original_file) updates the metadata in memory, and it's correctly marked as changed?, but metadata.save isn't persisting it.

jcoyne commented 8 years ago

AF bug: https://github.com/projecthydra/active_fedora/issues/1083

awead commented 8 years ago

Another release to HW and CC with a minor change should have this fixed.

awead commented 8 years ago

This is fixed now, although there really isn't a good way of integration-testing it. Not sure if that's worth pursuing, since it would make CI more of a mess, and probably not add that much value anyway.

We're currently testing it at the presenter level, which assumes the terms are in the solr document. Beyond that, it's reasonably well-tested at the CC, HW, and AF layers.

Thoughts, @mjgiarlo?

mjgiarlo commented 8 years ago

:shipit: