Closed awead closed 8 years ago
This kind of gets into the issues around original_file, but pcdm:File doesn't contain anything, so... what to do? The extracted triples could/should go on the RDFSource that iana:describes the pcdm:File, but the extracted document is more like a derivative member of the FileSet, right?
@barmintor not sure I follow. What do you mean by "extracted document" ? We're not saving the FITS xml anymore like we used to, so we can put the technical metadata on original_file.
@awead it's possible that we have the fits metadata before the file is ingested. Where does it go then?
@jcoyne good question, and perhaps why we've implemented it this way?
In the case where the FITS xml exists before the bitstream I think that would have to be handled by creating a pcdm:file with nil contents and assigning properties to that. Then when the actual bitstream becomes available, the content could be directly assigned. If the bitstream shows up as a second pcdm:file, the properties could be copied over to it and the first one removed.
The edge case above is when the bitstream never shows up. Then you have an empty pcdm:file with technical metadata that states otherwise. That can probably be caught by something like size > 0 && content.nil?
Making this change would mean that the FileSet needs to aggregate the technical metadata of the pcdm:files it contains. Specifically, size comes to mind as that is then aggregated by Works and Collections. Separable, but related issue.
Should this ticket be in hydra-works?
@mjgiarlo dunno... does this happen in CurationConcerns too? @escowles?
Yes, I think this is a hydra-works ticket, because that's where the characterization code is that's putting the tech metadata on the FileSet is.
I've opened a ticket in Hydra::Works. It's pretty clear from the spec test of Hydra::Works::CharacterizationService that it's going on the file set, but we may have to deal with the situation @jcoyne described above.
Thanks, @awead. Can we close this ticket now that we have one in HW?
@mjgiarlo it's up to you. But, I think it's important that we communicate this issue out as Sufia now != PCDM.
K. I'll re-frame this ticket and keep it around. Thanks!
Cool. Thanks! 👏
@awead Can you confirm that what's in Sufia master works as expected, with technical metadata expressed on the File's metadata node with the FileSet using a characterization proxy to access this information? If so, let's close this.
@mjgiarlo do we want any kind of test? If so, how would you recommend doing that?
@awead Meh, I don't know if Sufia should test that. I'd be happy to hear you say, "CC and HW are doing what they're supposed to do and the test suite is green."
Update: Running the test app in Sufia, CharacterizationJob is getting called and it looks like some characterization terms are getting set, such as hasSize, hasMimeType. However expected terms like height and width of images are not present. Also, Sufia's FileSet view reports "not yet characterized" so some presenter work probably needs to get done there.
@awead I'm seeing the same. I'm not seeing height and width recorded..
This appears to be a bug in ActiveFedora. Hydra::Works::CharacterizationService.run(original_file)
updates the metadata in memory, and it's correctly marked as changed?
, but metadata.save
isn't persisting it.
Another release to HW and CC with a minor change should have this fixed.
This is fixed now, although there really isn't a good way of integration-testing it. Not sure if that's worth pursuing, since it would make CI more of a mess, and probably not add that much value anyway.
We're currently testing it at the presenter level, which assumes the terms are in the solr document. Beyond that, it's reasonably well-tested at the CC, HW, and AF layers.
Thoughts, @mjgiarlo?
:shipit:
Descriptive summary
Technical metadata, according to PCDM, should reside on the pcdm:File. Currently, Sufia puts technical metadata on the FileSet, which is a pcdm:Object. This work has been ticketed in Hydra::Works: https://github.com/projecthydra-labs/hydra-works/issues/290 Once that has been merged and released, we'll need to get CC working atop that change, and once that's done we should do the same in Sufia.