Closed mcritchlow closed 5 years ago
@mcritchlow It seems like there are a couple of bigger issues that we may need to inspect and discuss before moving forward with a solution:
What work_obj.datastreams
should behave? Should it load any datastream contents or is it a hydra bug?
I see damsrepo return the correct result for all datascreams in the test object bd24789869:
<objectDatastreams xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dams="http://library.ucsd.edu/ontology/dams#" xmlns="http://www.fedora.info/definitions/1/0/access/" pid="bd24789869" baseURL="http://lib-hydratail-staging.ucsd.edu:8080/dams/fedora/" xsi:schemaLocation="http://www.fedora.info/definitions/1/0/access/ http://www.fedora-commons.org/definitions/1/0/listDatastreams.xsd">
<datastream dsid="DC" label="Dublin Core Record for this object" mimeType="text/xml"/>
<datastream dsid="RELS-EXT" label="Fedora Object-to-Object Relationship Metadata" mimeType="application/rdf+xml"/>
<datastream dsid="rightsMetadata" label="" mimeType="text/xml"/>
<datastream dsid="damsMetadata" label="DAMS RDF metadata" mimeType="application/rdf+xml"/>
<datastream dsid="_rdf.xml" label="DAMS RDF serialized metadata" mimeType="application/rdf+xml"/>
<datastream dsid="_1.mp4" label="" mimeType="video/mp4"/>
<datastream dsid="_2.mp4" label="" mimeType="video/mp4"/>
<datastream dsid="_3.jpg" label="" mimeType="image/jpeg"/>
<datastream dsid="_4.jpg" label="" mimeType="image/jpeg"/>
</objectDatastreams>
What should method def content_changed?
do?
Does it necessary to compare the content since the datastream profile contains the size and checksum that should be good enough to determine any changes to the content file? Will the content file be empty with local_or_remote_content(false)
that may cause other issues?
How is the block if controlGroup == "X"
triggered the error since all datastream profile should be the X
dsControlGroup
, which seems to be correct?
@lsitu - So as for (1), I was just using .datastreams
as an example. The issue really manifests itself in the VRR work when calling .save
on an object. As far as I can tell reading through that method, it's trying to identify if changes were made and whether it actually needs to save them.
Here's where this is getting triggered https://github.com/samvera/rubydora/blob/v1.7.5/lib/rubydora/digital_object.rb#L284 which calls content_changed?
As for (2), it is hard to sort through. I think you're on to the real question, which to me is what are the potential side-effects of having file datastreams have a non-X control group. It seems like it would get us around this particular issue. But might cause others. Either way it's definitely wrong though.. The non-XML datastreams shouldn't be marked as XML.
I think for our purposes it seems like we want content_changed?
to return false
ALWAYS for any file datastreams. At least that's my initial impression, since we won't be adding an editor to actually manipulate file datastreams in the hydra app (at least i hope not.. :) ). It seems that if mark the file datastreams with the E
control group, that would do the trick. I'm curious what you think.
How is the block if controlGroup == "X" triggered the error since all datastream profile should be the X dsControlGroup, which seems to be correct?
I'm not sure I follow this part. I think the error is triggered specifically because every datastream has control group set to 'X'. But I may be misunderstanding you on this. I think this ties in to your question about whether this is a 'hydra bug'. I think in the way you framed it, I don't think so. I think the hydra code is trying to be 'smart' about only saving datastreams that have actually changed, and we're running into an issue where our repo is correctly returning all datastreams, but they're marked in a way that's perhaps not consistent with the Fedora3 spec? I'm not sure that's the best way to put it, but it seems like by marking all our object datastreams as 'X' control group, we're not following proper conventions/expectations that ActiveFedora has.
It looks like if E
or R
is used as the control group, then the dsLocation
must be set to the correct URL.
https://wiki.duraspace.org/display/FEDORA38/REST+API#RESTAPI-addDatastream
And it seems that we're doing that for the profile already https://github.com/ucsdlib/damsrepo/blob/develop/src/xsl/fedora-datastream-profile.xsl#L29
It does make me wonder why we fixed controlGroup in the first place to X, but that's probably too many years ago to sort out..
@mcritchlow Yes, we could change it if that is the problem. But method .datastreams
return the datastream content is the expected behavior? And if the method content_changed?
just load the profile for the datastreams to comparing the change, then X
is good to catch the data profile as RDF/XML to compare the content change in the file. If we need to load the datastream content, will the overhead be affordable for a huge tar ball?
That's a good question, but my impression is that it's already loading the datastream content. When my terminal 'crashed' while doing initial debugging, it was clear that it was pulling down binary content. I'm wary of saying yes to your question of expected behavior, but we're dealing with such old versions of ActiveFedora and Rubydora that... who knows. I suspect the real, possibly only, answer to your question is we'd need to test.
@mcritchlow Yes, then I think we can start from change the dsControlGroup
to see how it goes.
Or is it possible to update the method content_changed?
to just compare the datastream profile for content file changes, which is in RDF/XML format that won't trigger the error?
Hmm. I mean we could fork the gem and update that method. But I'd be nervous about taking on that responsibility. I suppose it's not obviously more risk than updating the dsControlGroup
in all our file datastreams, but at least we own that. I don't know, what do you lean towards?
@mcritchlow I've added PR https://github.com/ucsdlib/damsrepo/pull/86 for it. Let's test it to see how it goes for now. Thanks.
Yeah I think this looks like a great test. Thanks @lsitu !
This issue came up with the Notch8 VRR work.
Essentially, the value of the Fedora datastream
dsControlGroup
for a given datastream is hard-coded in ourfedora-datastream-profile
asX
which means XML/inline.There is perhaps better documentation about the types of datastreams, but they're at least partly described here: https://wiki.duraspace.org/display/FEDORA38/REST+API#RESTAPI-addDatastream
We should:
[ ] Consider the implications of making ANY change here. It would presumably require a full update/reindex?
[ ] Consider making the
dsControlGroup
value dynamic depending on the situation.X
for metadata and eitherR
orM
for files? I'm not clear at the moment which is 'correct'[ ] Document and implement the agreed upon change, confirming it works in staging without causing other side effects.
This is causing the Rubydora code that is trying to check if an object's datastreams have changed to fail, because it's trying to compare XML datastreams for things like files, which it should not be doing.