arwenhutt closed this issue 4 years ago
@mcritchlow passing this over to you for review. ETA: I guess actually including the link to the draft spec might be good ;) https://docs.google.com/document/d/17e921jvvjhFXJA_y1VaDyi9Qmsg_ZNYv4UFSyxACQDM/edit
@arwenhutt - At a first reading, here are some initial thoughts. And I'll preface this by saying @lsitu has far more experience with existing code in this context, so I happily defer to his expertise if I'm off base on anything here.
This all seems reasonable to me. I would perhaps make it explicit whether you want all the columns included in the export regardless of whether there are values or not. I assume yes to facilitate editing, but it might be good to be specific.
Obviously we should agree on, and lock down, the export format. It would be really nice if we didn't need to use xlsx, since we're binding ourselves to a proprietary format; but if that's the most expedient approach for now (it sounds like it is), perhaps we can table that issue for the future.
Again this mostly seems reasonable. But there is an extension of editing here that's inclusive of manipulation of files, in addition to metadata. I can see why that's desirable, however, given the significance of the changes involved and the implications for 'getting it wrong', I would recommend that we chunk this up into granular deliverables that are fully functional parts of the spec. The idea being that fully completing and testing Deliverable 1 below (for example) would provide immediate value despite the entire spec not being implemented. Here's just an example of how we might organize this work into Sprints based on priority/delivered value:
Deliverables:
@mcritchlow Thanks! That's very helpful. I've revised the Batch Edit Spec draft integrating the approach you suggest if you and @lsitu could take a look.
@mcritchlow about the format for the export: My main concern is that the import, export, and overlay tools utilize the same format, less about what the actual format is. This seems most important for the export and overlay tool, since we'll be using the file produced by the export in the overlay, but there also seems to be a lot of overlap between the overlay functions, as well as validation, with our current ingest tool.
But I'm also assuming xlsx as the common format for our tools is likely to change in the (hopefully) not too distant future - as surfliner moves forward, and we are able to use shared code for more components of the infrastructure, particularly import, metadata testing/validation, etc.
@arwenhutt I think we'll refactor damsmanager's ingest function a little bit to see what can be reused for the batch edit function.
Could you also clarify how column headers need to be handled in the batch metadata overlay tool? I recall that @GregReser proposed appending `add:` and `delete:` to the headers in the Batch Export output for add/delete actions.
@lsitu all of the columns from the batch export. The current approach doesn't use `add:` or `delete:`; it takes a similar approach to the single item RDF edit tool, and just overlays the fields in the spreadsheet.
@arwenhutt Do you mean replace the fields that show up in the spreadsheet? For example, if the field `Note:note` shows up in the spreadsheet, then all the `Note:note` elements in the object should be replaced with the `Note:note` values from the spreadsheet.
That's what we were thinking. Theoretically it seems to hold up if the export contains all of the existing values. But let me know if you see potential issues with this approach!
@arwenhutt This approach is similar to the `Same Predicates Replacement` tool in DAMS Manager, which is used to replace values for top-level predicates. In this case, we simply remove the fields with the same name and then add in the new values for those fields. I think we can rebuild the RDF/XML for each object based on the fields present in the spreadsheet to make it work.
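To illustrate the same-predicates replacement idea being discussed (a minimal sketch only; damsmanager is a Java application and its real implementation works on RDF/XML, so the function and data shapes here are hypothetical):

```python
# Illustrative sketch of "same predicates replacement": for every predicate
# (column) that appears in the spreadsheet row, drop the object's existing
# values for that predicate and substitute the spreadsheet values.
# Predicates absent from the row are left untouched.

def overlay_fields(record, spreadsheet_row):
    """record: dict mapping predicate -> list of values.
    spreadsheet_row: dict mapping column header -> list of new values."""
    updated = dict(record)
    for predicate, new_values in spreadsheet_row.items():
        # Remove all existing values for this predicate, then add the new ones.
        updated[predicate] = list(new_values)
    return updated

record = {"Note:note": ["old note"], "Title": ["My title"]}
row = {"Note:note": ["corrected note", "second note"]}
print(overlay_fields(record, row))
```

Note that under this sketch a predicate left out of the spreadsheet keeps its current values, which is why the later discussion of deleting fields by omission requires the full rebuild approach instead.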
@arwenhutt It seems like we may need to clarify how to deal with export headers that could be part of a compound field like `Title`, which includes the following elements:

- Title
- Subtitle
- Part name
- Part number
- Translation
- Variant

Will all of the elements above need to be present if we only need to update a single element, like `Subtitle`?
@lsitu since we won't know which fields will need editing for different tasks, we have to include all the possibilities. These are outlined in the batch export spec ("Properties/values to export") and would include all of the title elements.
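A small sketch of why exporting every element of a compound field matters (illustrative only; the element names come from the list above, but the function and record shapes are hypothetical, not damsmanager's actual Java code):

```python
# Hypothetical compound-field headers for Title. The export emits a column
# for every element, blank or not, so that an overlay which rebuilds the
# record from the spreadsheet does not silently drop sibling values.
TITLE_ELEMENTS = ["Title", "Subtitle", "Part name", "Part number",
                  "Translation", "Variant"]

def export_title(record):
    """Emit one column per Title element, blank when the record has no value."""
    return {element: record.get(element, "") for element in TITLE_ELEMENTS}

record = {"Title": "Annual report", "Subtitle": "1999 edition"}
row = export_title(record)
row["Subtitle"] = "2000 edition"   # edit just one element...
# ...but all sibling elements are still present in the row, so rebuilding
# the compound Title from this row preserves the untouched ones.
print(row)
```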
@lsitu let me know if the chunking and ordering of the overlay spec functionality looks good to you, or if there are chunks that would be better put together or split up more. Once we have that settled, I'll review the rest of the doc (validation checking and reporting) and make any updates, and we can hopefully finalize the spec and I can make tickets for the parts of work. Thanks!
@arwenhutt I think it's doable; we just need to rebuild the whole RDF/XML based on the headers provided. Will you provide just the fields that need editing in the spreadsheet, or include all fields from Batch Export for Batch Edit?
@arwenhutt It looks like there are a couple of special cases that we may need to take into account if Batch Edit won't include all the fields exported from Batch Export: for example, changing `Corporate:Owner` to `Person:Owner`, or deleting a single field.
@lsitu I need help understanding why your two special cases are a problem. If the whole RDF/XML is rebuilt based on the headers provided, then the existing `Corporate:Owner` would just be deleted along with everything else, and a new `Person:Owner` would be created.
Same for deleting a single field. If that single field is not included in the overlay spreadsheet, it won't be created in the new RDF/XML.
@GregReser How do we determine whether a field like `Corporate:Owner` needs to be deleted, or a field like `Person:Owner` needs to be added?
@lsitu `Corporate:Owner` would be deleted automatically, because all descriptive properties would be deleted at the beginning of the process. `Person:Owner` would be added because it is in the edit spreadsheet, and all properties in the spreadsheet would be added.
If it is possible, we were thinking that the edit process would work like this:

1. Delete all existing descriptive properties from the object (all properties specified in the batch export spec). This is what the RDF would look like at step 1: batchEditStep1.txt
2. Add all properties in the edit spreadsheet. If the spreadsheet does not contain a property (the column does not exist, or that cell in the column is null), then that property is not created in the edit process. This means the edit spreadsheet will be very full, containing all properties we want in the finished object.
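The two-step process above can be sketched as follows (a hedged illustration: damsmanager itself operates on RDF/XML in Java, and the property set and function names here are hypothetical stand-ins):

```python
# Sketch of the two-step overlay: (1) drop every descriptive property named
# in the batch export spec, (2) add back exactly what the spreadsheet holds.
# A property missing from the spreadsheet is therefore deleted by omission.

# Hypothetical subset of "all properties specified in the batch export spec".
EXPORTED_PROPERTIES = {"Title", "Note:note", "Corporate:Owner", "Person:Owner"}

def batch_edit(record, row):
    # Step 1: delete all descriptive properties covered by the export spec;
    # non-descriptive fields (e.g. identifiers) survive untouched.
    kept = {p: v for p, v in record.items() if p not in EXPORTED_PROPERTIES}
    # Step 2: add every non-empty cell from the spreadsheet row. A missing
    # column or an empty cell means the property is simply not recreated.
    for prop, values in row.items():
        if values:
            kept[prop] = list(values)
    return kept

record = {"Title": ["Old"], "Corporate:Owner": ["Acme Corp"], "id": ["bb123"]}
row = {"Title": ["New"], "Person:Owner": ["Jane Doe"]}
print(batch_edit(record, row))
```

This shows why the special cases raised earlier resolve themselves: `Corporate:Owner` disappears because step 1 removes it and step 2 never re-adds it, while `Person:Owner` appears simply by being present in the row.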
@GregReser I think that will work. Thanks for your clarification.
Also, for adding files, could we clarify that we are just adding files to existing components? We may not be able to add new components to hold the new files, since we may want to reorder/restructure existing components and re-ingest existing files in the object to match their ark filenames in the filestore. But if we agree on just adding files at the next available component index (for example, if the last component index is `/10`, then the file is added to component `/11`) without restructuring existing components, then it will be fine.
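The "next available component index" rule can be sketched like this (illustrative only; the `/N` component-id form follows the example above, and the function is a hypothetical stand-in for whatever damsmanager would actually do in Java):

```python
import re

# Sketch of the "next available component index" rule: new files go into a
# new component numbered one past the current highest, leaving existing
# components and their ark filenames untouched.

def next_component(existing_components):
    """existing_components: iterable of component ids like '/1', '/2', '/10'."""
    indices = [int(m.group(1)) for c in existing_components
               if (m := re.fullmatch(r"/(\d+)", c))]
    return f"/{max(indices, default=0) + 1}"

print(next_component(["/1", "/2", "/10"]))  # -> /11
```

Note the numeric comparison: sorting the ids as strings would wrongly place `/10` before `/2`, so the indices are parsed as integers first.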
@lsitu Yes, I was afraid inserting new components would cause a complication. We can leave that function for the future and just handle editing existing components for now. How about we limit the scope of this to deliverables 1 - 4 on Metadata batch edit / overlay?
@GregReser It sounds good to start with 1 - 4 (Metadata batch edit / overlay). Thanks.
@lsitu & @GregReser Thanks for keeping this discussion going! It sounds like we have agreement on the first four items, ~so I'll go ahead and create tickets for them.~
@lsitu would it help to have tickets for the four deliverables @mcritchlow outlined and discussed in the document? or is there another way to divide the work that would be better?
Descriptive summary
Specification for batch metadata overlay functionality
Acceptance criteria
Part of Batch edit functionality (epic) https://github.com/ucsdlib/damsmanager/issues/309