uwlib-cams / uwlswd_vocabs_marc_006_008

https://uwlib-cams.github.io/uwlswd/
Creative Commons Zero v1.0 Universal

finish main.py workflow for two new 'nature of contents' concept schemes #26

Closed: briesenberg07 closed this issue 2 months ago

briesenberg07 commented 3 months ago

@dkreisstomkins the error with main.py was caused by incorrect data in the RDF/XML for the new concept scheme 'books: nature of contents'; see the details of the changes that resolved the error.

My best understanding, without revisiting the main.py code or stylesheets, is that the incorrect dct:hasFormat values in the RDF/XML, which pointed to a directory and files that don't exist, caused the main.py transformation error. In any case, making the change (see the commit linked above) resolved the error and allowed me to run main.py successfully for books_nature_of_contents/.

I believe that if you make similar edits to continuing_nature_of_contents/continuing_nature_of_contents.rdf (see L97-100), you'll be able to run main.py successfully and then we can make the pull request.
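Before rerunning main.py, it can help to list exactly which dct:hasFormat values a file carries. Below is a minimal stdlib-only sketch (it is not part of main.py or serialize.py). It assumes the values are written as rdf:resource attributes on dct:hasFormat elements, with the dct prefix bound to http://purl.org/dc/terms/; the function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"


def has_format_targets(rdfxml: str | bytes) -> list[str]:
    """Return the rdf:resource value of every dct:hasFormat element (hypothetical helper)."""
    root = ET.fromstring(rdfxml)
    targets = []
    # iter() walks the whole tree, so dct:hasFormat on any rdf:Description is found
    for elem in root.iter(f"{{{DCT_NS}}}hasFormat"):
        target = elem.get(f"{{{RDF_NS}}}resource")
        if target is not None:
            targets.append(target)
    return targets
```

For example, `has_format_targets(open(path, "rb").read())` on continuing_nature_of_contents/continuing_nature_of_contents.rdf would print out the values to compare against where the other serializations actually live. Reading the file as bytes lets the XML declaration's encoding be honored.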

My understanding, to confirm with @cspayne if possible:

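main.py works ✔️ in both of the following scenarios:

  • A new resource (RDF/XML serialization) is processed using main.py and includes no dct:hasFormat triples

  • An updated resource (RDF/XML serialization) is processed using main.py and includes accurate dct:hasFormat triples (filepath/filenames point to the other serializations of the resource, which will be overwritten by main.py processing to reflect updates in RDF/XML)

main.py does not work ❌ in the following scenario:

  • An updated resource (RDF/XML serialization) is processed using main.py and includes incorrect dct:hasFormat triples (filepath/filenames do not point to the current location of other serializations of the resource)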

action items

briesenberg07 commented 3 months ago

@dkreisstomkins one other thing: I've made two commits to the divide_somes branch, so you'll need to pull before you tackle the action items above (I'm sure you would have anyway, but I wanted to be extra sure!). TY!

cspayne commented 3 months ago

Quoting @briesenberg07:

main.py works ✔️ in both of the following scenarios:

  • A new resource (RDF/XML serialization) is processed using main.py and includes no dct:hasFormat triples

  • An updated resource (RDF/XML serialization) is processed using main.py and includes accurate dct:hasFormat triples (filepath/filenames point to the other serializations of the resource, which will be overwritten by main.py processing to reflect updates in RDF/XML)

main.py does not work ❌ in the following scenario:

  • An updated resource (RDF/XML serialization) is processed using main.py and includes incorrect dct:hasFormat triples (filepath/filenames do not point to the current location of other serializations of the resource)

That's correct! I never implemented a function to delete incorrect dct:hasFormat triples. Do we want one? It would be fairly simple to do.
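The three scenarios can be turned into a pre-flight check to run before main.py. This is a minimal sketch under two assumptions: each dct:hasFormat target can be mapped to a local path in the repository by a caller-supplied `to_local_path` function (hypothetical; how the values relate to repo paths isn't spelled out in this thread), and "accurate" means a file currently exists at that path. The helper name is also hypothetical.

```python
from __future__ import annotations

import os
from typing import Callable, Iterable


def check_has_format(
    targets: Iterable[str],
    to_local_path: Callable[[str], str],
    exists: Callable[[str], bool] = os.path.exists,
) -> tuple[bool, list[str]]:
    """Classify a resource against the scenarios above (hypothetical helper).

    Returns (safe_to_run, missing): missing lists every target whose
    mapped local path does not exist.
    """
    targets = list(targets)
    if not targets:
        # No dct:hasFormat triples at all: the "new resource" case that works
        return True, []
    missing = [t for t in targets if not exists(to_local_path(t))]
    # All targets present: works; any target missing: the failing case
    return not missing, missing
```

Note that the check does not distinguish a new resource from an updated one; it only tests whether every dct:hasFormat target resolves. Passing `exists` as a parameter keeps the function testable without touching the filesystem.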

briesenberg07 commented 2 months ago

Thank you @cspayne !

"a function to delete incorrect dct:hasFormat triples"

Would this compare the file path input from the terminal with the filepath in dct:hasFormat triples to remove/replace incorrect dct:hasFormat triples?

cspayne commented 2 months ago

@briesenberg07

"Would this compare the file path input from the terminal with the filepath in dct:hasFormat triples to remove/replace incorrect dct:hasFormat triples?"

It would likely be simplest to delete all dct:hasFormat triples before adding the correct ones, which is easy to do with rdflib.

briesenberg07 commented 2 months ago

We've decided not to make any changes to main.py/serialize.py at this time, so the important things to remember when processing RDF/XML with main.py are:

📢 Cannot pass incorrect dct:hasFormat paths to main.py!!
YOU CAN: