uwlib-cams / uwlswd_datasets

University of Washington Libraries digital-collections and other metadata, published as linked open data
https://uwlib-cams.github.io/uwlswd/
Creative Commons Zero v1.0 Universal

Deprecate dataset description <https://doi.org/10.6069/uwlib.55.a>, dataset <https://doi.org/10.6069/uwlib.55.a#uwSemWeb> #22

Closed briesenberg07 closed 3 months ago

briesenberg07 commented 1 year ago

A starting point from https://github.com/uwlib-cams/uwlswd/discussions/42

(2) How do we deprecate https://doi.org/10.6069/uwlib.55.a#uwSemWeb and https://doi.org/10.6069/uwlib.55.a ? I suspect we just leave them in place and strip out false or redundant triples. Then those resources would not be incorrect (unless we add to the "partitions" in ways that contradict the assertions in uwlib.55.a and uwlib.55.a#uwSemWeb), so who are they hurting? Maybe us, as we'd have to manage the DOI, but I bet we can live with that.

gerontakos commented 1 year ago

Leaving things in place may be fine (we can analyze the remaining triples), but we may want to add a triple marking the deprecation. What does RDA Registry do? Anything more than add "(deprecated)" to the labels? (I can't look; RDA Registry seems to have gone bonkers today for accessing the RDF, though I'm probably doing something incorrect.) We'll probably need a new property for this. With values from a Deprecation Vocabulary!
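One existing option to weigh against a new property is the OWL annotation property owl:deprecated, which takes a boolean value. Below is a minimal sketch, assuming the statement would be serialized as N-Triples. The helper name deprecation_triple is hypothetical, and nothing here reflects what RDA Registry actually does.

```python
# Sketch only: builds N-Triples lines by hand, with no RDF library involved.
OWL_DEPRECATED = "http://www.w3.org/2002/07/owl#deprecated"
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"


def deprecation_triple(subject_iri: str) -> str:
    """Return one N-Triples line asserting owl:deprecated true for subject_iri."""
    return f'<{subject_iri}> <{OWL_DEPRECATED}> "true"^^<{XSD_BOOLEAN}> .'


# The two resources under discussion in this issue.
for iri in (
    "https://doi.org/10.6069/uwlib.55.a",
    "https://doi.org/10.6069/uwlib.55.a#uwSemWeb",
):
    print(deprecation_triple(iri))
```

A boolean flag says only that a resource is deprecated, not why or what replaces it, so a property with values from a deprecation vocabulary could still be needed.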

briesenberg07 commented 1 year ago

My questions include:

briesenberg07 commented 10 months ago

📢 Proposing stopgap measures to improve discovery for UWLSWD index

Context

We refer to all of our published datasets and vocabularies as 'University of Washington Libraries Semantic Web Data'. This is the title of our UWLSWD index page and, following pull request uwlib-cams/uwlswd#77, also the schema:title for the published document, which will hopefully put the index in search results for that phrase eventually (?).

BUT currently, first-page search results for 'University of Washington Libraries Semantic Web Data' include the VoID dataset description for our "Instances of (...)" datasets, and do not include the UWLSWD index at all. This is likely because the VoID description has (excerpted):

<html>
   <head>
      <script type="application/ld+json">
      {
      (...)
      "name" : "VoID Description of the dataset 'University of Washington Libraries' Semantic Web Data'" ,
      "description" : "University of Washington Libraries' Semantic Web Data" ,
      (...)
      }
      </script>
   </head>
   <body>
      <h1>University of Washington Libraries' Semantic Web Data</h1>
      (...)
   </body>
</html>

Proposal

  1. Draft new title for use as:
    • dct:title
    • datacite:title
    • schema:name
  2. Draft new description for use as:
    • dct:description
    • schema:description
    • datacite:description
  3. Implement new title and description following review
  4. Check for and make other needed updates per the cleanup checklist as possible
  5. Increment version as needed (I believe REVISION); not applicable, as the VoID description is not versioned
  6. Run main.py, republish

@cspayne @gerontakos does this sound like a reasonable plan?

briesenberg07 commented 10 months ago

Working in branch VoID_rename, attempting to implement new title and description as proposed above, but datacite_metadata.py and main.py are incompatible with the VoID resource, I think because the VoID resource contains multiple 'top-level resources' (DOIs with no hash identifiers). I think we knew that this would happen--but I forgot!

Thus, updates to VoID resource and serializations will have to be made in a more labor-intensive fashion. Error details below just in case, although I don't think we plan to change main.py so that it can work with the format of this resource.

VoID file + uwlswd/py/datacite_metadata.py

====================
Generating DataCite metadata file from ../uwlswd_datasets/void/void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.rdf
====================
Type error at char 28 in expression in xsl:result-document/@href on line 12 column 47 of rdf2datacite.xsl:
  XPTY0004  A sequence of more than one item is not allowed as the first argument of
  fn:substring-after() ("https://doi.org/10.6069/uwlib.55.a",
  "https://doi.org/10.6069/uwlib.55.a.3.6")
  In template rule with match="/" on line 8 of rdf2datacite.xsl
A sequence of more than one item is not allowed as the first argument of fn:substring-after() ("https://doi.org/10.6069/uwlib.55.a", "https://doi.org/10.6069/uwlib.55.a.3.6")
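The error message shows fn:substring-after() receiving two DOIs where the stylesheet apparently expects one. Below is a minimal sketch of a pre-flight check for that condition, using only the standard library. The function name top_level_resources and the sample RDF/XML (including its placeholder titles) are hypothetical, and the actual XPath used in rdf2datacite.xsl may select resources differently.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def top_level_resources(rdf_xml: str) -> list[str]:
    """Return rdf:about IRIs with no hash fragment found on direct children of rdf:RDF."""
    root = ET.fromstring(rdf_xml)
    found = []
    for node in root:
        about = node.get(f"{{{RDF_NS}}}about")
        if about and "#" not in about and about not in found:
            found.append(about)
    return found


# Hypothetical sample: IRIs taken from the error message, titles are placeholders.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="https://doi.org/10.6069/uwlib.55.a">
    <dct:title>Placeholder A</dct:title>
  </rdf:Description>
  <rdf:Description rdf:about="https://doi.org/10.6069/uwlib.55.a#uwSemWeb">
    <dct:title>Placeholder B</dct:title>
  </rdf:Description>
  <rdf:Description rdf:about="https://doi.org/10.6069/uwlib.55.a.3.6">
    <dct:title>Placeholder C</dct:title>
  </rdf:Description>
</rdf:RDF>"""

resources = top_level_resources(SAMPLE)
if len(resources) > 1:
    print("More than one top-level resource:", resources)
```

Running a check like this before the XSLT step would at least turn the type error into a clear "this file needs manual handling" message.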

VoID file + main.py

====================
PROCESSING void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data
====================
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.rdf generated
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.nt generated
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.ttl generated
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.jsonld generated

generating HTML+RDFa with Schema.org data
WARNING: schema:version missing from rdf/xml
Type error at char 8 in expression in xsl:value-of/@select on line 216 column 11 of rdf2schemaorg.xsl:
  XPTY0004  A sequence of more than one item is not allowed as the first argument of
  fn:replace() ("https://www.lib.wash ... ceResource-1-0-0.ttl", "https://www.lib.wash ...
  ggregation-1-0-0.ttl")
at template rdf2schemaorg on line 11 column 40 of rdf2schemaorg.xsl:
     invoked by xsl:call-template at file:/C:/Users/Benjamin/od/uwlswd/uwlswd/xsl/rdf2htmlrdfa.xsl#67
  In template rule with match="/" on line 39 of rdf2htmlrdfa.xsl
A sequence of more than one item is not allowed as the first argument of fn:replace() ("https://www.lib.wash ... ceResource-1-0-0.ttl", "https://www.lib.wash ... ggregation-1-0-0.ttl")
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.html generated
Traceback (most recent call last):
  File "C:\Users\Benjamin\od\uwlswd\uwlswd\py\main.py", line 139, in <module>
    process_file(file_path, fancy)
  File "C:\Users\Benjamin\od\uwlswd\uwlswd\py\main.py", line 80, in process_file
    fancify_HTML(output_file)
  File "C:\Users\Benjamin\od\uwlswd\uwlswd\py\fancyhtml.py", line 13, in fancify_HTML
    tree = ET.parse(filepath)
           ^^^^^^^^^^^^^^^^^^
  File "src\lxml\etree.pyx", line 3541, in lxml.etree.parse
  File "src\lxml\parser.pxi", line 1879, in lxml.etree._parseDocument
  File "src\lxml\parser.pxi", line 1905, in lxml.etree._parseDocumentFromURL
  File "src\lxml\parser.pxi", line 1808, in lxml.etree._parseDocFromFile
  File "src\lxml\parser.pxi", line 1180, in lxml.etree._BaseParser._parseDocFromFile
  File "src\lxml\parser.pxi", line 618, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 728, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 657, in lxml.etree._raiseParseError
  File "../uwlswd_datasets/void/void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.html", line 35
lxml.etree.XMLSyntaxError: Premature end of data in tag script line 6, line 35, column 11
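The traceback suggests that the HTML file was left incomplete after the rdf2schemaorg.xsl type error, so the XML parser reached end-of-file inside the script element. Below is a minimal sketch of a guard that reports this before any further processing. It uses the standard library's xml.etree.ElementTree rather than lxml, and well_formed_error is a hypothetical helper, not part of main.py or fancyhtml.py.

```python
import xml.etree.ElementTree as ET


def well_formed_error(markup: str):
    """Return None if markup parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical inputs: a complete document and one cut off inside <script>.
GOOD = "<html><head><script>{}</script></head><body><h1>t</h1></body></html>"
TRUNCATED = "<html><head><script>{"

print(well_formed_error(GOOD))
print(well_formed_error(TRUNCATED))
```

Checking the transform output this way would let the script skip the fancify step with a readable message instead of a traceback.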

briesenberg07 commented 10 months ago

📢 starting over

Disregard the commits to the deleted branch above; starting over will make the diffs for these manual changes easier to compare.

to-do list

for short term title and description fix

briesenberg07 commented 10 months ago

STOPGAP MEASURES COMPLETE