briesenberg07 commented 1 year ago

A starting point from https://github.com/uwlib-cams/uwlswd/discussions/42

(2) How do we deprecate https://doi.org/10.6069/uwlib.55.a#uwSemWeb and https://doi.org/10.6069/uwlib.55.a ? I suspect we just leave them in place and strip out false or redundant triples. Then those resources would not be incorrect (unless we add to the "partitions" in ways that contradict the assertions in uwlib.55.a and uwlib.55.a#uwSemWeb), so who are they hurting? Maybe us, as we'd have to manage the DOI, but I bet we can live with that.

gerontakos commented 1 year ago

Leaving things in place may be fine (we can analyze the remaining triples), but we may want to add a triple. What does RDA Registry do? Anything more than add "(deprecated)" to the labels? (I can't look: RDA Registry seems to have gone bonkers today for accessing the RDF; I'm probably doing something incorrect.) We'll probably need a new property for this. With values from a Deprecation Vocabulary!

briesenberg07 commented 1 year ago

My questions include:

If we move statements from the data at uwlib.55.a into various datasets (uwlib.55.a.3.1, uwlib.55.a.3.2, etc.) is there any reason to keep those statements in uwlib.55.a? I hope not—I don’t want to maintain the data in two places! And I don’t think so.
I think we want to add a triple to uwlib.55.a (and uwlib.55.a#uwSemWeb?) to indicate that the resource(s) is/are deprecated
- owl:deprecated seems the logical choice; any reason not to use it?
- Add the triple to both the DOI without the hash and the DOI with the hash? Or just to the DOI without the hash (the void:DatasetDescription)?
- Anything else to know about deprecating a semantic web resource in general, or owl:deprecated in particular?
(A little outside scope of this issue but related) Consider proposal regarding changing rdf:type = void:Dataset to dct:type = dcmitype:Dataset for "Instances of [...]" datasets
Confirm that our use of VoID properties and classes, after deprecation and after changing triples in each of the uwlswd_datasets, would still align with ontology requirements. This relates to the proposed change above from void:Dataset to dcmitype:Dataset typing: does use of other VoID properties depend on resources being typed as void:Dataset?
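
For the owl:deprecated question, here is a minimal sketch of what the added statements might look like, written as N-Triples using only the Python standard library. It assumes the answer to the second sub-question is "both IRIs", which is still undecided here; the helper name `deprecation_triples` is hypothetical and is not part of the uwlswd scripts.

```python
# Sketch only: emits owl:deprecated statements as N-Triples lines.
OWL_DEPRECATED = "http://www.w3.org/2002/07/owl#deprecated"
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# IRIs taken from this issue. Whether both should carry the triple
# is an open question, so treat this list as an assumption.
CANDIDATES = [
    "https://doi.org/10.6069/uwlib.55.a",
    "https://doi.org/10.6069/uwlib.55.a#uwSemWeb",
]


def deprecation_triples(iris):
    """Return one N-Triples line per IRI asserting owl:deprecated true."""
    return [
        f'<{iri}> <{OWL_DEPRECATED}> "true"^^<{XSD_BOOLEAN}> .'
        for iri in iris
    ]


if __name__ == "__main__":
    print("\n".join(deprecation_triples(CANDIDATES)))
```

If we settle on tagging only the DOI without the hash, trimming `CANDIDATES` to one entry is the only change needed.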

briesenberg07 commented 10 months ago

📢 Proposing stopgap measures to improve discovery for UWLSWD index

Context

We refer to all of our published datasets and vocabularies as 'University of Washington Libraries Semantic Web Data'. This is the title of our UWLSWD index page and, following pull request uwlib-cams/uwlswd#77, also the schema:title for the published document, which will hopefully put the index in search results for that phrase eventually (?).

BUT currently, first-page search results for 'University of Washington Libraries Semantic Web Data' include the VoID dataset description for our "Instances of (...)" datasets but not the UWLSWD index at all. This is likely because the VoID description contains (excerpted):

<html>
   <head>
      <script type="application/ld+json">
      {
      (...)
      "name" : "VoID Description of the dataset 'University of Washington Libraries' Semantic Web Data'" ,
      "description" : "University of Washington Libraries' Semantic Web Data" ,
      (...)
      }
      </script>
   </head>
   <body>
      <h1>University of Washington Libraries' Semantic Web Data</h1>
      (...)
   </body>
</html>
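
To see what text a page exposes through its embedded JSON-LD, something like the following could be run against any of our HTML files. This is a rough sketch using only the standard library; `jsonld_names` and the `SAMPLE` string are hypothetical, and the sample is a trimmed stand-in for the excerpt, not the real file.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def jsonld_names(html):
    """Return (name, description) pairs for each JSON-LD object in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        (block.get("name"), block.get("description"))
        for block in parser.blocks
        if isinstance(block, dict)
    ]


# Trimmed stand-in for the excerpt in this comment (assumption: one JSON-LD object).
SAMPLE = """<html><head><script type="application/ld+json">
{"name": "VoID Description of the dataset 'University of Washington Libraries' Semantic Web Data'",
 "description": "University of Washington Libraries' Semantic Web Data"}
</script></head><body><h1>University of Washington Libraries' Semantic Web Data</h1></body></html>"""

if __name__ == "__main__":
    for name, description in jsonld_names(SAMPLE):
        print(name, "|", description)
```

Running the same check on the index page and the VoID description side by side would show which one repeats the target phrase more often.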

Proposal

Draft new title for use as:
- dct:title
- datacite:title
- schema:name
Draft new description for use as:
- dct:description
- schema:description
- datacite:description
Implement new title and description ~~following review~~
Check for and make other needed updates per the cleanup checklist as possible
~~Increment version as needed (I believe REVISION)~~ VoID description not versioned
Run main.py, republish

@cspayne @gerontakos does this sound like a reasonable plan?

briesenberg07 commented 10 months ago

Working in branch VoID_rename, attempting to implement new title and description as proposed above, but datacite_metadata.py and main.py are incompatible with the VoID resource, I think because the VoID resource contains multiple 'top-level resources' (DOIs with no hash identifiers). I think we knew that this would happen--but I forgot!

Thus, updates to VoID resource and serializations will have to be made in a more labor-intensive fashion. Error details below just in case, although I don't think we plan to change main.py so that it can work with the format of this resource.

VoID file + uwlswd/py/datacite_metadata.py

====================
Generating DataCite metadata file from ../uwlswd_datasets/void/void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.rdf
====================
Type error at char 28 in expression in xsl:result-document/@href on line 12 column 47 of rdf2datacite.xsl:
  XPTY0004  A sequence of more than one item is not allowed as the first argument of
  fn:substring-after() ("https://doi.org/10.6069/uwlib.55.a",
  "https://doi.org/10.6069/uwlib.55.a.3.6")
  In template rule with match="/" on line 8 of rdf2datacite.xsl
A sequence of more than one item is not allowed as the first argument of fn:substring-after() ("https://doi.org/10.6069/uwlib.55.a", "https://doi.org/10.6069/uwlib.55.a.3.6")

VoID file + main.py

====================
PROCESSING void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data
====================
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.rdf generated
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.nt generated
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.ttl generated
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.jsonld generated

generating HTML+RDFa with Schema.org data
WARNING: schema:version missing from rdf/xml
Type error at char 8 in expression in xsl:value-of/@select on line 216 column 11 of rdf2schemaorg.xsl:
  XPTY0004  A sequence of more than one item is not allowed as the first argument of
  fn:replace() ("https://www.lib.wash ... ceResource-1-0-0.ttl", "https://www.lib.wash ...
  ggregation-1-0-0.ttl")
at template rdf2schemaorg on line 11 column 40 of rdf2schemaorg.xsl:
     invoked by xsl:call-template at file:/C:/Users/Benjamin/od/uwlswd/uwlswd/xsl/rdf2htmlrdfa.xsl#67
  In template rule with match="/" on line 39 of rdf2htmlrdfa.xsl
A sequence of more than one item is not allowed as the first argument of fn:replace() ("https://www.lib.wash ... ceResource-1-0-0.ttl", "https://www.lib.wash ... ggregation-1-0-0.ttl")
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.html generated
Traceback (most recent call last):
  File "C:\Users\Benjamin\od\uwlswd\uwlswd\py\main.py", line 139, in <module>
    process_file(file_path, fancy)
  File "C:\Users\Benjamin\od\uwlswd\uwlswd\py\main.py", line 80, in process_file
    fancify_HTML(output_file)
  File "C:\Users\Benjamin\od\uwlswd\uwlswd\py\fancyhtml.py", line 13, in fancify_HTML
    tree = ET.parse(filepath)
           ^^^^^^^^^^^^^^^^^^
  File "src\lxml\etree.pyx", line 3541, in lxml.etree.parse
  File "src\lxml\parser.pxi", line 1879, in lxml.etree._parseDocument
  File "src\lxml\parser.pxi", line 1905, in lxml.etree._parseDocumentFromURL
  File "src\lxml\parser.pxi", line 1808, in lxml.etree._parseDocFromFile
  File "src\lxml\parser.pxi", line 1180, in lxml.etree._BaseParser._parseDocFromFile
  File "src\lxml\parser.pxi", line 618, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 728, in lxml.etree._handleParseResult
  File "src\lxml\parser.pxi", line 657, in lxml.etree._raiseParseError
  File "../uwlswd_datasets/void/void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.html", line 35
lxml.etree.XMLSyntaxError: Premature end of data in tag script line 6, line 35, column 11
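
The first error above shows fn:substring-after() receiving two DOIs where it expects one. A pre-flight check along these lines could flag such files before they reach the stylesheets. This is a sketch under assumptions: it treats any direct child of rdf:RDF whose rdf:about has no '#' as a 'top-level resource', and `top_level_subjects` is a hypothetical helper, not something that exists in main.py or datacite_metadata.py.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def top_level_subjects(rdf_xml):
    """List distinct rdf:about values without a fragment, in document order."""
    root = ET.fromstring(rdf_xml)
    subjects = []
    for node in root:  # direct children of rdf:RDF only
        about = node.get(f"{{{RDF_NS}}}about")
        if about and "#" not in about and about not in subjects:
            subjects.append(about)
    return subjects


# Stand-in RDF/XML using the DOIs that appear in the error message.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="https://doi.org/10.6069/uwlib.55.a"/>
  <rdf:Description rdf:about="https://doi.org/10.6069/uwlib.55.a#uwSemWeb"/>
  <rdf:Description rdf:about="https://doi.org/10.6069/uwlib.55.a.3.6"/>
</rdf:RDF>"""

if __name__ == "__main__":
    found = top_level_subjects(SAMPLE)
    if len(found) > 1:
        print("More than one top-level resource; expect XSLT type errors:")
        for iri in found:
            print("  ", iri)
```

A file that returns more than one subject would need the manual route described above.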

briesenberg07 commented 10 months ago

📢 starting over

Disregard the commits to the deleted branch above; starting fresh gives easier-to-compare diffs for these manual changes.

to-do list

for short term title and description fix

[x] Reformat HTML (oXygen format + indent) + RDF/XML (rdflib serialization)
[x] Make/check manual edits to HTML
- [x] new schema markup
  - [x] new schema:name
  - [x] correct encoding format text/html
- [x] Add missing html/head/link resources
- [x] Fix URLs for html/head/link resources
- [x] Fix title as it appears in top-of-page material
- [x] Minor reformatting of top-of-page material, eliminating extra line breaks in text
- [x] dct:title for DOI, DOI#fragment
- [x] Replace links to data in washington.edu/static with links to github.io
- [x] Fix resource title in footer
- [x] Fix link 'JSON' > 'JSON-LD'
- [x] (BONUS: Delete owl:version triples from RDF/XML + rows from HTML)
- [x] commit - see 1dd0637
[x] Make/check manual edits to RDF/XML
- [x] dct:title for DOI, DOI#fragment
- [x] (BONUS: Delete owl:version triples from RDF/XML + rows from RDF/XML)
- [x] Replace links to data in washington.edu/static with links to github.io
- [x] Add JSON-LD to each dct:hasFormat set
- [x] Check for stray en-US lang tags
- [x] commit - see fe9092a
[x] RENAME RDF/XML + HTML+RDFa files per UWLSWD naming conventions
[x] Serialize
[x] Confirm updated (minimal) DataCite metadata OK - see uwlswd commit a639b1f, uwlswd commit ccb1512
[x] Update uwlswd/index.html; consider removing/reordering VoID description
- [x] Confirm URL for uwlswd/index.html
- [x] Commit - see uwlswd 645eb72
[x] Make pull request
[x] Merge pull request / update DataCite XML and URL

briesenberg07 commented 10 months ago

STOPGAP MEASURES COMPLETE ✅

uwlib-cams / uwlswd_datasets

Deprecate dataset description <https://doi.org/10.6069/uwlib.55.a>, dataset <https://doi.org/10.6069/uwlib.55.a#uwSemWeb> #22
