Closed briesenberg07 closed 3 months ago
Leaving things in place may be fine (we can analyze the remaining triples), but we may want to add a triple. What does RDA Registry do? Anything more than add "(deprecated)" to the labels? (I can't look; RDA Registry seems to have gone bonkers today for accessing the RDF, though I'm probably doing something incorrect.) We'll probably need a new property for this, with values from a Deprecation Vocabulary!
My questions include:
We refer to all of our published datasets and vocabularies as 'University of Washington Libraries Semantic Web Data'. This is the title of our UWLSWD index page and, following pull request uwlib-cams/uwlswd#77, also the schema:title for the published document, which will hopefully put the index in search results for that phrase eventually (?).
BUT currently, first-page search results for 'University of Washington Libraries Semantic Web Data' include the VoID dataset description for our "Instances of (...)" datasets and do not include the UWLSWD index at all. This is likely because the VoID description has (excerpted):
<html>
<head>
<script type="application/ld+json">
{
(...)
"name" : "VoID Description of the dataset 'University of Washington Libraries' Semantic Web Data'" ,
"description" : "University of Washington Libraries' Semantic Web Data" ,
(...)
}
</script>
</head>
<body>
<h1>University of Washington Libraries' Semantic Web Data</h1>
(...)
</body>
</html>
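To check which title and description a page exposes in its embedded JSON-LD, something like the following could be run against the VoID page and the index page. This is only a minimal sketch using the Python standard library: the class and function names are hypothetical, and the sample markup is a shortened stand-in, not the real published HTML.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def jsonld_titles(html_text):
    """Return (name, description) pairs found in the JSON-LD blocks of html_text."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    pairs = []
    for block in parser.blocks:
        data = json.loads(block)
        # A JSON-LD block may hold one object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict):
                pairs.append((item.get("name"), item.get("description")))
    return pairs


# Hypothetical, shortened sample; the real page has more properties.
SAMPLE = """<html><head>
<script type="application/ld+json">
{"name": "VoID Description of the dataset", "description": "Semantic Web Data"}
</script>
</head><body><h1>Semantic Web Data</h1></body></html>"""

print(jsonld_titles(SAMPLE))
```

Comparing the pairs returned for the two pages would show whether both currently advertise the same phrase.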
@cspayne @gerontakos does this sound like a reasonable plan?
Working in branch VoID_rename, attempting to implement the new title and description as proposed above, but datacite_metadata.py and main.py are incompatible with the VoID resource, I think because the VoID resource contains multiple 'top-level resources' (DOIs with no hash identifiers). I think we knew that this would happen, but I forgot!
Thus, updates to the VoID resource and its serializations will have to be made in a more labor-intensive fashion. Error details are below just in case, although I don't think we plan to change main.py so that it can work with the format of this resource.
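A quick way to confirm the 'multiple top-level resources' theory before running the scripts might look like the sketch below. It assumes that "top-level" means an rdf:about IRI with no `#` fragment, which is my reading of the description above and not something checked against the stylesheets. The function name and the third sample resource are hypothetical; the two DOIs are the ones reported in the error output.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def top_level_resources(rdf_xml_text):
    """Return the rdf:about IRIs that have no '#' fragment, in document order."""
    root = ET.fromstring(rdf_xml_text)
    found = []
    for element in root.iter():
        about = element.get(f"{{{RDF_NS}}}about")
        if about and "#" not in about and about not in found:
            found.append(about)
    return found


# Hypothetical, shortened RDF/XML; titles and the '#part' resource are made up.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="https://doi.org/10.6069/uwlib.55.a">
    <dct:title>first</dct:title>
  </rdf:Description>
  <rdf:Description rdf:about="https://doi.org/10.6069/uwlib.55.a.3.6">
    <dct:title>second</dct:title>
  </rdf:Description>
  <rdf:Description rdf:about="https://doi.org/10.6069/uwlib.55.a#part">
    <dct:title>third</dct:title>
  </rdf:Description>
</rdf:RDF>"""

resources = top_level_resources(SAMPLE)
print(resources)
if len(resources) > 1:
    # A function that expects a single string would receive a sequence here.
    print("More than one top-level resource found.")
```

If a file yields more than one IRI, any expression that passes that selection straight to a single-string function would presumably hit the same kind of type error.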
====================
Generating DataCite metadata file from ../uwlswd_datasets/void/void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.rdf
====================
Type error at char 28 in expression in xsl:result-document/@href on line 12 column 47 of rdf2datacite.xsl:
XPTY0004 A sequence of more than one item is not allowed as the first argument of
fn:substring-after() ("https://doi.org/10.6069/uwlib.55.a",
"https://doi.org/10.6069/uwlib.55.a.3.6")
In template rule with match="/" on line 8 of rdf2datacite.xsl
A sequence of more than one item is not allowed as the first argument of fn:substring-after() ("https://doi.org/10.6069/uwlib.55.a", "https://doi.org/10.6069/uwlib.55.a.3.6")
====================
PROCESSING void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data
====================
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.rdf generated
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.nt generated
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.ttl generated
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.jsonld generated
generating HTML+RDFa with Schema.org data
WARNING: schema:version missing from rdf/xml
Type error at char 8 in expression in xsl:value-of/@select on line 216 column 11 of rdf2schemaorg.xsl:
XPTY0004 A sequence of more than one item is not allowed as the first argument of
fn:replace() ("https://www.lib.wash ... ceResource-1-0-0.ttl", "https://www.lib.wash ...
ggregation-1-0-0.ttl")
at template rdf2schemaorg on line 11 column 40 of rdf2schemaorg.xsl:
invoked by xsl:call-template at file:/C:/Users/Benjamin/od/uwlswd/uwlswd/xsl/rdf2htmlrdfa.xsl#67
In template rule with match="/" on line 39 of rdf2htmlrdfa.xsl
A sequence of more than one item is not allowed as the first argument of fn:replace() ("https://www.lib.wash ... ceResource-1-0-0.ttl", "https://www.lib.wash ... ggregation-1-0-0.ttl")
void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.html generated
Traceback (most recent call last):
File "C:\Users\Benjamin\od\uwlswd\uwlswd\py\main.py", line 139, in <module>
process_file(file_path, fancy)
File "C:\Users\Benjamin\od\uwlswd\uwlswd\py\main.py", line 80, in process_file
fancify_HTML(output_file)
File "C:\Users\Benjamin\od\uwlswd\uwlswd\py\fancyhtml.py", line 13, in fancify_HTML
tree = ET.parse(filepath)
^^^^^^^^^^^^^^^^^^
File "src\lxml\etree.pyx", line 3541, in lxml.etree.parse
File "src\lxml\parser.pxi", line 1879, in lxml.etree._parseDocument
File "src\lxml\parser.pxi", line 1905, in lxml.etree._parseDocumentFromURL
File "src\lxml\parser.pxi", line 1808, in lxml.etree._parseDocFromFile
File "src\lxml\parser.pxi", line 1180, in lxml.etree._BaseParser._parseDocFromFile
File "src\lxml\parser.pxi", line 618, in lxml.etree._ParserContext._handleParseResultDoc
File "src\lxml\parser.pxi", line 728, in lxml.etree._handleParseResult
File "src\lxml\parser.pxi", line 657, in lxml.etree._raiseParseError
File "../uwlswd_datasets/void/void_description_of_the_dataset_university_of_washington_libraries_semantic_web_data.html", line 35
lxml.etree.XMLSyntaxError: Premature end of data in tag script line 6, line 35, column 11
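The final traceback seems to indicate that the generated HTML was cut off inside the script element, so the XML parse in fancyhtml.py failed. If it ever became useful to detect that case without a crash, a well-formedness check could be sketched as below. This uses the standard library's xml.etree rather than lxml, and the function name and sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return (True, None) if markup parses as XML, else (False, error message)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return False, str(error)
    return True, None


# Hypothetical stand-ins for a complete and a truncated output file.
COMPLETE = "<html><head><script>{}</script></head><body/></html>"
TRUNCATED = "<html><head><script>{"

print(is_well_formed(COMPLETE))
print(is_well_formed(TRUNCATED)[0])
```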
Disregard the commits to the deleted branch above, for the sake of easier-to-compare diffs for these manual changes.
for short term title and description fix
text/html
html/head/link resources
html/head/link resources
owl:version triples from RDF/XML + rows from HTML)
owl:version triples from RDF/XML + rows from RDF/XML)
en-US lang tags
STOPGAP MEASURES COMPLETE ✅
A starting point from https://github.com/uwlib-cams/uwlswd/discussions/42