hiscom / hispid

HISPID Terms

XSLT and changes to RDF #73

Closed by nielsklazenga 8 years ago

nielsklazenga commented 9 years ago

I have created the XSLT file to transform the RDF to HTML (terms-rdf-to-html.xslt). The HTML that is now at http://hiscom.chah.org.au/hispid/terms/ is created directly from RDF using this XSLT.

I have recreated the RDF file, as the MediaItem properties weren't in there yet. While doing that I made a few changes to the RDF:

  1. The objects after the rdf:type predicate were in the rdfs namespace (http://www.w3.org/2000/01/rdf-schema#). I put them in the rdf namespace (http://www.w3.org/1999/02/22-rdf-syntax-ns#), where Darwin Core has them. I didn't check if rdfs has the RDF types as well.
  2. I think using skos:exactMatch and skos:closeMatch to match the old HISPID transfer codes to the current concepts is not appropriate, as they (the old HISPID terms) are older versions, not different concepts. Therefore, I have done the same thing as the Darwin Core RDF does and used dcterms:replaces instead. This leaves the SKOS mapping properties for mapping between different concepts and different standards. I have created a hispidtermshistory.rdf file in which I have dumped all the HISPID transfer codes from HISPID 3, 4, 5, so the new terms replace the HISPID 5 terms, HISPID 5 terms replace the HISPID 4 terms etc. The history file needs a bit more work.
  3. I think dcterms:hasVersion should only be used to indicate the version of the term that is currently in use and not for older versions. Therefore there can only be one hasVersion element for each property. Following the Darwin Core RDF, I have appended the qualified name with the dcterms:issued date, e.g. 'http://hiscom.chah.org.au/hispid/terms/catalogNumber-2015-07-11'. For the terms from previous versions I have appended the version instead, e.g. 'http://hiscom.chah.org.au/hispid/terms/accid-hispid3'.
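
To make the naming convention in point 3 concrete, here is a minimal Python sketch. The helper name `versioned_iri` is made up for illustration and is not part of the repository; the two example IRIs are the ones quoted above.

```python
HISPID_NS = "http://hiscom.chah.org.au/hispid/terms/"


def versioned_iri(local_name: str, suffix: str, namespace: str = HISPID_NS) -> str:
    """Append a version suffix to a term IRI.

    The suffix is the dcterms:issued date for current terms, or the name of
    the older standard (e.g. 'hispid3') for terms from previous versions.
    """
    return f"{namespace}{local_name}-{suffix}"


# Current term: suffix is the issued date.
current = versioned_iri("catalogNumber", "2015-07-11")
# Term from an earlier standard: suffix is the version name.
legacy = versioned_iri("accid", "hispid3")
```

With this convention a property carries a single dcterms:hasVersion value (the `current` form), while older forms are reached through dcterms:replaces.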

I didn't want to overwrite the existing terms.rdf file before having discussed this, so I created hispid2015.rdf.

I have adopted Ben's use of skos:scopeNote for the usage column. I have used the same predicate for the vocabularies, prepending the values with 'Vocabulary: ', so that the XSLT style sheet can recognise them as such. skos:note is used for general comments and the GitHub issues (values for the latter are prepended with 'GitHub issue: '). I think we can find a use for skos:changeNote as well.
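
The prefix convention can be illustrated with a short Python sketch. This is not the XSLT itself, only the same logic written out; the function name `classify_note`, the category labels and the sample values are assumptions made for the example.

```python
from typing import Tuple

# (predicate, prefix) pairs that mark a special kind of note.
PREFIXED_KINDS = {
    ("skos:scopeNote", "Vocabulary: "): "vocabulary",
    ("skos:note", "GitHub issue: "): "github_issue",
}

# What a note means when it carries no recognised prefix.
DEFAULT_KINDS = {
    "skos:scopeNote": "usage",
    "skos:note": "comment",
}


def classify_note(predicate: str, value: str) -> Tuple[str, str]:
    """Return (kind, text) for a note, stripping a recognised prefix."""
    for (pred, prefix), kind in PREFIXED_KINDS.items():
        if predicate == pred and value.startswith(prefix):
            return kind, value[len(prefix):]
    return DEFAULT_KINDS.get(predicate, "other"), value
```

For example, `classify_note("skos:note", "GitHub issue: #63")` yields `("github_issue", "#63")`, while a skos:scopeNote without the 'Vocabulary: ' prefix is treated as plain usage text.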

nielsklazenga commented 9 years ago

A few more changes this morning:

  1. I was thinking that, since our terms are the Darwin Core (or Dublin Core etc.) terms, there is no point matching them against those terms, so I have removed all the skos:exactMatch-es as well (after already removing the skos:closeMatch-es yesterday). As we often refine the definition or usage, it might be appropriate to redefine the terms we adopt from other namespaces in HISPID, but then we should use owl:sameAs, rather than skos:exactMatch, to indicate that the HISPID term is the Darwin Core term, rather than another term with the same name and semantics. The reasons why owl:sameAs is not always appropriate, as outlined in the SKOS primer, do not apply here.
  2. For the same reason I have removed the versioning from all terms that are defined elsewhere.
  3. As the HISPID namespace is given as the xml:base in the root element (rdf:RDF), there is no need to repeat it later in the document, so I have removed it from rdf:about and dwcattributes:organizedInClass. The XSLT style sheet has been set up so it will give the same output whether the namespace is included or left out.
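
As a rough illustration of point 3, the Python sketch below resolves rdf:about values against the base IRI on the root element, so relative and absolute forms end up identical. The sample document and the term name `exampleTerm` are invented for the example, and the function `resolved_subjects` is not part of the repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample: one relative and one absolute rdf:about value.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xml:base="http://hiscom.chah.org.au/hispid/terms/">
  <rdf:Property rdf:about="catalogNumber"/>
  <rdf:Property rdf:about="http://hiscom.chah.org.au/hispid/terms/exampleTerm"/>
</rdf:RDF>"""


def resolved_subjects(rdf_xml: str) -> list:
    """Return the absolute IRI of every rdf:Property in the document."""
    root = ET.fromstring(rdf_xml)
    # The base IRI declared on the root element (empty string if absent).
    base = root.get(f"{{{XML_NS}}}base", "")
    return [
        # urljoin leaves absolute IRIs untouched and resolves relative ones.
        urljoin(base, element.get(f"{{{RDF_NS}}}about"))
        for element in root.iter(f"{{{RDF_NS}}}Property")
    ]


subjects = resolved_subjects(SAMPLE)
```

Both entries in `subjects` come out as full IRIs in the HISPID namespace, which is the sense in which the output is the same whether the namespace is written out or left to the base.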

ben3000 commented 9 years ago

Niels, for easiest discussion of these changes, you should really submit them directly on the terms.rdf, or in a separate branch on the terms.rdf file. That way we can compare differences on the file between versions. I don't think it is as easy, at least in GitHub, to compare two differently-named files.

nielsklazenga commented 9 years ago

Happy to do that.

nielsklazenga commented 9 years ago

I have pushed the 'terms.rdf' file through. Sorry, missed the bit about the separate branch. I think the comparison in GitHub might not work anyway, as the order of the elements will have changed.

nielsklazenga commented 9 years ago

I forgot to put in some commit messages, so here is what I have done.

  1. After reading the RDF 1.1 Primer, I have put the namespace IRIs back in the rdf:about attributes, only I have put in the namespace IRI for the namespace in which a term is defined, rather than the HISPID IRI for all terms.
  2. After already having removed the version last week, I have now also removed dcterms:issued and dcterms:modified for elements that are not defined by HISPID.
  3. I have removed XML namespace attributes from the rdf:RDF element for namespaces that aren't used in the document (dc, owl, abcd, hispid3, hispid4, hispid5).
  4. I have added language (xml:lang) and data type (rdf:datatype) attributes for elements with literal values, where required.
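
For point 4, a small check along the following lines could list literal-valued elements that carry neither xml:lang nor rdf:datatype. It is only a sketch: the sample document, the term `exampleTerm` and the function `untagged_literals` are invented, and an untagged literal is still valid RDF, so the result is a list of candidates to review rather than a list of errors.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample with one language-tagged, one typed and one bare literal.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Property rdf:about="http://hiscom.chah.org.au/hispid/terms/exampleTerm">
    <rdfs:label xml:lang="en">Example term</rdfs:label>
    <dcterms:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2015-07-11</dcterms:issued>
    <rdfs:comment>An untagged literal.</rdfs:comment>
  </rdf:Property>
</rdf:RDF>"""


def untagged_literals(rdf_xml: str) -> list:
    """Return the tags of literal elements lacking xml:lang and rdf:datatype."""
    root = ET.fromstring(rdf_xml)
    missing = []
    for prop in root.iter(f"{{{RDF_NS}}}Property"):
        for child in prop:
            has_text = bool((child.text or "").strip())
            tagged = (
                f"{{{XML_NS}}}lang" in child.attrib
                or f"{{{RDF_NS}}}datatype" in child.attrib
            )
            if has_text and not tagged:
                missing.append(child.tag)
    return missing


candidates = untagged_literals(SAMPLE)
```

In the sample only the rdfs:comment element is reported, since the label has a language tag and the issued date has a datatype.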

ben3000 commented 8 years ago

+1 on the rdf:type change, there is no apparent difference (as tested with the RDF Validator), so no point in going back.

ben3000 commented 8 years ago

+1 on skos:exactMatch and skos:closeMatch to dcterms:replaces.

ben3000 commented 8 years ago

On dcterms:hasVersion I take the wider view that the older standards were older versions of this same term. I don't think either interpretation is wrong, just that I have a different view.

http://terms.tdwg.org/wiki/dcterms:hasVersion says:

A related resource that is a version, edition, or adaptation of the described resource.

nielsklazenga commented 8 years ago

I don't disagree about the semantics of dcterms:hasVersion. I just think it is more useful to know what the current version is than to list all versions, which you can easily retrieve by following the 'replaces/replacedBy' trail (it's RDF after all). My use is the same as in Darwin Core, which we are basing HISPID on.
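
To show what following the trail amounts to, here is a minimal Python sketch. It assumes each term replaces at most one older term, which is a simplification, and the HISPID 4 and HISPID 5 IRIs in the sample data are made up by analogy with the 'accid-hispid3' example; the function `version_trail` is not part of the repository.

```python
BASE = "http://hiscom.chah.org.au/hispid/terms/"

# Hypothetical dcterms:replaces statements: subject -> object.
REPLACES = {
    BASE + "accid-hispid5": BASE + "accid-hispid4",
    BASE + "accid-hispid4": BASE + "accid-hispid3",
}


def version_trail(term: str, replaces: dict) -> list:
    """Follow dcterms:replaces from a term back to its oldest version."""
    trail = [term]
    seen = {term}
    while trail[-1] in replaces:
        older = replaces[trail[-1]]
        if older in seen:
            # Guard against accidental cycles in the data.
            break
        trail.append(older)
        seen.add(older)
    return trail


trail = version_trail(BASE + "accid-hispid5", REPLACES)
```

Starting from the newest IRI, `trail` lists every earlier version in order, so the full history is recoverable without a separate dcterms:hasVersion statement per version.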

nielsklazenga commented 8 years ago

Just noticed that the 'history' file with all the concepts is not there yet (there is an RDF file, but not HTML). Will try to get that done before the weekend.

nielsklazenga commented 8 years ago

I mean before HISCOM.

ben3000 commented 8 years ago

+1 on ensuring graph traversal continues to work. This was the core of my concern with removing dcterms:hasVersion, while dcterms:replaces/dcterms:isReplacedBy do a better job given the problem of linking to several terms that together match the semantics of the term being documented (see #63).

ben3000 commented 8 years ago

The correct term is dcterms:isReplacedBy (http://purl.org/dc/terms/isReplacedBy) rather than replacedBy.

nielsklazenga commented 8 years ago

Yes, slip of the keyboard (or brain fart). The history file is very incomplete, so isReplacedBy hasn't been used yet. The hispidterms.rdf obviously only uses replaces.

I notice that the URLs for the dcterms:replaces attribute are all wrong. Aaron and I might have a chance to fix all these things on Tuesday, so keep them coming.

ben3000 commented 8 years ago

Fixed the resource URIs in terms.rdf, as noted in the linked commit.

nielsklazenga commented 8 years ago

Re rdf:type vs rdfs:type: there is no such thing as rdfs:type; RDFS itself uses rdf:type.