isawnyu / pleiades-gazetteer

This repository provides a home for tickets and other planning documents for the Pleiades gazetteer of ancient places. Code is kept in multiple other repositories.
https://pleiades.stoa.org

citation type URIs in JSON exports are wrong (1 story point) #416

Closed. paregorios closed this issue 3 years ago

paregorios commented 4 years ago

URIs for our citation types are generated for our JSON exports using the namespace "https://pleiades.stoa.org/vocabularies/feature-type/". This is wrong, and affects both the en masse export and the individual serializations.

The correct namespaces appear in the RDF, using the rdfs and cito prefixes.

alecpm commented 3 years ago

I'm not quite sure what to do here. The bad vocabulary link is hardcoded here:

https://github.com/isawnyu/PleiadesEntity/blob/master/Products/PleiadesEntity/browser/adapters/__init__.py#L257

That's pretty obviously wrong: it points to a vocabulary that doesn't exist, and the field involved doesn't have a dynamic vocabulary with a URL that we could link to. The best approach may be to remove the citationTypeURI attribute, since it doesn't seem to be used anywhere else and it's not clear to me where it would point.

Comparing RDF and JSON data, e.g.:

https://pleiades.jazkarta.com/places/175420887/json
https://pleiades.jazkarta.com/places/175420887/rdf

I'm not seeing anything that would correspond to a citationTypeURI. The JSON already has a type field that corresponds to the cito:... tag name in the RDF.

The following two lines in the RDF:

    <cito:citesForInformation rdf:resource="https://it.wikipedia.org/wiki/Necropoli_di_Prato_Rosello"/>
    <dcterms:bibliographicCitation>Wikipedia (Italian) Necropoli di Prato Rosello</dcterms:bibliographicCitation>

seem to correspond to:

        {
            "alternateURI": "",
            "accessURI": "https://it.wikipedia.org/wiki/Necropoli_di_Prato_Rosello",
            "citationDetail": "Necropoli di Prato Rosello",
            "citationTypeURI": "https://pleiades.jazkarta.com/vocabularies/feature-type/seeFurther",
            "shortTitle": "Wikipedia (Italian)",
            "bibliographicURI": "https://www.zotero.org/groups/2533/pleiades/items/itemKey/UMFZH98D",
            "formattedCitation": "Wikipedia: L’enciclopedia libera e collaborativa (2001-), Necropoli di Prato Rosello.",
            "otherIdentifier": " ",
            "type": "seeFurther"
        },

in the JSON. From what I can tell, nothing is missing from the JSON; there is just an additional, incorrect citationTypeURI, which can probably be safely removed.

paregorios commented 3 years ago

RDF has a convention for using short prefixes (like "cito") to make long URIs more readable. If you view the raw RDF/XML, you'll see these defined as attributes on the root element, e.g.:

    <rdf:RDF
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:pleiades="https://pleiades.stoa.org/places/vocab#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:osgeo="http://data.ordnancesurvey.co.uk/ontology/geometry/"
      xmlns:prov="http://www.w3.org/TR/prov-o/#"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:spatial="http://geovocab.org/spatial#"
      xmlns:cito="http://purl.org/spar/cito/"
      xmlns:dcterms="http://purl.org/dc/terms/"
      xmlns:skos="http://www.w3.org/2004/02/skos/core#">

JSON has no such convention, so in order to produce a semantically equivalent identifier for the term in the JSON, we have to emit the full URI.
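For illustration only, here is a minimal sketch of reading those prefix definitions programmatically with Python's standard-library `xml.etree.ElementTree.iterparse` and its `start-ns` events. The function name `namespace_map` and the trimmed `SAMPLE` document are assumptions made for this example, not part of any Pleiades codebase.

```python
import io
import xml.etree.ElementTree as ET


def namespace_map(rdf_xml: str) -> dict:
    """Collect {prefix: namespace URI} from every xmlns declaration in the document."""
    prefixes = {}
    source = io.BytesIO(rdf_xml.encode("utf-8"))
    # With events=("start-ns",), iterparse yields ("start-ns", (prefix, uri)) pairs.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        prefixes[prefix] = uri  # a later redeclaration of the same prefix wins
    return prefixes


# Hypothetical, trimmed-down root element (must be well-formed to parse).
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:cito="http://purl.org/spar/cito/"
  xmlns:dcterms="http://purl.org/dc/terms/">
</rdf:RDF>"""

print(namespace_map(SAMPLE)["cito"])  # → http://purl.org/spar/cito/
```

A default namespace declaration (`xmlns="..."`) would show up under the empty-string prefix.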

For these citation type URIs, we are making use of externally defined vocabularies rather than our own. We want full URIs in the JSON that point to those canonical, external URIs. They can be formed (and yes, this will have to be hard-coded) by prefixing the term name with the same base URI that appears in the corresponding prefix definition in the RDF.
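As a sketch of that expansion, assuming a hard-coded prefix table copied from the RDF root element: the full URI is just the namespace URI with the term's local name appended. `RDF_PREFIXES` and `expand_prefixed_name` are hypothetical names used for illustration.

```python
# Prefix definitions copied by hand from the RDF root element (hard-coded).
RDF_PREFIXES = {
    "cito": "http://purl.org/spar/cito/",
    "dcterms": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_prefixed_name(prefixed_name: str) -> str:
    """Turn a prefixed name such as 'cito:citesForInformation' into its full URI."""
    prefix, sep, local_name = prefixed_name.partition(":")
    if not sep or prefix not in RDF_PREFIXES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    # Full URI = namespace URI + local name.
    return RDF_PREFIXES[prefix] + local_name


print(expand_prefixed_name("cito:citesForInformation"))
# → http://purl.org/spar/cito/citesForInformation
```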

So, for example, where we currently see in the JSON:

"citationTypeURI": "https://pleiades.jazkarta.com/vocabularies/feature-type/seeFurther",

we want to see instead

"citationTypeURI": "http://purl.org/spar/cito/citesForInformation",

The logic for this is complicated in part because our works/references code began life without any awareness of the "cito" vocabulary, so a couple of the terms in the PleiadesEntity type definitions have to be mapped to the corresponding "cito" terms when they are serialized as part of these URIs. In the example above, "seeFurther" is one of these: it maps to "cito:citesForInformation". The code that does this mapping for our RDF serialization can be seen in the pleiades-ref repository; specifically, note the definitions beginning at line 25 in common.py. The serialization logic (including the remapping) seems to be done in the "references" method, beginning at line 298.
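A rough sketch of the remap-then-expand step on the JSON side. This is not the pleiades-ref code: the hypothetical `TYPE_REMAP` table holds only the one mapping confirmed in this thread (`seeFurther` to `cito:citesForInformation`), and the sketch assumes that any type without an entry is already a cito term name. The real table in common.py should remain the source of truth.

```python
# Namespace URIs from the RDF prefix definitions (hard-coded).
PREFIXES = {
    "cito": "http://purl.org/spar/cito/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Hypothetical, partial remapping table: PleiadesEntity type -> prefixed RDF term.
# Only "seeFurther" comes from this issue; see common.py for the authoritative one.
TYPE_REMAP = {
    "seeFurther": "cito:citesForInformation",
}


def citation_type_uri(pleiades_type: str) -> str:
    """Build the full citation type URI for a PleiadesEntity reference type."""
    # Assumption: a type without a remap entry is already a cito term name.
    prefixed = TYPE_REMAP.get(pleiades_type, "cito:" + pleiades_type)
    prefix, _, local_name = prefixed.partition(":")
    return PREFIXES[prefix] + local_name


def with_fixed_type_uri(citation: dict) -> dict:
    """Return a copy of a JSON citation entry with citationTypeURI rebuilt from its type."""
    fixed = dict(citation)
    fixed["citationTypeURI"] = citation_type_uri(citation["type"])
    return fixed
```

Applied to the entry quoted earlier, this would replace the `.../vocabularies/feature-type/seeFurther` value with `http://purl.org/spar/cito/citesForInformation`, the target shown above.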

Does that help?

skleinfeldt commented 3 years ago

Yikes...

alecpm commented 3 years ago

@paregorios I see. It should be easy enough to use the XML namespace URL from the RDF as the prefix for the citation type URI. I'll get that in place shortly. Thanks!

alecpm commented 3 years ago

This is in place on pleiades.jazkarta.com:

https://pleiades.jazkarta.com/places/175420887/json

(you may need to do a shift-reload)
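One way to spot-check an export is a small scan for values still in the old namespace. This is a hypothetical helper, and it assumes the citation entries sit in a top-level `references` list of the place JSON; adjust the key if the export nests them differently.

```python
OLD_NAMESPACE = "/vocabularies/feature-type/"


def stale_type_uris(place: dict) -> list:
    """List citationTypeURI values that still point into the old feature-type namespace."""
    # Assumption: citation entries live in a top-level "references" list.
    return [
        ref["citationTypeURI"]
        for ref in place.get("references", [])
        if OLD_NAMESPACE in ref.get("citationTypeURI", "")
    ]
```

The place JSON could be loaded with `json.load(urllib.request.urlopen(url))` before calling it; an empty list means no stale URIs were found.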

paregorios commented 3 years ago

Looks great. Let's merge to master and call this one done!

alecpm commented 3 years ago

This was already merged and the pin was updated, so I'll close it.