thegetty / pipeline

Data pipeline work
Apache License 2.0

What is the correct modeling of `portal.getty.edu` links? #54

Open kasei opened 5 years ago

kasei commented 5 years ago

Right now, auction events carry portal links via `referred_to_by`, but there seems to be some association between the sale catalog (the event's `subject_of`) and the portal WebPage:

https://github.com/thegetty/pipeline/blob/1c933048f083e70d74db9e3de682dffaf6c055b2/pipeline/projects/provenance/__init__.py#L181

    {
      "type": "Activity",
      "_label": "Auction Event for F-A235",
      "referred_to_by": [
        {
          "id": "http://portal.getty.edu/books/inha_18072",
          "type": "LinguisticObject"
        }
      ],
      "subject_of": [
        {
          "id": "urn:uuid:0326dac7-8933-4949-8e37-47a2424efc7d",
          "type": "LinguisticObject",
          "_label": "Sale Catalog F-A235"
        }
      ]
    }

Similarly, the exact modeling of the portal link relationship for HumanMadeObjects needs to be confirmed.

https://github.com/thegetty/pipeline/blob/1c933048f083e70d74db9e3de682dffaf6c055b2/pipeline/projects/provenance/__init__.py#L809

azaroth42 commented 5 years ago

Agree -- we should confirm what the portal pages are thought to be. If they're all just copies of the catalogs, then they should be linked only from the catalogs. If there are also other documents mixed in, then we might need to link to them from other entities as well.
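For illustration only, here is a rough sketch of what "linked only from the catalogs" could look like on the JSON above. The function name `move_portal_links_to_catalogs`, the `PORTAL_PREFIX` constant, and the `link_property` parameter are hypothetical and not part of the pipeline. Which property the catalog should actually use to point at the portal page is exactly what still needs to be confirmed, so it is left as a parameter that defaults to `referred_to_by`.

```python
import copy

# Assumption: portal links can be recognised by this URL prefix.
PORTAL_PREFIX = "http://portal.getty.edu/"


def move_portal_links_to_catalogs(event, link_property="referred_to_by"):
    """Return a copy of `event` with portal links moved onto its sale catalogs.

    `link_property` is a placeholder for whichever property is eventually
    chosen to relate a catalog to its portal page.
    """
    result = copy.deepcopy(event)
    refs = result.get("referred_to_by", [])
    portal = [r for r in refs if r.get("id", "").startswith(PORTAL_PREFIX)]
    others = [r for r in refs if not r.get("id", "").startswith(PORTAL_PREFIX)]
    catalogs = result.get("subject_of", [])

    # Nothing to move, or nowhere to move it: leave the event untouched.
    if not portal or not catalogs:
        return result

    # Attach a copy of every portal link to every catalog of the event.
    for catalog in catalogs:
        catalog.setdefault(link_property, []).extend(copy.deepcopy(portal))

    # Keep any non-portal references on the event itself.
    if others:
        result["referred_to_by"] = others
    else:
        result.pop("referred_to_by", None)
    return result
```

If the portal pages turn out to include documents other than catalog copies, this kind of wholesale move would not be appropriate, and the links would need to stay on (or be added to) the other entities.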

kasei commented 4 years ago

Portal links are now modeled as web pages:

    {
      "id": "http://portal.getty.edu/books/inha_17691",
      "type": "DigitalObject",
      "_label": "http://portal.getty.edu/books/inha_17691",
      "classified_as": [
        {
          "id": "http://vocab.getty.edu/aat/300264578",
          "type": "Type",
          "_label": "Web Page"
        }
      ]
    }

However, there's some uncertainty about how Arches can support the actual URL link, so the URL is included as both the `id` and the `_label`.
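As a minimal sketch of that shape, assuming plain dictionaries rather than the pipeline's own model classes, a record like this could be built as follows. The helper name `portal_web_page` is hypothetical; only the field values come from the JSON above.

```python
import json

# AAT term used above to classify the record as a web page.
AAT_WEB_PAGE = "http://vocab.getty.edu/aat/300264578"


def portal_web_page(url):
    """Build a web page record whose URL is both its id and its label."""
    return {
        "id": url,
        "type": "DigitalObject",
        # The URL is duplicated in _label as a fallback in case the id
        # cannot be surfaced as a link downstream.
        "_label": url,
        "classified_as": [
            {"id": AAT_WEB_PAGE, "type": "Type", "_label": "Web Page"}
        ],
    }


if __name__ == "__main__":
    page = portal_web_page("http://portal.getty.edu/books/inha_17691")
    print(json.dumps(page, indent=2))
```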