sul-dlss / cocina-models

Cocina repository data model (implemented in Ruby)
https://sul-dlss.github.io/cocina-models/

Validate that "Archived website" is valid data #562

Open jcoyne opened 1 year ago

jcoyne commented 1 year ago

We're seeing invalid data reach the access side, which forces us to validate the data there and suppress its display. Ideally, this bad data would never be sent to us, and the content creator would be notified of the problem during accessioning.

https://github.com/sul-dlss/sul-embed/issues/1460

The public XML for https://purl.stanford.edu/qw622dx4390.xml had this:

    <location>
      <url displayLabel="Archived website">https://swap.stanford.edu/was/https://wexarts.org/talks-more/tongues-untied</url>
    </location>

But the viewer requires this URL to contain /*/.

It should look like:

    <location>
      <url displayLabel="Archived website">https://swap.stanford.edu/was/*/https://wexarts.org/talks-more/tongues-untied</url>
    </location>

Please add a validation for this.
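
As a rough illustration of the check being requested, here is a minimal sketch in Ruby. The method name `archived_website_url?` is hypothetical and is not part of cocina-models or sul-embed; it only tests for the `/*/` segment described above, using the two URLs from this issue.

```ruby
# Hypothetical helper for illustration only; not an existing cocina-models method.
# Returns true when the URL contains the "/*/" segment that the viewer requires.
def archived_website_url?(url)
  url.is_a?(String) && url.include?('/*/')
end

bad  = 'https://swap.stanford.edu/was/https://wexarts.org/talks-more/tongues-untied'
good = 'https://swap.stanford.edu/was/*/https://wexarts.org/talks-more/tongues-untied'

puts archived_website_url?(bad)   # false
puts archived_website_url?(good)  # true
```

A stricter rule (for example, requiring `/*/` directly after the `https://swap.stanford.edu/was/` prefix) may be what the viewer actually needs; that would have to be confirmed against the viewer code.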

In cocina this looks like:

        "access": {
            "url": [{
                "value": "https://swap.stanford.edu/was/*/https://wexarts.org/talks-more/tongues-untied",
                "displayLabel": "Archived website"
            }],

andrewjbtw commented 1 year ago

I don't think we should use description to store this value. Using this field is a carryover from the Fedora era: https://github.com/sul-dlss/was_robot_suite/issues/279

If we validate this, then the validation should apply only to objects with the content type "webarchive-seed". We shouldn't be constraining description for other content types, as this issue is specific to how the webarchive seed display works.

We should validate whatever field we use to store the link that the timemap depends on.
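
To make the scoping concrete, here is a minimal sketch of a check that applies only to webarchive-seed objects. Everything in it is an assumption for illustration: the method name `archived_website_errors` is hypothetical, the type URI is assumed, and the code reads the URL from `description.access.url` only because that is where it lives in the example above. If the link moves to another field, the lookup would change accordingly.

```ruby
# Hypothetical sketch; the names and hash layout are assumptions,
# not the cocina-models API.
# Returns a list of error messages; an empty list means nothing to report.
def archived_website_errors(cocina_hash)
  seed_type = 'https://cocina.sul.stanford.edu/models/webarchive-seed' # assumed type URI
  # Leave every other content type unconstrained.
  return [] unless cocina_hash['type'] == seed_type

  urls = cocina_hash.dig('description', 'access', 'url') || []
  urls.select { |u| u['displayLabel'] == 'Archived website' }
      .reject { |u| u['value'].to_s.include?('/*/') }
      .map { |u| "Archived website URL is missing /*/: #{u['value']}" }
end
```

Called with a parsed object hash, this returns one message per offending URL, which an accessioning step could surface to the content creator instead of passing the data on to the access side.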