linked-art / linked.art

Development of a specification for linked data in museums, using existing ontologies and frameworks to build usable, understandable APIs
https://linked.art/

Booleans in Linked.Art? #518

Closed bluebinary closed 4 months ago

bluebinary commented 1 year ago

Hello all,

I was wondering whether anyone has suggestions for a semantically sound pattern for modeling booleans in Linked.Art?

While the underlying JSON of course allows us to represent booleans, it doesn't seem that there is a defined Linked.Art pattern to do this.

We have dealt with a few boolean-style use cases, and another just came up for modeling booleans for a "blocked content" flag for publishing podcasts to Apple Podcasts, where the overall podcast record in our system is modeled as an InformationObject. We use these InformationObject records within our own ecosystem for example to populate parts of the website, and then to generate a podcast RSS feed from the podcast InformationObject records for Apple Podcasts and other podcast aggregators. These IO records will be publicly available, so although we could come up with whatever convention seemed appropriate within the institution to model a boolean for this and similar use cases, ideally we would be able to use a standard pattern to do this, rather than inventing a bespoke one.

When we have dealt with this need in the past, we have modeled the boolean as two complementary local thesaurus terms, one to indicate a true/yes state and another to indicate the false/no case. We generally want to include both the true and false states as an indication that the value has been affirmatively determined by the workflow to be true or false. We could base downstream logic on the presence (or absence) of something in the JSON-LD, but for certain use cases we felt it was important to be able to affirmatively state the value either way (both the true and the false). Otherwise downstream code effectively has to assume that if x is present (or absent) from the document, it should do y, and x could theoretically be present (or absent) for reasons unrelated to its actual upstream value, such as a data entry or data processing issue.

Using local thesaurus terms to model booleans could look something like the following, using a pair of complementary local thesaurus terms such as https://data.getty.edu/local/thesaurus/content-is-blocked and its opposite https://data.getty.edu/local/thesaurus/content-not-blocked perhaps like so:

{
  "id": "https://data.getty.edu/media/podcast/123.json",
  "type": "InformationObject",
  "_label": "Podcast 123",
  "attributed_by": [
    {
      "type": "AttributeAssignment",
      "_label": "Flag to instruct podcast publishers to block content",
      "classified_as": [
        {
          "type": "Type",
          "id": "https://data.getty.edu/local/thesaurus/content-blocked-flag",
          "_label": "Content Blocked Flag"
        }
      ],
      "assigned": {
        "type": "Type",
        "id": "https://data.getty.edu/local/thesaurus/content-is-blocked"
      }
    }
  ]
}
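
A consumer could read such a pair of terms back as a three-state value: true, false, or not determined. The following is a minimal sketch of that idea in Python, not an established Linked.Art pattern. The function name `read_flag`, its parameters, and the choice to look under both `attributed_by` and `assigned_by` are assumptions made for illustration; the term URIs are the ones proposed above.

```python
from typing import Optional

# Local thesaurus terms from the proposal above.
FLAG_TYPE = "https://data.getty.edu/local/thesaurus/content-blocked-flag"
TRUE_TERM = "https://data.getty.edu/local/thesaurus/content-is-blocked"
FALSE_TERM = "https://data.getty.edu/local/thesaurus/content-not-blocked"


def _as_list(value):
    """Normalise a JSON value that may be missing, a single object, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def read_flag(record, flag_type=FLAG_TYPE, true_term=TRUE_TERM,
              false_term=FALSE_TERM,
              container_keys=("attributed_by", "assigned_by")) -> Optional[bool]:
    """Return True or False if the flag was affirmatively stated, else None.

    container_keys is an assumption: the sketch checks both property names
    so that it tolerates either spelling in the source data.
    """
    for key in container_keys:
        for assignment in _as_list(record.get(key)):
            types = {t.get("id") for t in _as_list(assignment.get("classified_as"))}
            if flag_type not in types:
                continue  # an assignment about something else
            for assigned in _as_list(assignment.get("assigned")):
                if assigned.get("id") == true_term:
                    return True
                if assigned.get("id") == false_term:
                    return False
    return None  # nothing stated either way
```

Returning `None` separately from `False` keeps the "affirmatively determined" distinction described above: a missing flag is not silently treated as false.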

In doing some more research about which properties in the Linked.Art model may accept booleans, I noticed that Dimension currently "allows" booleans to be assigned to its value property.

However, I'm not certain if this is supposed to be allowed or not, as it seems that booleans fall outside of what CRM considers to be valid instances of E60 Number, which as far as I understand are supposed to "comprise any encoding of computable (algebraic) values such as integers, real numbers, complex numbers, vectors, tensors etc", and I don't think a boolean is considered an algebraic value in the true sense although they are usually computable?

If booleans are not valid here, I believe they are currently accepted because in Python bool is a subclass of int. Type checks such as isinstance(value, (int, float)) therefore accept int, float, and bool, as well as any other subclasses thereof. To reject bool values specifically, we may need something like isinstance(value, (int, float)) and not isinstance(value, bool) instead?
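
To make the subclass behaviour concrete, here is a small standalone sketch. The helper name `is_plain_number` is hypothetical and is not taken from cromulent; it only illustrates the stricter check suggested above.

```python
def is_plain_number(value) -> bool:
    """Return True for int or float values, but not for bool.

    bool is a subclass of int in Python, so it has to be excluded explicitly.
    """
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# The naive check lets booleans through because bool subclasses int.
naive_accepts_bool = isinstance(True, (int, float))

# The stricter check rejects them while still accepting real numbers.
strict_accepts_bool = is_plain_number(True)
strict_accepts_int = is_plain_number(1)
strict_accepts_float = is_plain_number(1.5)
```

Here `naive_accepts_bool` is `True`, while `strict_accepts_bool` is `False`.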

Currently, the following is possible, but perhaps should not be?

import json  # needed for json.dumps at the end

import cromulent

from cromulent.model import (
  InformationObject,
  AttributeAssignment,
  Type,
  Dimension,
  factory,
)

info = InformationObject(
  ident = "https://data.getty.edu/media/podcast/123.json",
  label = "Podcast 123",
)

info.attributed_by = aa = AttributeAssignment(
  ident = "",
)

aa.classified_as = Type(
  ident = "https://data.getty.edu/local/thesaurus/content-blocked-flag",
  label = "Content Blocked Flag",
)

aa.assigned = dimension = Dimension(
  ident = "",
  label = "Blocked Content?",
)

dimension.value = True

print(json.dumps(factory.toJSON(info), indent = 4))

It generates the following JSON-LD:

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "https://data.getty.edu/media/podcast/123.json",
  "type": "InformationObject",
  "_label": "Podcast 123",
  "attributed_by": [
    {
      "type": "AttributeAssignment",
      "classified_as": [
        {
          "id": "https://data.getty.edu/local/thesaurus/content-blocked-flag",
          "type": "Type",
          "_label": "Content Blocked Flag"
        }
      ],
      "assigned": [
        {
          "type": "Dimension",
          "_label": "Blocked Content?",
          "value": true
        }
      ]
    }
  ]
}

This isn't a question or concern with cromulent, however; it was simply used as a convenient way to model and generate the JSON-LD and to see which properties, if any, in Linked.Art would accept bool values.

If booleans are acceptable E60 Number values, and the above JSON-LD excerpt or something similar is semantically-sound, I believe it would make things simpler for producers and consumers, rather than having to rely on custom uses of other modeling patterns such as classifications and the presence or absence of such to determine boolean states.

Thank you in advance for any advice or suggestions you may have to offer.

azaroth42 commented 1 year ago

What about just putting the flags into classified_as directly, rather than through the attribute assignment?

bluebinary commented 1 year ago

Hi Rob, apparently there was a suggestion to embed the classified_as beneath an AttributeAssignment as this property is a custom value that seems to be more to do with the publishing (or rescinding) of the associated podcast – effectively a temporal workflow related property – rather than it being a property inherent about the podcast itself. I can ask for and provide more background on that discussion if it would be helpful, as I wasn't privy to the original conversation.

We can certainly map the classified_as directly into the top-level of the InformationObject, and that would be an easy change. However, we were especially curious to know if there were any preferred patterns for mapping the boolean value itself into Linked.Art that may have been overlooked, and if not, just to confirm that mapping a classification, or one of a pair of complementary classifications (to affirmatively map both the true and false states), as a stand-in for a boolean value is the preferred way to do it?

Lastly, are we meant to be able to assign a bool type value into a Dimension entity's value property?

Thank you again!

cbutcosk commented 1 year ago

@bluebinary FWIW I read the scope note on https://www.cidoc-crm.org/entity/e60-number/version-7.1.1 to mean more of a distinction between number types and time / space coordinates? Practically anyway the CRM's ontology in RDF has the range of crm:P90_has_value as rdfs:Literal, so "true"^^xsd:boolean should be fine technically. And _: rdf:type crm:E60_Number ; crm:P90_has_value "true"^^xsd:boolean is understandable / reasonable to me as someone reading l.a. data.

azaroth42 commented 1 year ago

Dimension values can't be a boolean in linked art, because then it would have two different data types, breaking systems that rely on it being only one. They have to be floating point values (which could be expressed as whole integers, but the interpretation would be 1.0).

There isn't a way to natively map a boolean into linked art, as there's no property in the ontology that takes a boolean value.

cbutcosk commented 1 year ago

@azaroth42 Welp the l.a. model docs have ints, the context has no type constraints, and as I said crm:P90_has_value has a range of rdfs:Literal in the cidoc-crm ontology. 😉

azaroth42 commented 1 year ago

Yeah, should make the docs more consistent! Agree about rdfs:Literal in the ontology, however we specialize the use of value to only numbers (and should be only floats). Otherwise we end up with:

{
  "type": "Dimension",
  "value": "1.0"
}

and the ambiguity of whether it was supposed to be a string or a number.

The mapping to literal in the ontology is to allow for the (terrible) pattern:

{
  "type": "Dimension",
  "value": "1.0 - 1.2"
}

which is legal in the abstract, but worthless as data. The worthlessness prompted the creation of the upper/lower value bounds predicates to fix the situation, but for backwards compatibility the ontology wasn't changed.
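
As a rough illustration of why separate bounds are easier to work with than a string range, the sketch below splits a legacy "1.0 - 1.2" value into two floats. The function name `range_to_bounds` is hypothetical, and the key names `lower_value_limit` and `upper_value_limit` are an assumption about how the bounds predicates mentioned above are spelled in JSON; check them against the Linked.Art documentation before relying on them.

```python
import re

# Matches two decimal numbers separated by a hyphen, e.g. "1.0 - 1.2".
_RANGE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*-\s*(-?\d+(?:\.\d+)?)\s*$")


def range_to_bounds(text, lower_key="lower_value_limit",
                    upper_key="upper_value_limit"):
    """Turn a string range into a Dimension dict with numeric bounds.

    The default key names are assumptions, passed as parameters so they
    can be adjusted to whatever the target profile actually uses.
    """
    match = _RANGE.match(text)
    if match is None:
        raise ValueError(f"not a simple numeric range: {text!r}")
    lower, upper = float(match.group(1)), float(match.group(2))
    if lower > upper:
        raise ValueError(f"lower bound exceeds upper bound: {text!r}")
    return {"type": "Dimension", lower_key: lower, upper_key: upper}


bounds = range_to_bounds("1.0 - 1.2")
```

With the bounds stored as floats, a consumer can compare or sort them directly instead of parsing free text.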

azaroth42 commented 1 year ago

Discussion on Linked Art call:

Can people suggest good use cases please so examples can resonate with reality?

cbutcosk commented 10 months ago

The use case I've seen is the reverse of @bluebinary 's but equally implementable with a classification + meta-type. Some objects have a "Web Access" flag meaning they are OK to publish as pages in the collection on the website, while those without it are not OK to publish, with a similar flag on Images. I've wished for a similar feature flag when sourcing multiple datasets from one document graph, e.g. ex-thesaurus:cool-catalogue-object, or feeding a new application feature / deployment setup (ex-thesaurus:has-datapoint-for-new-feature).

In those cases I wouldn't drop down to an AttributeAssignment (want it to be as visible as possible in that case, would suck to publish something that shouldn't be) but tomato-tomatoh.

Looking through @bluebinary 's example, does that look like (using prefixes for brevity):

{
  "id": "https://example.org/object/1",
  "type": "HumanMadeObject",
  "classified_as": [{
    "id": "ex-thesaurus:web-access-enabled",
    "type": "Type",
    "_label": "true",
    "classified_as": [{
       "id": "ex-thesaurus:truthy-flags",
       "type": "Type"
    }]
  }]
}
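
A consumer of the classification + meta-type pattern above might collect flags as in the following sketch. The function name `flags_of_kind` is hypothetical, and the prefixed identifiers are the illustrative ones from the example, not real thesaurus terms.

```python
def _as_list(value):
    """Normalise a JSON value that may be missing, a single object, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def flags_of_kind(record, meta_type):
    """Return ids of top-level classifications that carry the given meta-type."""
    found = []
    for term in _as_list(record.get("classified_as")):
        meta_ids = {m.get("id") for m in _as_list(term.get("classified_as"))}
        if meta_type in meta_ids:
            found.append(term.get("id"))
    return found


# The object from the example above, trimmed to the relevant keys.
example = {
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",
    "classified_as": [{
        "id": "ex-thesaurus:web-access-enabled",
        "type": "Type",
        "classified_as": [{"id": "ex-thesaurus:truthy-flags", "type": "Type"}],
    }],
}

flags = flags_of_kind(example, "ex-thesaurus:truthy-flags")
```

Filtering on the meta-type keeps ordinary classifications, such as object types, from being mistaken for flags.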

azaroth42 commented 10 months ago

Are there any non-API level use cases? Following the principle that we discussed on the last call, we would solve access sorts of flags at the API level with a _flag property ... where we can do whatever we want as it's not part of the model.

cbutcosk commented 9 months ago

FWIW the use case I described would've needed the flag represented in RDF to achieve the effect because all data was being sourced to RDF. But a feature flag or access control system would've processed the graph data (expressed as classification flags) so a documented (optional) graph pattern to persist the data in RDF would've been useful.

azaroth42 commented 6 months ago

The backend management of the data isn't the responsibility of the model though. Unless there's a good flag that's needed for the real world entities being described, I propose close, in that the only thing to do was propose a good example for the documentation... if there isn't a concrete example, a good sign it's unnecessary :)

azaroth42 commented 4 months ago

Agree close without prejudice -- can use classified_as for the semantic modeling, and _fields in the API for extensions.

If there is a use case in the semantic layer, then we can rediscuss :)