linked-art / linked.art

Development of a specification for linked data in museums, using existing ontologies and frameworks to build usable, understandable APIs
https://linked.art/

Visible (sensitive) biases in recommended vocabularies #659

Open aisaac opened 2 months ago

aisaac commented 2 months ago

While catching up on this summer's discussion, I looked at https://deploy-preview-638--linked-art.netlify.app/model/vocab/recommended/#languages and realized that we have visible bias issues there.

I mean, of course as any model, LA is going to have biases (for example on object types). But the lists for nationalities, currencies and even measurement units have more sensitive biases than others:

I don't think it is good from a (wider) community perspective to have a spec that says "you should use these terms" under these conditions, even if we acknowledge that there are "specific reasons" not to do so and we have technically valid interoperability motivations.

A minimal move to alleviate this issue would be to downplay these lists by making them only suggestions (i.e. marking them 'optional'). But even then, I think we'd look better if we recommended bigger and more neutral lists, devoid of semantic considerations (and if it has to be codes without URIs, so be it). Encouraging/supporting technical interoperability only for a selected list of the happy few doesn't feel very good. Unless we can put forward more objective reasons for the selection?

[1] As a reminder, here are the countries officially using 'inch' and 'feet': https://en.wikipedia.org/wiki/International_System_of_Units#/media/File:Metric_and_imperial_systems_(2019).svg If we have an exception for recommending values for them, why not other local measurement units used in the world, or other values for the lists where a sensitive bias is present?

azaroth42 commented 2 months ago

I agree that the lists are biased, but the bias is from the process and the organizations that are engaged with that process. If any organization would like to submit entries for the lists, then that would be wonderful :) It's trying to promote consistency to ease the burden of interoperability, not to change how organizations function or to try to get people to use terms that they traditionally would not.

So yes, they select a few values, and those values come from the western organizations that have participated to date. It would be great to have broader participation! I'm wary of picking terms to recommend (as opposed to list) for vocabulary entries that no one could really vouch for as either being in use or being appropriate to use.

Semantic mismatches - yes, but I believe that the alternative is worse -- minting a whole set of new terms for nationalities, when these terms are close enough. On the topic of nationalities: I'm American (by citizenship) but not from America... it's an easier chestnut to crack than the object "genre" one which is even less well defined (is a dreamcatcher made in a factory in China and sold in New Mexico cataloged as Native American or Chinese?)

And finally they're not required, they're recommended. The only required terms are the ones which would mean that the core representation would be unparseable without the consistency.

I could see moving currencies all to optional. It's only at the provenance edges of the model where we actually need them.

Languages I think have to stay recommended, otherwise the embedded content in those languages is indistinguishable. Or we could mandate the use of the notation field for the ISO 639 code (which, as discussed in #570, has its own problems).

Nationalities ... we could certainly list more, happy for PRs to add them! We could also move nationalities to a separate page and link it from the recommended page if the list gets too long. I could also see moving them to optional, but in terms of search functionality, search by artist nationality was high on the list of requirements for LUX at least so any consistency will make interoperability much easier.

beaudet commented 2 months ago

Only tangentially related, we'll be going through our list of reported nationalities shortly and mapping them to the AAT with equivalent terms. Would it help inform this ticket / improve the process to create a consolidated list of nationalities stored in various org collection management systems along with the proposed mapping via equivalents to AAT? There aren't all that many of them since nationalities roughly track in scale to the number of countries in the world.

azaroth42 commented 2 months ago

Call on 2024-09-04:

Languages: Daniel has all ISO languages mapped to AAT. Can include. Nationalities: Can do the mapping to improve the list.

Break out into separate pages.

Make it clear on every page how to submit new terms. Make the submission process as easy as possible.

Double check the entries (re ancient egyptian vs egyptian)

aisaac commented 2 months ago

Thanks for having discussed the issue. I'm glad to hear about updates coming from @beaudet and @bluebinary . Do you need further help for the double checking?

Actually, as an additional measure, maybe we can document the updates and the checking mentioned in this ticket, as evidence that we are aware of the problems, have tried to tackle some, and stand ready to work on them with possible contributors, also in the future?

We could have a statement that would say: "We are aware that because of the over-representation of CHIs from specific backgrounds in the community, our model and vocabularies may exhibit biases. This was observed for instance in earlier versions of the recommended lists of nationalities and languages [example]. We have worked on tackling these issues, but are conscious that this effort is never fully finished. This is why the Linked Art community is committed to continuing to update its recommendations, together with any interested contributors". (Well, with more beautiful wording of course; I'm writing this quite hastily, but if people like the idea I can spend more time on it, after updates and checks have been made!)

beaudet commented 2 months ago

I've issued a small PR against the recommended vocabulary JSON in the Linked Art Cromulent repo for additional nationality mappings and will continue doing that as more are added over time. Would reviewing PRs be a good bi-weekly agenda item?

atiro commented 3 weeks ago

I'm not sure if it's quite what's wanted here, and this is mostly a form of V&A project placement on my part, but could the Chinese Iconography Thesaurus be of use for some terms? E.g. https://chineseiconography.org/terms/CIT288653 for scripts. (Data here but not stable/complete: https://github.com/iconclass/cit)

azaroth42 commented 3 weeks ago

The recommended vocabulary list is here in the github: https://github.com/linked-art/linked.art/blob/master/docs/model/vocab/recommended/index.md

But feel free to drop table formatted data into the issue here and I'll copy them in :)

beaudet commented 3 weeks ago

For now, assume the docs are the primary source for these and cromulent json files will be updated from the docs.

beaudet commented 3 weeks ago

Add some doc clarification around what "required", "recommended", etc. mean in the context of these terms, e.g. that IF the data is available, then you MUST use these specific terms to describe that type of data rather than requiring data you don't have.
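To make the "IF the data is available, then you MUST use these terms" reading concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the flat record shape with a `nationality` key, the function name `check_nationality`, and the one-entry term set are all hypothetical and are not the Linked Art data model or the real recommended list. The single URI is the AAT "English" identifier quoted elsewhere in this thread.

```python
# Hypothetical, one-entry stand-in for the recommended nationality list.
# The URI is AAT 300111178 ("English"), as quoted in this thread.
RECOMMENDED_NATIONALITIES = {
    "http://vocab.getty.edu/aat/300111178",
}


def check_nationality(record):
    """Classify a simplified (hypothetical) record against the list.

    Missing data is never an error: the rule only applies when a
    nationality is actually recorded.
    """
    value = record.get("nationality")
    if value is None:
        return "ok-absent"      # no data available, nothing is required
    if value in RECOMMENDED_NATIONALITIES:
        return "ok-listed"      # data available and uses a listed term
    return "not-listed"         # data available but uses another term


if __name__ == "__main__":
    print(check_nationality({"name": "A"}))
    print(check_nationality({"name": "B", "nationality": "http://vocab.getty.edu/aat/300111178"}))
    print(check_nationality({"name": "C", "nationality": "http://example.org/other"}))
```

The point of the three outcomes is that "ok-absent" and "not-listed" are different situations: the first is always acceptable, while the second is what the documentation would need to define for "required" versus "recommended" terms.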

aisaac commented 1 week ago

I've tried to add some acknowledgement of vocabulary biases for optional and recommended vocabularies and invite community participation, see https://github.com/linked-art/linked.art/pull/682

aisaac commented 1 week ago

Regarding nationalities, @beaudet where is the PR you have mentioned earlier? Or is it already merged?

On my side I've investigated a bit and I'm still struggling with the examples I found. Namely, for Egyptian, we could replace "Egyptian (ancient)" (http://vocab.getty.edu/aat/300020251) by "Egyptian (modern)" (http://vocab.getty.edu/aat/300265077). But that doesn't feel satisfactory, as I guess one could be reluctant to use something that's explicitly related to a "modern culture" (and "African", if one looks at the parent concept in AAT) for a person from ancient times (and "Mediterranean" according to AAT's hierarchy). So maybe we could have both?

I've tried to search for inspiration where these AAT terms could be used already. And what I found doesn't smell good. Here's Ramses III (http://vocab.getty.edu/page/ulan/500372573). On the full ULAN display he's classified as both "Egyptian (ancient)" and "Egyptian". "Egyptian" is the preferred form, and I thought that this "Egyptian" could be interesting. It turns out that I can't find it in the RDF/XML data. That one has "Egyptian (Modern)" next to "Egyptian (ancient)". I don't know where the simple "Egyptian" came from, as it's not a label of http://vocab.getty.edu/aat/300265077. To make things worse, the JSON-LD has only "Egyptian (modern)" (https://vocab.getty.edu/ulan/500372573.jsonld)!

I was thinking, maybe we could just recommend using any of the specializations of the AAT guide term "<styles, periods, and cultures by region>" (see hierarchy). This would avoid this kind of issue. Still, it may pose a risk for interoperability, if even the best experts around may do strange things...

Then the final blow came from the note in the documentation about ethnicity. It reads: "Ethnicity is separate from nationality, as it refers to a social group or culture as opposed to a political nation or state." Given this, can we really use all these AAT styles/cultures as nationalities? This makes the semantic mismatch issue even worse: we're now stating a contradiction.

So at this stage I'm thinking much like Kelly, who said "Nationality is about as fraught as gender", and I'm inclined not to keep trying to un-bias it. It feels like we would spend time sticking plaster on a wooden leg... Enhancement should perhaps start instead by being blunt and no longer using AAT styles/cultures for nationality.

Or should the documentation for ethnicity be changed? I'm a bit lost now. In fact I was thinking, if we want to recommend AAT, maybe we could swap the recommendations for nationality and ethnicity. Ethnicity examples refer to Wikidata. We could instead use Wikidata for countries and AAT cultures/styles for ethnicity. I believe Wikidata has a good representation of countries, and I would rather trust AAT for cultures, in our domain.

aisaac commented 1 week ago

And for languages and currencies, I was wondering, why not keep only a part of the table as examples, and say that all the children of selected AAT concepts are recommended?

Currencies may be a bit tricky, but there is a decent list under AAT's http://vocab.getty.edu/aat/300411993 (see hierarchy). We could also add a reference to "money (objects)" (http://vocab.getty.edu/aat/300037316) if we need to refer to historical or less established forms of currency (e.g. coins).

Languages would be quite straightforward, as can be seen in AAT's hierarchy below "languages and writing systems".
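A rough sketch of what "all the children of selected AAT concepts are recommended" could mean for a consumer is given below in Python. It assumes a local child-to-parent map has already been obtained from AAT by some means; the map shown here is a toy, the `example:` identifiers are invented placeholders rather than real AAT concepts, and `is_descendant` is a hypothetical helper name. Only the root URI (300411993) comes from this thread. Real AAT concepts can have more than one parent, so the single-parent map is a deliberate simplification.

```python
# Root concept named in this thread for currencies.
CURRENCIES_ROOT = "http://vocab.getty.edu/aat/300411993"

# Toy child -> parent map. The "example:" identifiers are placeholders,
# not real AAT concepts, and each term has only one parent here.
BROADER = {
    "example:currency-a": CURRENCIES_ROOT,
    "example:currency-b": "example:currency-a",
    "example:unrelated": "example:other-root",
}


def is_descendant(term, roots, broader):
    """Return True if `term` sits strictly below one of `roots`."""
    seen = {term}                      # guards against cycles in bad data
    current = broader.get(term)
    while current is not None and current not in seen:
        if current in roots:
            return True
        seen.add(current)
        current = broader.get(current)
    return False


if __name__ == "__main__":
    roots = {CURRENCIES_ROOT}
    for term in ("example:currency-b", "example:unrelated", CURRENCIES_ROOT):
        print(term, is_descendant(term, roots, BROADER))
```

With this approach the documentation would only need to name the root concepts and show a few example rows, while the full set of recommended terms follows from the hierarchy.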

aisaac commented 1 week ago

And for the record, I believe the measurement units are in much better shape now :-)

aisaac commented 1 week ago

@bluebinary looking again at past minutes I see you said you could contribute nationalities from ISO. Would this enable some improvement regarding the issues described above?

azaroth42 commented 1 day ago

Leaving this open for assistance (@bluebinary and others) :)

beaudet commented 1 day ago

Here's the PR that hasn't been merged yet: https://github.com/linked-art/crom/pull/55/commits

But since the docs are the primary source now, I'll list the contents of the small PR here.

"netherlandish": {"parent": "Nationality", "id": "300020929", "label": "Netherlandish", "req": "recommend"},
"liechtenstein nationality": {"parent": "Nationality", "id": "300386388", "label": "Liechtenstein", "req": "recommend"},
"english nationality": {"parent": "Nationality", "id": "300111178", "label": "English", "req": "recommend"},
"bohemian nationality": {"parent": "Nationality", "id": "300266148", "label": "Bohemian", "req": "recommend"},
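For anyone copying these entries into the docs table, the following Python sketch shows one way to turn them into label/URI pairs. It assumes the `id` values are AAT identifiers that resolve under `http://vocab.getty.edu/aat/` (the URI pattern used earlier in this thread). Wrapping the fragment in braces so it parses as a single JSON object, and the helper name `aat_uri`, are assumptions for illustration and not part of crom.

```python
import json

# The four entries from the PR, wrapped in braces (an assumption) so the
# fragment parses as one JSON object.
ENTRIES_JSON = """
{
  "netherlandish": {"parent": "Nationality", "id": "300020929", "label": "Netherlandish", "req": "recommend"},
  "liechtenstein nationality": {"parent": "Nationality", "id": "300386388", "label": "Liechtenstein", "req": "recommend"},
  "english nationality": {"parent": "Nationality", "id": "300111178", "label": "English", "req": "recommend"},
  "bohemian nationality": {"parent": "Nationality", "id": "300266148", "label": "Bohemian", "req": "recommend"}
}
"""

# URI pattern as used for AAT terms elsewhere in this thread.
AAT_BASE = "http://vocab.getty.edu/aat/"


def aat_uri(entry):
    """Hypothetical helper: build the AAT URI for one vocabulary entry."""
    return AAT_BASE + entry["id"]


entries = json.loads(ENTRIES_JSON)

# Keep only recommended nationality entries, keyed by display label.
nationality_uris = {
    entry["label"]: aat_uri(entry)
    for entry in entries.values()
    if entry["parent"] == "Nationality" and entry["req"] == "recommend"
}

if __name__ == "__main__":
    for label, uri in sorted(nationality_uris.items()):
        print(f"{label}: {uri}")
```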