Spring 2020 Upgrade: Taxonomies and enums

robredpath commented 4 years ago

This issue is a central place to coordinate discussion of why the Open Referral UK team wants to change the way we represent various types of information in Open Referral, and to link to related issues. Depending on discussion on calls, it may (or may not!) lead to a concrete proposal.

Background

Several users of Open Referral have found that the existing enums in the schema are unsuitable, and there has been some misunderstanding about how prescriptive they were intended to be. For example, in #207 @greggish says that they are intended as examples, but technically speaking, an enum is a closed list of acceptable values: it can't be added to while still producing valid data.
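To make the "closed list" point concrete, here is a minimal Python sketch that mimics how a JSON Schema `enum` behaves. The status values and function names are hypothetical, chosen for illustration rather than copied from the HSDS schema; the point is only that a value outside the list, however reasonable locally, makes the record invalid.

```python
# Hypothetical closed list, modelled loosely on an HSDS-style status field;
# these values are illustrative, not quoted from the schema.
STATUS_ENUM = ("active", "inactive", "defunct")


def is_valid_status(value: str) -> bool:
    """Mimic JSON Schema `enum`: only exact, case-sensitive members are valid."""
    return value in STATUS_ENUM


def invalid_statuses(records: list[dict]) -> list[tuple[str, str]]:
    """Return (id, status) for every record whose status is outside the enum."""
    return [(r["id"], r["status"]) for r in records if not is_valid_status(r["status"])]


records = [
    {"id": "svc-1", "status": "active"},
    {"id": "svc-2", "status": "seasonal"},  # a local addition: rejected
]
print(invalid_statuses(records))  # [('svc-2', 'seasonal')]
```

Treating the list as "examples" would require an open-ended field instead, which is exactly the ambiguity described above.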

There are also cases where users need a more nuanced representation than a single value allows (for example, if the criteria for accessing a service are that you must be legally male, aged 18-25 and disabled in a certain way), or want to reference a locally specific vocabulary (for example, to align with local government criteria). Issues #157 and #159 relate to this. For services, this is handled by service_taxonomy, but there is no equivalent for other types of information.
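A rough sketch of why a single enum value falls short for the example above, assuming hypothetical vocabulary identifiers (`legal-sex`, `age-band`, `local-council-disability`) that are not part of any published standard. Treating age as a band term is a simplification for brevity; a real model might use numeric minimum and maximum ages instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A term drawn from a named vocabulary (vocabulary ids are hypothetical)."""
    vocabulary: str
    code: str


# The service requires ALL of these; a single enum value could carry only one.
service_criteria = frozenset({
    Term("legal-sex", "male"),
    Term("age-band", "18-25"),
    Term("local-council-disability", "category-a"),
})


def is_eligible(person: frozenset, required: frozenset) -> bool:
    """AND semantics: every required term must apply to the person."""
    return required <= person


applicant = frozenset({
    Term("legal-sex", "male"),
    Term("age-band", "18-25"),
    Term("local-council-disability", "category-a"),
    Term("residency", "in-borough"),  # extra terms don't affect eligibility
})
print(is_eligible(applicant, service_criteria))  # True
```

Because each term names its vocabulary, a locally specific list (such as council criteria) can sit alongside more general ones.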

Possible Solutions

I'd like to understand how people have worked around these constraints, and how we might update the standard to address these concerns in a way that works for everyone.

Whatever we do, I'd like to clean up the language around taxonomies. See #181 for more detail on that.

So far, I've heard two possibilities that we might consider:

1. Simply removing all the enums and allowing the relevant fields to be free-text fields. Local implementations could, if they wished, use a taxonomy outside the data to populate these fields, with whatever delimiter they chose. This keeps the standard simple, but reduces the level of standardisation and makes interoperability harder.
2. Adopting a 'taxonomy-first' approach and making extensive use of link tables, following the example of service_taxonomy. This could use either a single table with a "taxonomy_type" field, or multiple tables. Something along these lines is the UK's preferred approach, and I'm sure they'll share more detail as their thinking evolves.
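As a rough illustration of the trade-off between the two options, the sketch below contrasts a delimited free-text field with a single link table carrying a `taxonomy_type` column. The table layout, the `link_type`/`link_id` columns and the `;` delimiter are all assumptions made for this example, not part of the current standard or of the UK proposal.

```python
# Option 1: one free-text field; the ";" delimiter is a local convention (assumed).
eligibility_text = "legally male; aged 18-25; disabled"


def parse_free_text(value: str, delimiter: str = ";") -> list[str]:
    """Split a delimited free-text field; its meaning depends on local convention."""
    return [part.strip() for part in value.split(delimiter) if part.strip()]


# Option 2: terms live in one table, and a single link table tags any entity.
# Column names (link_type, link_id, taxonomy_type, term_id) are hypothetical.
terms = {
    "t1": "legally male",
    "t2": "aged 18-25",
    "t3": "disabled",
    "t4": "wheelchair accessible",
}

links = [
    {"link_type": "service", "link_id": "svc-1", "taxonomy_type": "eligibility", "term_id": "t1"},
    {"link_type": "service", "link_id": "svc-1", "taxonomy_type": "eligibility", "term_id": "t2"},
    {"link_type": "service", "link_id": "svc-1", "taxonomy_type": "eligibility", "term_id": "t3"},
    {"link_type": "location", "link_id": "loc-1", "taxonomy_type": "accessibility", "term_id": "t4"},
]


def terms_for(link_type: str, link_id: str, taxonomy_type: str) -> list[str]:
    """Look up the terms of one kind attached to one entity, in link-table order."""
    return [
        terms[row["term_id"]]
        for row in links
        if row["link_type"] == link_type
        and row["link_id"] == link_id
        and row["taxonomy_type"] == taxonomy_type
    ]


print(parse_free_text(eligibility_text))
print(terms_for("service", "svc-1", "eligibility"))
```

With option 1, a consumer has to know each publisher's delimiter and wording; with option 2, the same query works for any entity and any kind of term. The multiple-table variant would replace the `taxonomy_type` column with one link table per kind of information.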

Thoughts on the wider question here are very welcome, as are comments on the individual issues!

robredpath commented 4 years ago

One quick update from the call today - #181 suggests including a new table for vocabularies, and that discussion is absolutely in scope here.

NeilMcKLogic commented 4 years ago

@robredpath can you articulate the difference between what you are calling a "taxonomy" and a "vocabulary"? I think the original standards team intended "taxonomy" to represent any categorization scheme; in fact, it probably should have been called "categories". The current structure also supports commingling different categorization approaches within the same dataset, as long as the author somehow distinguishes them. For example, you might prefix the "id" of a taxonomy node with something associating it with the particular taxonomy system it belongs to.
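One way the prefixing convention described above might look in practice, as a hedged sketch: the `system:local-id` format, the `:` separator and the example ids are assumptions for illustration, not real 2-1-1 codes or an agreed convention.

```python
from collections import defaultdict


def split_term_id(term_id: str, sep: str = ":") -> tuple[str, str]:
    """Split a prefixed id such as 'local:food-help' into (system, local id)."""
    system, found, local_id = term_id.partition(sep)
    if not found or not system or not local_id:
        raise ValueError(f"taxonomy id {term_id!r} has no system prefix")
    return system, local_id


def group_by_system(term_ids: list[str]) -> dict[str, list[str]]:
    """Group a mixed list of prefixed ids by the system each one belongs to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for term_id in term_ids:
        system, local_id = split_term_id(term_id)
        groups[system].append(local_id)
    return dict(groups)


mixed = ["211:BD-1800", "local:food-help", "local:housing-help"]
print(group_by_system(mixed))
# {'211': ['BD-1800'], 'local': ['food-help', 'housing-help']}
```

The catch is that consumers only benefit if they know the convention; nothing in the data itself declares which prefixes exist or what they mean.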

robredpath commented 4 years ago

@NeilMcKechnie We're considering "taxonomy" and "vocabulary" to be synonymous - defined, broadly, as "lists of terms, that may be hierarchical, that can be used to describe concepts".

Could you give me an example of what you mean by "different categorization approaches within the same dataset, as long as the author somehow distinguishes those"? I can imagine what it might look like, but I'm struggling to see how it would affect the usability of the data.

I don't think that anything we're considering would prevent this approach being used.

NeilMcKLogic commented 4 years ago

@robredpath thanks for clarifying. At the risk of being pedantic, I think "vocabulary" is a different enough concept, at least in its broadly accepted definition, that it should not be treated as equivalent to "taxonomy" in the HSDS/HSDA specifications.

Several entities I've worked with in the past that curate databases of community human and social services use more than one taxonomy to categorize the same set of service records. Good examples are organizations that use the 2-1-1 Taxonomy https://211taxonomy.org/ partly to comply with industry standards for accreditation purposes, but also use a more colloquial custom categorization system that they believe is easier for their phone workers and social workers to understand and navigate. The 2-1-1 Taxonomy is also "cross-walked" to other categorization systems used in adjacent industries, for example the ICD-10 codes used extensively in the US healthcare system.

So in an HSDS export, the service_taxonomy entries would need to seamlessly accommodate terms from potentially several of those categorization systems, even though they may have very different structures and ids.
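A hedged sketch of what that could look like: one taxonomy table holding a hierarchical system (linked through `parent_id`) alongside a flat local one, with service_taxonomy rows pointing at terms from both. The `system` field, ids and names are illustrative placeholders and have not been checked against the published 2-1-1 Taxonomy; the traversal assumes the parent links contain no cycles.

```python
# Two categorization systems with different structures in one taxonomy table.
# Ids, names and the "system" column are hypothetical placeholders.
taxonomy = [
    {"id": "211:B", "system": "211", "name": "Basic Needs", "parent_id": None},
    {"id": "211:BD", "system": "211", "name": "Food", "parent_id": "211:B"},
    {"id": "211:BD-1800", "system": "211", "name": "Emergency Food", "parent_id": "211:BD"},
    {"id": "local:food-help", "system": "local", "name": "Food help", "parent_id": None},
]

# One service can be tagged in both systems at once.
service_taxonomy = [
    {"service_id": "svc-1", "taxonomy_id": "211:BD-1800"},
    {"service_id": "svc-1", "taxonomy_id": "local:food-help"},
    {"service_id": "svc-2", "taxonomy_id": "local:food-help"},
]

parents = {row["id"]: row["parent_id"] for row in taxonomy}


def ancestors(term_id: str) -> list[str]:
    """Walk parent_id links upward (assumes no cycles); flat terms return []."""
    chain = []
    current = parents.get(term_id)
    while current is not None:
        chain.append(current)
        current = parents.get(current)
    return chain


def services_under(term_id: str) -> list[str]:
    """Services tagged with term_id itself or with any of its descendants."""
    return sorted({
        link["service_id"]
        for link in service_taxonomy
        if link["taxonomy_id"] == term_id or term_id in ancestors(link["taxonomy_id"])
    })


print(services_under("211:BD"))           # ['svc-1']
print(services_under("local:food-help"))  # ['svc-1', 'svc-2']
```

Searching by a broad 2-1-1 term picks up services tagged with its narrower descendants, while the flat local system is queried directly; neither structure has to be forced into the other's shape.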

MikeThacker1 commented 4 years ago

@NeilMcKechnie how would you say that definitions of "vocabulary" and "taxonomy" differ? Thanks

NeilMcKLogic commented 4 years ago

Hi @MikeThacker1 , vocabulary is "the stock of words used by or known to a particular people or group of persons". In the English language there are about 170,000 currently used words. https://englishlive.ef.com/blog/language-lab/many-words-english-language/

Taxonomy is "the science or technique of classification" or "a classification into ordered categories". Here in North America, a very broadly used and comprehensive taxonomy system to classify human and social services, the 2-1-1 Taxonomy, has about 10,000 total terms. https://211taxonomy.org/

So to me (and I am very open to other interpretations) they are different concepts. You could apply a taxonomy to the vocabulary of English, for example by separating words into verbs, nouns, adjectives, etc.

In our field and in HSDS, we might say the vocabulary is the set of terms we use in a specific way, such as how we use "service" or "location" to the exclusion of their more general definitions. In fact, I've seen people create extensive "data dictionaries" to clearly define these sorts of record types and fields. A taxonomy, by contrast, is how we classify services so that they can be more readily grouped and found when making referrals for help-seekers.

robredpath commented 4 years ago

New proposal for this is at https://docs.google.com/document/d/10PAWTrHn6zHuFVUUpsibjD5UmnhmrIkRQDFk7pKA0Pw/edit#heading=h.1j4g3dch4jc9 - comments and input very welcome!

mrshll1001 commented 12 months ago

I am closing this as I believe 3.0 had quite a lot of discussion around taxonomies, so this issue may no longer be relevant.

Please re-open it if you believe otherwise! :-)

openreferral / specification, issue #214