openreferral / specification

The Human Services Data Specification - a data exchange format developed by the Open Referral Initiative
https://openreferral.org

What happened to the taxonomy_service entity? #117

Closed NeilMcKLogic closed 7 years ago

NeilMcKLogic commented 7 years ago

I was sure that, before the documentation moved to its new home, there was an entity called taxonomy_service that allowed a one-to-many association. Its fields were: id, service_id, service_name, taxonomy, taxonomy_id

Am I misinformed or did it inadvertently get dropped? This is an important entity in the data model.

If it was dropped, I am concerned about what else might have been dropped.

greggish commented 7 years ago

Hrm. I do see taxonomy_id in the common vocabulary table, but I don't see the more detailed documentation from this section of the 1.0 spec.

timgdavies commented 7 years ago

Thanks @NeilMcKechnie for flagging this.

The updated documentation was based on the datapackage.json file and vocabulary table, but it looks like the only mention of the service_taxonomy.csv table was in Appendix D of the documentation, never in the draft schema itself.

However, as you say this is an important entity - and if it is one that is already in de-facto use, something we should get into the schema and documentation.

@ekoner: have you been making use of any HSDS taxonomy tables in the current pilot work with Miami?

@robredpath Would you be able to look at how we best add the Appendix D content into the current version of the docs? I would think that service_taxonomy.csv should be part of the datapackage.json file, and so in the reference, with some of the text around it about how taxonomies should be approached perhaps going on a separate Taxonomies documentation page.

NeilMcKLogic commented 7 years ago

@timgdavies thanks so much for the response. Our implementation will definitely make use of the service_taxonomy entity. Additionally once the documentation for 1.0 is updated to include it and anything else omitted from "Appendix D", I intend to request tweaks to service_taxonomy in the current 1.1 cycle to make it more useful for us. Please keep us posted.

timgdavies commented 7 years ago

I've been digging into this more, and see that the Ohana docs include a description of taxonomy.csv, but that it wasn't included in the datapackage.json.

The old Appendix D then describes a service_taxonomy table which maps organisations, services and taxonomies:

[image: the service_taxonomy table from Appendix D]

However, none of this appears terribly consistent to me right now. Will investigate more of what we've been doing to get Miami data into Ohana shortly - and come back with proposed update to the 1.0 version, to try and represent the current state of play, and to act as a basis for discussions of what should change.

timgdavies commented 7 years ago

Investigations

Further investigations

| Column | Requirement | Detail |
| --- | --- | --- |
| taxonomy_id | required | The category's unique taxonomy id. |
| name | required | The name of the category. |
| parent_id | required for child categories | The taxonomy_id of the parent category. |
| parent_name | required for child categories | The name of the parent category. |
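As a rough illustration of the column rules above, here is a minimal Python sketch that loads a taxonomy.csv and checks the required columns and the parent links. The sample rows and the function names (`load_taxonomy`, `find_problems`) are made up for the example; only the four column names come from the table.

```python
import csv
import io

# Hypothetical sample data; only the column names come from the table above.
SAMPLE_TAXONOMY_CSV = """taxonomy_id,name,parent_id,parent_name
101,Emergency,,
101-01,Disaster Response,101,Emergency
101-02,Emergency Food,999,Food
"""


def load_taxonomy(text):
    """Parse taxonomy.csv text into a dict keyed by taxonomy_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["taxonomy_id"]: row for row in reader}


def find_problems(terms):
    """Return readable problems with required columns and parent links."""
    problems = []
    for term_id, row in terms.items():
        if not term_id or not row["name"]:
            problems.append(f"{term_id!r}: taxonomy_id and name are required")
        parent_id = row["parent_id"]
        if parent_id:  # only child categories carry a parent
            parent = terms.get(parent_id)
            if parent is None:
                problems.append(f"{term_id}: parent_id {parent_id} not found")
            elif parent["name"] != row["parent_name"]:
                problems.append(f"{term_id}: parent_name does not match parent {parent_id}")
    return problems


terms = load_taxonomy(SAMPLE_TAXONOMY_CSV)
problems = find_problems(terms)
for problem in problems:
    print(problem)
```

The third sample row deliberately points at a parent that is not in the file, so the check has something to report.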

This diverges from the Appendix D table, which, to be honest, I'm struggling to interpret, as it seems to mix information about organisations and services and taxonomy structures and IDs, but not term names, into a single table.

| id | organization_name | organization_id | service_name | service_id | taxonomy | taxonomy_id |
| --- | --- | --- | --- | --- | --- | --- |
| 16478826-2010-44af-94c2-ca012144e78e | AMERICAN RED CROSS BAY AREA CHAPTER, SAN MATEO COUNTY OFFICE | 3f85b68d-d9b6-4dd3-b7a2-a7bd224214ad | DISASTER PREPAREDNESS, RESPONSE AND ASSISTANCE SERVICES | 9096bb54-2edc-4d07-80c6-d70eff6e40b3 | emergency.disaster_response | 89 |

@NeilMcKechnie Have you been using the service_taxonomy table in its current form?

Checking user requirements

I want to check that we are clear on the user requirements for taxonomy:

Are there cases in which other entities (organisations, locations) should be classified with a taxonomy?

Ways forward

It's not clear to me what we can do as part of 1.0 tidy up, as I can't work out what was originally intended for the current version of the spec.

I don't think we want to add the service_taxonomy table to datapackage.json and the docs, as it isn't clear how this would be used.

I think we should then make a clear proposal for 1.1 for discussion. My initial version of that would be:

broadly following ohana's implementation of this, but perhaps tightening up definitions.

However, I would really welcome other perspectives on what taxonomies should look like.

NeilMcKLogic commented 7 years ago

Thanks @timgdavies . Here is my perspective on the requirements (sorry for the length):

-Each service may have a one-to-many relationship with one or more categorization systems. To me this naturally suggests an object that is a child of service, like service_taxonomy is/was. These should only connect to "service" and not any other entity (organization, location, etc).

-Candidates for these categorization systems are: the AIRS Taxonomy; Other curated categorization schemes like Open Eligibility (which we don't use but others in the Open Referral world may); Custom-built hierarchical categorization systems (which are very common outside of the AIRS world).

-Any service should be able to be assigned to zero, one, or multiple of these different categorization schemes (e.g. any service can be assigned both to the AIRS Taxonomy and to their own custom categorization scheme).

-Each categorization scheme entity may have for each code: A name; A unique identifier (alphanumeric); Zero (if a top node) or exactly one (any other node) parent category; child categories. The "name" can be translated into other languages (American English, Canadian English, Canadian French and in the future I suppose UK English, Spanish, etc).

-There may be licensing restrictions with the AIRS Taxonomy that should be explored with AIRS (Clive Jones) and the AIRS Taxonomy owners (2-1-1 Los Angeles) regarding how much of the AIRS Taxonomy can be included and by whom. In theory an entity that subscribes to use it ought to be able to populate exported data with it.

-Categories may be "linked" together to increase precision. For example in the AIRS Taxonomy, an assignment of the term "Food Pantries" indicates it is intended for broad use by the general population. However an assignment to "Food Pantries * Immigrants" (read the asterisk as the word "for") indicates the service is really only intended for immigrants. More than two terms can be linked together in this way in a single combination.
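To make the requirements above concrete, here is one possible way to model them in Python. It is only a sketch: the class names (`Term`, `ServiceTaxonomy`), the scheme labels and every identifier in the sample data are hypothetical and are not real AIRS codes or part of any HSDS draft.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    scheme: str                      # which categorization scheme the term belongs to
    term_id: str                     # alphanumeric identifier, unique within the scheme
    names: Dict[str, str]            # language tag -> name of the term
    parent_id: Optional[str] = None  # None for a top node, otherwise exactly one parent


@dataclass
class ServiceTaxonomy:
    service_id: str                  # links to a service only, never to other entities
    scheme: str
    term_ids: List[str] = field(default_factory=list)  # several ids = one linked combination


def children_of(terms, scheme, parent_id):
    """List the ids of the direct child categories of a term."""
    return [t.term_id for t in terms if t.scheme == scheme and t.parent_id == parent_id]


def schemes_for(links, service_id):
    """List the categorization schemes a service has been assigned to."""
    return sorted({link.scheme for link in links if link.service_id == service_id})


# Hypothetical sample data.
terms = [
    Term("AIRS", "A", {"en-US": "Food"}),
    Term("AIRS", "A-1", {"en-US": "Food Pantries", "en-CA": "Food Pantries"}, parent_id="A"),
    Term("AIRS", "B-1", {"en-US": "Immigrants"}),
    Term("local", "L-1", {"en-US": "Free food"}),
]
links = [
    ServiceTaxonomy("svc-1", "AIRS", ["A-1", "B-1"]),  # "Food Pantries * Immigrants"
    ServiceTaxonomy("svc-1", "local", ["L-1"]),
]

print(schemes_for(links, "svc-1"))
print(children_of(terms, "AIRS", "A"))
```

Keeping the linked terms as an ordered list on the child record is one design choice among several; a separate row per linked term would also satisfy the same requirements.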

timgdavies commented 7 years ago

Thanks for the detailed response on this: really appreciated.

The linking of taxonomies is particularly interesting.

Is there a way this could be captured through separating 'thematic' from 'target group' taxonomies?

E.g. having one column to link to 'theme' and one to link to 'target group'? Or would this diverge too much from how current systems work? (I can see that the * notation could be complex for some systems to implement.)

NeilMcKLogic commented 7 years ago

You betcha!

To your question, the wrinkle I see is that AIRS Taxonomy terms can have more than one linked term, and these combinations are all valid:

ServiceTerm1 * ServiceTerm2 * TargetPopulationTerm3

or

ServiceTerm1 * TargetPopulationTerm2 * TargetPopulationTerm3

and so on. Here's a real example of the second scenario: General Recreational Activities/Sports * Students * Faith Based Organizational Perspective

In practice, most assignments made are just a ServiceTerm; when a linked term is added it is normally just one and normally a TargetPopulationTerm. But whatever design we choose should accommodate scenarios like those above.

We have another client not using the AIRS Taxonomy but their own custom categorization system that also allows linking. Their "facets" are not Service and TargetPopulation but rather What, Why and Who. A real example from them: OCCUPATIONAL THERAPY * developmental delay * Children

timgdavies commented 7 years ago

I've taken a further look at this, and am staging for the 1.0 tidy up:

In the taxonomy_ids field I'm adding the description:

A comma separated list of identifiers from the taxonomy table.

Advanced users may also include composite categories, using * to combine two taxonomy terms. For example: 'Food Pantry*Immigrants,Food Pantry*Homeless' (where 'Food Pantry', 'Immigrants' and 'Homeless' are identifiers in the taxonomy table), would indicate a food pantry service for the homeless or immigrants, but not available to other client groups.
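A consumer of the taxonomy_ids field could unpack it roughly as follows. This is a sketch, not part of the spec: it assumes `,` separates categories and `*` links terms inside a composite category (the asterisk notation discussed earlier in this thread), and that identifiers themselves never contain either character. The function names are invented for the example.

```python
def parse_taxonomy_ids(value):
    """Split a taxonomy_ids value into categories; each category is a list of term ids."""
    categories = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty entries such as a trailing comma
        categories.append([term.strip() for term in part.split("*")])
    return categories


def unknown_terms(categories, known_ids):
    """Return the term ids that do not appear in the taxonomy table."""
    return sorted({term for cat in categories for term in cat if term not in known_ids})


parsed = parse_taxonomy_ids("Food Pantry*Immigrants,Food Pantry*Homeless")
missing = unknown_terms(parsed, {"Food Pantry", "Immigrants"})
print(parsed)
print(missing)
```

A plain, non-composite value parses to single-element lists, so simple publishers and advanced publishers can be read with the same code.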

In 1.1 I suggest we develop a more complete documentation page on taxonomies.

NeilMcKLogic commented 7 years ago

Hi Tim, why not the more elegant approach of a normalized table in service_taxonomy ?

timgdavies commented 7 years ago

I'm not sure the service_taxonomy table as it was currently described was either:

(a) More elegant - it seems to mix structure of taxonomy and linking taxonomy to services;

(b) The most obvious omission from the datapackage.json - as taxonomy.csv appears in database diagram form, whereas service_taxonomy appears only as a table in an Appendix of the docs;

I'd also not found any cases of service_taxonomy in use - but if you are using this do let me know and point to examples if possible as that would potentially change the most appropriate 1.0 action.

NeilMcKLogic commented 7 years ago

Hi Tim, we are definitely using service_taxonomy already. I think it is weird for the service_name field to be sitting in there, so it could be removed.

Regarding your points a and b together: we probably still have copyright issues with including the full taxonomy.csv file when the taxonomy under discussion is the AIRS Taxonomy, which is owned by 2-1-1 Los Angeles. Hence having a field for "taxonomy_name" (probably a better title than the current "taxonomy", which is too vague) allows something more descriptive to travel with the data than just the taxonomy code (taxonomy_id).

timgdavies commented 7 years ago

Ok. So to confirm: You currently use service_taxonomy to export:

for any terms in use.

But you do not export a full taxonomy tree etc.

Agreed that there may be IP issues with publishing a full taxonomy tree, particularly with the names in, and that in some cases a publisher might have to provide the taxonomy_ids and leave users to independently access a taxonomy.csv file if they are appropriately licensed to do so. These are some of the elements I think we might want to address in more detailed documentation of taxonomy use in version 1.1.

NeilMcKLogic commented 7 years ago

Thanks Tim, yes, confirmed.

klambacher commented 7 years ago

Neil's description above of the needs for the AIRS Taxonomy in particular is important, as is the recognition of the need to accommodate other types of classifications.

In our system we have an average of 2.4 "linked" Taxonomy Terms per record, meaning that each service normally has 1-3 Taxonomy described categories, and those described categories are further made up of an average of 1.34 actual Taxonomy Terms (usually 1-2 per "link" which together make a compound Term).

For data sharing, it is tremendously useful to share not just the actual Terms, but the Codes/IDs, to buffer between Taxonomy versions and provide clarity about the specific Terms in use. I am less concerned about providing the actual Taxonomy/Thesaurus contents (which are often proprietary) vs. the extra detail within the record that would allow the recipient to identify the classification system(s) being used and not losing any detail in terms of compound Terms, IDs, etc.

The use of other classification systems (including custom ones) to supplement the Taxonomy actually outstrips the use of the AIRS Taxonomy Terms many times over in our systems, and even custom classifications are often shared to other systems. Sometimes multiple systems need to be distributed in the same export, and there needs to be a way to distinguish between these by including the Taxonomy/Thesaurus/Classification System name and possibly version as an optional part of the data.

Having a structured one-to-many relationship is critical to doing this effectively, IMO, because of the need to provide a) the Term's ID or code, b) the Term's current name or description, and c) the Classification system being used (named Taxonomy, Thesaurus, etc.). Without all those pieces, data sharing is challenging.
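To illustrate the three pieces listed above, here is a sketch of what flat one-to-many export rows could look like, with a grouping step that rebuilds compound Terms. The column names (`link_id`, `system`, `version`, `term_id`, `term_name`) and all values are hypothetical, not a proposal from the spec; `link_id` is simply one way to keep the members of a compound Term together.

```python
from collections import defaultdict

# Hypothetical export rows: one row per Term, several rows per service.
ROWS = [
    {"service_id": "svc-1", "link_id": "1", "system": "AIRS", "version": "",
     "term_id": "T-1", "term_name": "Food Pantries"},
    {"service_id": "svc-1", "link_id": "1", "system": "AIRS", "version": "",
     "term_id": "T-2", "term_name": "Immigrants"},
    {"service_id": "svc-1", "link_id": "2", "system": "Local categories", "version": "2",
     "term_id": "L-9", "term_name": "Free food"},
]


def compound_terms(rows):
    """Group rows by (service_id, link_id) and join each group's term names."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["service_id"], row["link_id"])].append(row)
    return {
        key: " * ".join(member["term_name"] for member in members)
        for key, members in groups.items()
    }


def systems_in_use(rows):
    """List the (system, version) pairs present in an export."""
    return sorted({(row["system"], row["version"]) for row in rows})


compounds = compound_terms(ROWS)
print(compounds)
print(systems_in_use(ROWS))
```

Because the term_id and the system name travel with every row, a recipient can still identify the classification in use even when the full taxonomy contents cannot be shared.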

NeilMcKLogic commented 7 years ago

Tim, why did we not go with a normalized object with a one-to-many association to service? Are you going to do that in 1.1 ?

timgdavies commented 7 years ago

I think this is a good candidate for 1.1 - but I couldn't find a way to interpret the meaning of the existing docs such that service_taxonomy could be included as part of a bug-fix release.

NeilMcKLogic commented 7 years ago

OK. Do I need to do anything to put it into consideration for 1.1 ?

timgdavies commented 7 years ago

I think easiest is for me to move this to the 1.1 milestone and re-open it. Sorry - thought we had this tracked by another ticket.

timgdavies commented 7 years ago

For 1.1 I've: