openreferral / specification

The Human Services Data Specification - a data exchange format developed by the Open Referral Initiative
https://openreferral.org

Support for custom fields in HSDS #165

Closed NeilMcKLogic closed 7 months ago

NeilMcKLogic commented 6 years ago

If you look at nearly all of the widely-used software systems for curating human services data, you will find they allow users to create and use custom fields. They are needed because the set of standard fields offered to a large user base is not sufficient to track and express local needs.

These can be free-text fields whose values don't belong anywhere else, or a defined list of options that can be used for filtering resource searching and other functional behaviors. They might help cluster a collection of resources in ways to reflect an affinity for some locale, target population or special public or private funding directed at specific problems or places. And they often include acronyms or names that are known in that region but which would result in blank stares if you're not "from there".

Some examples to help spark your imagination:

- "First Five" early childhood education
- "Re-entry programs" for criminal offenders
- Indications that a service will be provided even for undocumented immigrants

It would make the most sense to link custom fields to the Service record type, but there are probably justifiable reasons to link to Organizations and Locations too.

HSDS should express some method of supporting these custom fields and in fact we have active use cases now that will require us to implement something probably before the standards body can reach consensus to enhance the spec.

@timgdavies , what is the process to move this forward?

greggish commented 6 years ago

Good question, Neil. @timgdavies is out on vacation for the week, but I am bringing this issue to some advisors to see if we can find examples of this kind of thing in other standards work.

@kinlane, I know this is stepping back from protocol to the schema, but: any examples come to mind that we can reference?

greggish commented 6 years ago

Seems like a standard convention is to add an X- at the start of non-standard fields.

One way or another, I'm advised that our specification should stipulate that a transaction won't fail just because an extended field is not available on either the client or server side of the exchange.
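A minimal sketch of that "don't fail on unknown fields" behaviour, in Python. The set of known fields and the `x_pilot_program` field are hypothetical and only illustrate the idea; they are not taken from the HSDS schema.

```python
# Hypothetical subset of core service fields, for illustration only.
KNOWN_SERVICE_FIELDS = {"id", "organization_id", "name", "description"}


def split_record(record, known_fields=KNOWN_SERVICE_FIELDS):
    """Separate recognised fields from extended ones instead of rejecting the record."""
    core = {k: v for k, v in record.items() if k in known_fields}
    extras = {k: v for k, v in record.items() if k not in known_fields}
    return core, extras


# A record carrying one field the receiver does not know about.
incoming = {"id": "svc-1", "name": "Food pantry", "x_pilot_program": True}
core, extras = split_record(incoming)

print(core)    # fields the receiver understands
print(extras)  # extended fields, kept aside rather than treated as an error
```

A receiver could store `extras` untouched, log them, or drop them; the point is only that their presence does not abort the exchange.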

greggish commented 6 years ago

Also, this may apply more at the API spec level: the Open311 spec has these instructions for getting a set of custom fields.

greggish commented 6 years ago

@jpmckinney pointed me to an alternative approach taken by the Open Contracting Data Standard, where they don't use X- or any prefix, but rather have a demonstrated method of registering extensions.

I'm inclined toward following OCDS's lead here. @timgdavies is active in both projects, so I look forward to him weighing in when he's back from vacation.

Noah-T commented 6 years ago

@NeilMcKechnie This is an excellent point. For example, we have a partner that is running a pilot program at a few of their agencies. We have a business need to be able to query for and quickly identify the agencies that are running the pilot program.

timgdavies commented 6 years ago

So, there are a couple of issues here:

(A) How should systems handle properties not in the schema, or that they do not recognise?

(B) How should users come to understand these additional properties and what they mean?

(C) How could/should we represent this in schema terms?

(D) How do we ensure the meaning of additional fields is consistent across implementations?

Considerations

Conformance and namespacing

@jpmckinney's conformance statement from PopoloProject is useful here. It states:

  1. A conforming implementation may use only a subset of this specification’s terms.
  2. It must not use terms from outside this specification’s terms where this specification’s terms would suffice.
  3. It may use terms from outside this specification’s terms where this specification’s terms are insufficient.
  4. Its usage of this specification’s terms must be consistent with the semantics of those terms.
  5. If an implementation serializes to JSON, its serializations must validate against this specification’s JSON Schema.

(3) is saying that you can use other terms, ideally drawing on other existing vocabularies.

(4) is saying that you must follow the definitions provided by those other vocabularies.

(5) doesn't strictly apply to us... as we're not using JSON Schema, and I'm not sure we have a determinate model of validation against the Data Package Spec. But a version of this could be developed for HSDS.

If we introduced a version of this conformance statement to HSDS then the implication would be:

Documenting additional fields

In terms of representing to users what these new fields mean (and allowing validation of the updated data), when HSDS is published along with its own datapackage.json, the publisher can update the datapackage.json to include a definition of the additional fields they are including, and the vocabulary etc. they have drawn them from.
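As a rough illustration of what such an update could look like, the sketch below adds a field definition to the schema of one resource in a Data Package descriptor. The descriptor contents, the `pilot_program` field and the helper name `add_custom_field` are all made up for this example; only the `resources` / `schema` / `fields` layout follows the Tabular Data Package format.

```python
import json


def add_custom_field(descriptor, resource_name, field):
    """Append a field definition to the named resource's schema."""
    for resource in descriptor["resources"]:
        if resource["name"] == resource_name:
            existing = [f["name"] for f in resource["schema"]["fields"]]
            if field["name"] in existing:
                raise ValueError(f"field already defined: {field['name']}")
            resource["schema"]["fields"].append(field)
            return descriptor
    raise KeyError(f"no such resource: {resource_name}")


# A cut-down, hypothetical descriptor; a real one would list many more fields.
descriptor = {
    "name": "example-hsds-export",
    "resources": [
        {
            "name": "service",
            "path": "service.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "string"},
                    {"name": "name", "type": "string"},
                ]
            },
        }
    ],
}

add_custom_field(
    descriptor,
    "service",
    {
        "name": "pilot_program",  # hypothetical additional field
        "type": "boolean",
        "description": "Whether the service takes part in a local pilot "
                       "(locally defined; not part of core HSDS).",
    },
)

print(json.dumps(descriptor, indent=2))
```

The `description` is where a publisher could name the vocabulary the field was drawn from, or link to the discussion that defined it.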

How such additional fields might be represented in an API spec will depend on the hypermedia formats chosen for API design, which I think is still tbc (cc @kinlane)

Distributed extension / extension clashes

Allowing non-namespaced extensions raises the risk that two publishers of data may both add a new field with the same name, but different semantics.

For example, two publishers might both add 'immigrationStatus' but each using different semantics for this field (one making it a boolean field to mean whether the service considers immigration status or not, the other making it an enum list of potential eligible statuses for support).
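A clash of this kind is easy to detect mechanically once publishers declare their fields somewhere. The sketch below assumes a simple list of (publisher, field name, field type) declarations; the publisher names and the declaration format are invented for illustration and do not describe any existing registry.

```python
def find_clashes(declarations):
    """Return fields that different publishers declare with different types.

    declarations: iterable of (publisher, field_name, field_type) tuples.
    """
    types_by_field = {}
    for publisher, name, field_type in declarations:
        types_by_field.setdefault(name, {})[publisher] = field_type
    return {
        name: by_publisher
        for name, by_publisher in types_by_field.items()
        if len(set(by_publisher.values())) > 1
    }


# Hypothetical declarations from two publishers.
declarations = [
    ("publisher_a", "immigrationStatus", "boolean"),
    ("publisher_b", "immigrationStatus", "enum"),
    ("publisher_a", "pilotProgram", "boolean"),
]

clashes = find_clashes(declarations)
print(clashes)
```

Comparing types only catches the crudest clashes. Two fields with the same name and type can still mean different things, which is why the discussion between publishers matters more than the tooling.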

This is where in OCDS we have created the extensions registry and a full mechanism for declaring the extensions that are in use (and we're working towards validation of extensions that should address clashes etc.) - although the most important part of this is around simply getting users to 'declare' via GitHub when they are planning to use extended fields, such that we can get groups with similar needs to discuss the best way to model the additional data.

Right now, building the technical infrastructure for this in HSDS does not seem feasible, but we could perhaps develop a social convention in the meantime of:

For example (made up examples, comments on realism to test these ideas welcome):

Custom namespace

The 'QuickRefer' system supports both directly entered and aggregated listings, and wants to communicate to a particular user where listings came from.

As this is a specific data artefact from their data, they choose to include it in a quickRefer_listingType field.

By using this namespace, they are communicating that this is a system specific field, and they are the ones defining the field.
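A consumer could use such a prefix to tell system-specific fields apart from core ones. The sketch below assumes the namespace is everything before the first underscore and that the consumer keeps its own list of namespaces it knows about; both are assumptions made for this example, not conventions agreed in this thread.

```python
def classify_fields(field_names, core_fields, namespaces):
    """Group field names into core, namespaced (by known prefix) and unregistered."""
    result = {"core": [], "namespaced": {}, "unregistered": []}
    for name in field_names:
        prefix, separator, _ = name.partition("_")
        if name in core_fields:
            # Check core first: core field names may contain underscores too.
            result["core"].append(name)
        elif separator and prefix in namespaces:
            result["namespaced"].setdefault(prefix, []).append(name)
        else:
            result["unregistered"].append(name)
    return result


fields = ["id", "name", "quickRefer_listingType", "memberships"]
report = classify_fields(
    fields,
    core_fields={"id", "name"},   # hypothetical subset of core fields
    namespaces={"quickRefer"},    # namespaces this consumer recognises
)
print(report)
```

Requiring a known namespace avoids misreading ordinary field names that happen to contain an underscore as namespaced ones.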

Non-namespaced extended field

QuickRefer also records data on the 'members' of an organisation (to capture details of named professionals in a support practice).

They find no terms in HSDS to represent membership, but identify that the definition used in the Organization ontology meets their needs, and so they open an issue to propose 'memberships' is recognised as an extended field of HSDS.

Based on feedback, they clarify the definition of this field, and start using it. They add this field to their datapackage.json file, including a link to the GitHub issue discussing it within the description.

Conclusions

robredpath commented 4 years ago

@MikeThacker1 is this something you'd be able to reflect on wrt your work in England?

MikeThacker1 commented 4 years ago

In England so far we've annotated a copy of the Tabular Data Format datapackage to denote what is an extension to Open Referral and what, from the extended schema, is proposed as an "application profile" defining the classes/properties (or "tables/fields") that apply in our broad use-case. The tabular data package could be further annotated to denote the version of the standard in which something was introduced or deprecated.

We autogenerate the JSON schema based on what's in the application profile. Our validation against the JSON schema considers something valid if the required classes/properties are there and in the right structure. We don't consider something invalid if it is additional to the schema.
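The validation rule described here (required things must be present and well-formed, extra things are tolerated) can be sketched in a few lines of plain Python. The required fields below are hypothetical stand-ins for an application profile; this is not the generated JSON schema or the actual validator used in England.

```python
# Hypothetical application profile: required field name -> expected Python type.
REQUIRED_FIELDS = {"id": str, "name": str}


def validate(record, required=REQUIRED_FIELDS):
    """Return a list of error messages; fields outside the profile are ignored."""
    errors = []
    for field, expected_type in required.items():
        if field not in record:
            errors.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field} should be of type {expected_type.__name__}")
    return errors


# Valid: the additional field does not count against the record.
print(validate({"id": "svc-1", "name": "Food pantry", "pilot_program": True}))

# Invalid: a required field is missing.
print(validate({"id": "svc-1"}))
```

In JSON Schema terms this corresponds to listing properties under `required` while leaving `additionalProperties` at its permissive default.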

So:

devinbalkind commented 4 years ago

For the sake of simplicity and usability on Airtable, we've replaced the following tables with a custom data table:

Our custom data table has the following fields:

robredpath commented 4 years ago

I'm bringing this into scope for the Spring 2020 upgrade as I'm hoping that we can agree on a mechanism by which we can describe and share the additional information that we store in systems, so that we can see each other's work, find commonalities, and hopefully help other people who are working on similar problems.

robredpath commented 3 years ago

This has started in a very informal way in http://docs.openreferral.org/en/latest/hsds/extending/

I've steered clear of suggesting technical mechanisms, because I think that it's more important for people to share their work in some way (as we've seen in this issue!) than to share in the 'right' way.

@greggish could we create a place where people's variances, solutions to problems, etc can be referred to? The page I linked to from the docs is non-normative so that could be it, or it could be as simple as a Google doc (or an AirTable!)

mrshll1001 commented 7 months ago

I am tentatively closing this since: