openreferral / api-specification

This is the working repository for Open Referral's Human Services Data API protocols.
https://openreferral.readthedocs.io/en/latest/hsda/

/complete #45

Closed kinlane closed 3 years ago

kinlane commented 7 years ago

I didn't quite see any feedback regarding the options on the table for allowing for a more comprehensive approach to schema filtering, so I'm just going to go with the most intuitive option, and add an /everything path to /organizations, /locations, /contacts, and /services.

I.e., when you request /organizations/ you get a simple, flat schema. When you request /organizations/everything you get a complete schema with all subresources.
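As a rough illustration of the difference between the two shapes, here is a minimal Python sketch. The in-memory data, the field names, and the function names are all hypothetical and are not taken from HSDS or HSDA; the sketch only shows a flat listing next to one with sub-resources nested in.

```python
# Hypothetical in-memory data; field names are illustrative, not taken from HSDS.
ORGANIZATIONS = [{"id": "org-1", "name": "Example Food Bank"}]
SERVICES = [{"id": "svc-1", "organization_id": "org-1", "name": "Meal delivery"}]
LOCATIONS = [{"id": "loc-1", "organization_id": "org-1", "name": "Main office"}]


def list_organizations() -> list:
    """Flat representation, roughly what /organizations/ might return."""
    return [dict(org) for org in ORGANIZATIONS]


def list_organizations_everything() -> list:
    """Nested representation, roughly what /organizations/everything might return."""
    result = []
    for org in list_organizations():
        # Attach each sub-resource collection that points back at this organization.
        org["services"] = [s for s in SERVICES if s["organization_id"] == org["id"]]
        org["locations"] = [l for l in LOCATIONS if l["organization_id"] == org["id"]]
        result.append(org)
    return result
```

The flat form maps directly onto rows, while the nested form needs a structured format such as JSON.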

We can revisit header options for this, and other approaches, further down the road. For now, this should address the concerns on the table.

NeilMcKLogic commented 7 years ago

I assume you also mean /organizations/everything would give you a list of ALL Organizations and ALL of their subresources? If so then you probably want to add paging to this method, in the spec, as smart implementers would certainly want to prevent a request from slamming their database in the case of large datasets.

greggish commented 7 years ago

I think yesterday @klambacher told me that pagination would be a serious hindrance when requesting ALL. Am I mixing issues here?

klambacher commented 7 years ago

My concern is that pagination is very high risk in data synchronization activities. Accuracy relies on the data gathering activity being an atomic operation, with the records in the set remaining unchanged while the operation occurs. Often pagination is implemented through a requery, which can result in subsequent pages not being an exact continuation of the previous request. It's not that it can't be managed, but I would be unlikely to trust a third party's pagination implementation for a bulk operation without understanding the implementation details. My preference for bulk data sync is to have a low-data query (ids and types) that identifies ALL records to collect via search, establishing a set to operate on. Then the full records can be fetched all at once or in batches safely.
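To make the concern concrete, here is a minimal Python sketch under stated assumptions. `STORE` is a hypothetical in-memory stand-in for a provider's database, and `list_page`, `list_ids`, `fetch_batch`, and `sync_by_id_snapshot` are illustrative names, not part of any HSDA specification. It contrasts requery-based offset pagination with the id-snapshot approach described above.

```python
# Hypothetical in-memory store standing in for a provider's database.
STORE = {f"org-{i}": {"id": f"org-{i}"} for i in (1, 2, 3, 4)}


def list_page(page: int, per_page: int) -> list:
    """Offset pagination implemented as a fresh query on every call."""
    ids = sorted(STORE)
    start = (page - 1) * per_page
    return [STORE[i] for i in ids[start:start + per_page]]


def list_ids() -> list:
    """Low-data query: only the ids of ALL records, in one call."""
    return sorted(STORE)


def fetch_batch(ids: list) -> list:
    """Fetch full records for an explicit list of ids (skipping deleted ones)."""
    return [STORE[i] for i in ids if i in STORE]


def sync_by_id_snapshot(batch_size: int = 2) -> list:
    """Fix the working set first, then fetch full records in batches."""
    ids = list_ids()
    records = []
    for start in range(0, len(ids), batch_size):
        records.extend(fetch_batch(ids[start:start + batch_size]))
    return records
```

With `list_page`, a record inserted between the request for page 1 and the request for page 2 shifts the offsets, so a record can be repeated or skipped. With the id snapshot, the set of ids is fixed before any full record is fetched, so later inserts cannot shift the batches.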

NeilMcKLogic commented 7 years ago

@klambacher yep that is a reasonable approach. Then there's also the topic about bulk transfers, for consideration: https://github.com/openreferral/api-specification/issues/58

kinlane commented 7 years ago

Couple of concepts in play here:

  1. Default pagination - I will be keeping the existing page and per_page as a simple default, not just across core elements but also, yes @NeilMcKechnie, for this /everything /complete /all path (still not sure which name I'm going with).
  2. HAL / JSON API (Hypermedia) - Really bring home sensible pagination, linking, and assisting.
  3. Bulk Transfers - This is precisely why I've broken out bulk transfer needs into a separate concern: system integration needs shouldn't trump application-level integration needs.

In the end, adding a /everything /complete /all path for all core objects satisfies the need for a complete representation of a resource and its sub-resources, with basic pagination. There will be a GET and a POST at this level as well, but they should not be used for volume or bulk loading, just app-level integrations.

I have opted for a path-based API design (/everything /complete /all) over a ?scope=all parameter, for caching and performance. When it is a path, caching and performance become much more of a reality at the webserver level, whereas dynamic queries are pulled each time. This assists operators with their performance concerns, and it complements the separation of system bulk loading concerns.

At this point I'm going with /complete, so:

Keep core /[resource] 100% reflecting HSDS, and /[resource]/complete reflecting the top-level resource and its sub-level resources. I leave it to the HSDS data team to accept the new schema back into master or not.

timgdavies commented 7 years ago

Thanks @kinlane

Two thoughts:

(1) I have a preference for /full or /detailed over /complete as 'complete' has certain connotations about the underlying data, as well as the structure of the response.

I.e. it's possible to have a 'full' response that is not necessarily 'complete' with everything that is known about a service or location.

(2) Is the choice of sub-level resources programmatically derived from the foreign-key relationships in HSDS, or have you had to make editorial choices here?

If there are updates we should put back into HSDS to enable a good Single Source of Truth approach to these major and sub-resource relationships, happy to look at those.

kinlane commented 7 years ago

Thanks for feedback.

1) I have no strong opinion on /complete, and am happy to switch to /full. I have been waiting for any comment on this since July. Will be adjusting unless I hear other feedback.

2) Can you please reference where you see the HSDA schema reflect this? I have been strict about maintaining HSDS as the central truth (except for nesting), so if there is such a place, it is a typo in the definition. No editorial choices were made.

rasmus-storjohann-PG commented 6 years ago

@kinlane can you outline (or point me to) the reasoning behind what information is included and what is not in the default (i.e. not /complete) representation of resources? One concern I have seen related to this is that no two clients of the API will agree on what is important and not important in the response, so the /complete end point will end up being used most of the time. I have proposed in https://github.com/pg-irc/pathways-backend/issues/163 a scheme where the client can specify which fields to include and which to embed. I'm not sure that is a great way to go either, but it's an alternative approach. I can see that it will not play as well with caching as the /complete approach.
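As a rough sketch of what client-driven field selection could look like, here is a small Python example. It is not the scheme from the linked issue: the `fields` and `embed` parameters, the `shape_response` function, and the sample data are all hypothetical and exist only to illustrate the idea of the client choosing what is included and what is embedded.

```python
# Hypothetical sample data; field names are illustrative only.
SERVICE = {
    "id": "svc-1",
    "name": "Meal delivery",
    "description": "Hot meals",
    "organization_id": "org-1",
}
RELATED = {"organization": {"id": "org-1", "name": "Example Food Bank"}}


def shape_response(record: dict, related: dict, fields=None, embed=None) -> dict:
    """Return only the requested fields, plus any requested embedded sub-resources.

    fields=None means "all top-level fields"; embed=None means "embed nothing".
    """
    out = {k: v for k, v in record.items() if fields is None or k in fields}
    for name in embed or []:
        # Unknown sub-resource names are silently ignored in this sketch.
        if name in related:
            out[name] = related[name]
    return out
```

For example, `shape_response(SERVICE, RELATED, fields=["id", "name"], embed=["organization"])` would return the id and name of the service with the organization nested in. Every distinct combination of fields and embeds is a distinct response, which is why this plays less well with caching than a fixed /complete path.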

kinlane commented 6 years ago

The reasoning was to keep the default reflecting HSDA, and to keep it flat so CSV could be negotiated -- the lowest bar, directly to a spreadsheet. Then provide everything. Beyond that, there was no consensus on other views or on an approach to schema filtering, and with so much on the table for that version it was pushed to the future for further discussion. Caching is definitely one of the concerns. Open to feedback and continued discussion on whether it should be in v1.3. Thanks!

kinlane commented 3 years ago

Complete will go away in v2.0 in favor of a new resources property that allows the user to choose whether they want any sub-resources returned, with the default being no sub-resources.