Migrate across to resources schema

norrishd commented 3 years ago

I don't feel like there's any rush on this, but it's something that would certainly be valuable to do.

ronrest commented 3 years ago

I think this will be more than just a patch. I think this will lead to breaking changes in the structure of the outputs returned.

I think this could probably lead to a major version update. But, i think this provides a good opportunity for us to rethink a lot of the structure of the API and also the outputs. Especially since we are still technically in pre-alpha stage.

Some suggestions

API

renaming the limit argument in paginated queries. It is confusing, and could be confused with max_items.
- Perhaps renaming to something like page_limit? Something that makes it clear that it is the limit per page, and not the total limit of items retrieved.

OUTPUTS

Could I suggest we rethink the column names, they tend to be a little ugly. Some suggestions:

Use a different separator other than . when using json_normalize() for nested values.
Simplify/abstract away the nesting as much as possible.
- eg, instead of patient.user.fullName, replacing it with patient_fullname
- The user of the API probably does not need to know the full hierarchy of nested values that were needed to get the value.

norrishd commented 3 years ago

Yeah, we would for sure have to go to lengths to reshape the outputs to the current structure. I think this would be possible/manageable (especially if we migrated a few functions at a time), but I'm definitely open to the possibility of deciding that it's worth migrating to the new schema but not trying to preserve the exact current structure.

100% agree with you on doing away with limit. I'm still unsure if we should even be exposing it, or optimising and setting it ourselves within each function. Or maybe it could be a general param on whatever the resource schema equivalent of get_paginated_response() ends up being.

I'm on the fence with those suggested changes to the outputs - for anyone who is familiar with the schema, the current approach (using .s) makes it reliable to match a query up with the resulting dataframe, and means we don't need to have custom renaming in each function. It kind of boils down to this philosophical dilemma I feel is inherent to seer-py of "is this primarily a Python wrapper to a GQL endpoint, or a REST API on top of that endpoint?". If the answer is the former, IMO there are too many query functions defined as is/not enough organisation to them. If the answer is the latter, I think it doesn't currently abstract away enough, and is missing some obvious functionality, e.g. a simple download_study() method.

dkeden commented 3 years ago

If we have no intention of using seer-py with the exception of the execute_query method, then sure, there's no rush in migrating to the new resource schema. Otherwise, I would argue that it's something to consider sooner rather than later?

I would agree that this would be coupled with a restructure.

On another note, I think it's also worth considering who these changes would be for, e.g. external or internal users, etc., because the response would differ for each of these suggestions. A good point made by @norrishd re. Python wrapper vs REST API. Perhaps all for a larger discussion?

seermedical / seer-py

Migrate across to resources schema #137