Web: json api - Githubissues

will-moore commented 8 years ago

First goal it to take existing api urls from webclient, cleanup the api and port them to webgateway. Some of that work was started at https://github.com/openmicroscopy/openmicroscopy/pull/4357 but more discussion was needed (and this is for 5.3.0 so we don't want to open a PR yet).

I've tried to document the existing api urls and suggest a cleanup in a gist, so that we can have a single document that we're editing until happy: https://gist.github.com/will-moore/2aa7d5eef6481dd3c777

Comments on the gist describe cleanup changes that have been made so far (and you can also inspect the gist versions).

Questions to discuss:

[ ] Versioning: In url webgateway/api/v1/ E.g. here or in header (E.g. github
[x] Should filesetId: 1 be fileset: {id: 1} same as owner and group?
[x] permissions field. E.g. "canEdit canAnnotate canLink canDelete"
[x] /containers/ to use data:{} wrapper too?
[x] /publicContainers/ for shares:[], discussions:[] and /shares/ to list shares only
[ ] Update parameter matrix for lists. New one for create * parameters.
[ ] embed options (see below)
[x] camelCase vv snake_case for everything (urls and parameters)?
[ ] sorting? (see below)
[ ] /images/:id to return current imgData json? Should /images/ list use similar format?
[ ] plateAcquisitions vv acquisitions vv runs

joshmoore commented 8 years ago

Probably best to remove this ambiguity and use api/images?dataset=1 instead.

NB: this might be an interesting use case for https://github.com/openmicroscopy/openmicroscopy/pull/4519 -- cc @mtbc

mtbc commented 8 years ago

Nice gist document! I'm happy to translate example English queries into what the proposed request parameters would be, if that helps. Though, for things like datasets-of-image HQL will always be fastest; the requests are better for things like folder hierarchies where the HQL is hard/impossible to write.

(I don't want to cause a large side-discussion on this PR but one awkward wrinkle in my other requests, that stopBefore offers some control over, is recursive issues like finding both the lightsources and the lightsources that are the pumps for the lasers in those first ones, or finding images in a dataset that are actually masks on the images. In short: for queries where the path is a fixed length over known objects and you don't want any others, HQL remains better.)

will-moore commented 8 years ago

Updated gist to replace id parameter with correctly-named paramerts: E.g. replace api/datasets?id=1 with api/datasets?project=1

I am pretty happy with the API in the current gist. Just waiting on more comments & discussion before I start to implement these changes and move to webgateway.

joshmoore commented 8 years ago

General query methods look good (though the plate_acquisitions name jars the eye but we may be stuc k with it).

In terms of the params, I'd probably keep owner/group symmetrical for plate_acquisitions. It's not incorrect for someone to add a run to your plate in a rwrw group even though it's unlikely.

Barring discussions on future methods (write methods, pixel loading, etc), my primary concern would be to what extent we want to scope out the error codes & return values in the gist before you start on the implementation. E.g. are all outputs in omero-marshal format? Do we start with a media type from the start so we can expand in the future? Do we have any rate limiting errors?

will-moore commented 8 years ago

@joshmoore Happy to rename plate_acquisitions to plate_runs or even runs, since we use that terminology in the clients.

will-moore commented 8 years ago

@joshmoore Oops - I missed the id (plate ID) for plate_acquisitions and I also need to indicate on that chart which params are required/optional. For plate_acquisitions, plate ID is currently required, so the group would be defined by that. But I guess it could come into line with other urls, allowing you to list runs across a whole group/owner? I think renaming plate_acquisitions to runs would make sense, since we already use that term in the clients.

Error codes could do with some more scoping; As mentioned on my original PR https://github.com/openmicroscopy/openmicroscopy/pull/4357 We handle errors in parsing query parameters such that webclient/api/containers/?page=one returns HttpResponseBadRequest("Invalid value 'ome' for parameter 'page'"). Unhandled errors will give a 500 response. I'll add some more details to the gist.

I would suggest that our default return media type is json (same as github API), except for urls that render images (none of these are currently under the api prefix). We could then add support for other media types if that becomes required later. We don't use omero-marshal so the format is not the same and is rather customised for webclient needs. I will also add return format to the gist. I have no idea about how we'd do any rate limiting or what our policy would be. We'd have to keep track on the server somehow?

joshmoore commented 8 years ago

For plate_acquisitions, plate ID is currently required, so the group would be defined by that.

Will this have any impact for wanting to provide a link to a run?

(Other items to be followed up post gist update)

jburel commented 8 years ago

for Image:

acqDate better to be more explicit i.e. acquisitionDate
date, that will be the imported date. maybe again better to be clear since "dates" on images have always raised questions

jburel commented 8 years ago

owner: filter objects by owner

In your table, how do you specify the owner of a link not owner of an object?. We have cases where ownership of the object e.g. image, is not important, the ownership of the link is.

atarkowska commented 8 years ago

The biggest remaining issue is the id parameter on a bunch of urls, which is a different identifier on each. E.g. api/datasets?id=1 filters by project ID whereas api/images?id=1 filters by dataset ID. Probably best to remove this ambiguity and use api/images?dataset=1 instead.

Why this is a problem? I found that very useful when writing custom mapannotiation app to display genes. ID should be a generic PARENT ID as it is now, rather then enforcing explicit type. With the small adjustment https://github.com/aleksandra-tarkowska/openmicroscopy/commit/2dae5639525b69b8a677cac21183654c57d9e88c I can even now pass more params as an extra to the QUERY_STRING. That works great. Why just not go this way?

Second in this particular example, I no longer pass object ID as it simply doesn't exist, but mapannotation key as a string and or parent id. If you need explicit type just make sure json returns it.

I will be backporting most of the commits from https://github.com/openmicroscopy/openmicroscopy/compare/develop...aleksandra-tarkowska:mapannotations?expand=1

mapannotation app is in https://github.com/aleksandra-tarkowska/mapannotation, overriding tree was simple https://github.com/aleksandra-tarkowska/mapannotation/blob/mapannotation/static/mapannotations/javascript/mapannotations.tree.js, I will add few more suggestions maybe in another issue, like adding new types...

atarkowska commented 8 years ago

The biggest problem for me is lack of pagination on each levels. ?page=0 is in place but used for datasets. With the large amount of data, like tags, tree won't load. I have started addressing this issue in https://github.com/aleksandra-tarkowska/openmicroscopy/commit/96bd9cc47b30915d83963ad00dc3678e67dbb0b2. I would consider ?page=1 as a default in API

will-moore commented 8 years ago

Will this have any impact for wanting to provide a link to a run?

@joshmoore I don't think so. None of the urls here currently have a single object ID since they are all for listing objects.

will-moore commented 8 years ago

date, that will be the imported date. maybe again better to be clear since "dates" on images have always raised questions

@jburel OK, I will use creationDate (not importDate)? This would best reflect the underlying data and can be applied to other objects in future.

will-moore commented 8 years ago

In your table, how do you specify the owner of a link not owner of an object?

@jburel We don't currently support that. As we move more functionality from current webclient html requests into webgateway json API, we can plan and discuss the api for each.

will-moore commented 8 years ago

Why this is a problem? I found that very useful when writing custom mapannotiation app to display genes. ID should be a generic PARENT ID as it is now

@aleksandra-tarkowska Currently in the jstree code that uses "id" as the universal parent, the containers url uses id to mean experimenter_id since experimenter is the parent in the tree. So we cannot choose the API purely based on convenience for jstree code since this won't make sense elsewhere. Also, users might get confused by the difference between webgateway/datasets/1 and webgateway/datasets/?id=1. In the first url, the ID is dataset ID (to get details of a single dataset) and in the second case this is actually Project ID.

In future we may want to allow webgateway/datasets/?tag=1 to show datasets tagged with Tag:1, whereas to show tags tagged with Tag:1 the url would be webgateway/tags/?id=1.

Another case that might lead to confusion in future would be listing Wells, where parent could be Plate or Run.

jburel commented 8 years ago

@will-moore: cf. https://github.com/openmicroscopy/design/issues/24#issuecomment-197254985 One of the goal of that effort is to define/design the new API not just moving methods around and possibly renaming them.

will-moore commented 8 years ago

I would consider ?page=1 as a default in API

@aleksandra-tarkowska This is already the case for the API and is supported across most of the urls already. If we start using this in jsTree for Projects, Datasets etc, we will need to think about where to put the pagination controls and what they look like etc. In the top level of the tree, you might need pagination only for Projects and Datasets but not Screens. Do we need Prev/Next buttons for each container type, or does "Prev/Next" buttons move through P/D/S pages all together? In the Tag tree this is even more complex, since you will have a page for each of the tagged objects P/D/I/S/P.

sbesson commented 8 years ago

Also :+1: for acquisitionDate and generally sticking to the model attribute scheme (as we working towards fixing model/DB discrepancies independently). For import dates, we have not existing concept, my slight preference goes to importDate (also this is the name used in the UI).

Also as this is work targetted for OMERO 5.3.0, should we also start including folders in this discussion?

atarkowska commented 8 years ago

just discussed with @will-moore as the urls proposal https://gist.github.com/will-moore/2aa7d5eef6481dd3c777#urls-for-listing-objects-matrix-of-query-paramters had only one parent and that wasn't clear what is the intention of changing id.

Based on recent experience with mapannotation from my perspective each object should consider all type of parents that are above in our model (maybe even going both ways?), e.g. image may have dataset_id, project_id, run_id, plate_id, screen_id, experimenter_id, group_id. Curerntly webclient consume only one parent based on the model. As long as they won't be hard coded I am ok with that. I am also guessing there may be cases where people want to display hierarchy up side down.

webclient/api/images/?project_id=123 could display all images belonging to datasets belonging to project 123
experimenter_Id and group_id could be additional filter as well

all of them may not need to be implemented in the first place, but this should be allowed by custom views because api is closely tight to the jstree I mostly look at that from current JS files perspective. With the current state of JS files I see that a blocker

will-moore commented 8 years ago

@sbesson If we are "sticking to the model attribute scheme" then we should use creationDate instead of importDate.

We could certainly consider a folder parameter on most of these urls, but I don't know if we'd get that implemented for 5.3.0.

atarkowska commented 8 years ago

I would consider ?page=1 as a default in API

@aleksandra-tarkowska This is already the case for the API and is supported across most of the urls already. If we start using this in jsTree for Projects, Datasets etc, we will need to think about where to put the pagination controls and what they look like etc. In the top level of the tree, you might need pagination only for Projects and Datasets but not Screens. Do we need Prev/Next buttons for each container type, or does "Prev/Next" buttons move through P/D/S pages all together? In the Tag tree this is even more complex, since you will have a page for each of the tagged objects P/D/I/S/P.

It is true views has page=1. Sorry I referred to jstree that is not the case here. We also discussed that loading container hierarchy is a tricky to use pagination as these are separate queries for each type of containers

Maybe hql experts could advice if we could get single query to load mix of IObjects sorted by name with limit and offset?

atarkowska commented 8 years ago

I updated my comment in https://github.com/openmicroscopy/design/issues/24#issuecomment-197295657

joshmoore commented 8 years ago

Mentioned to @aleksandra-tarkowska in jabber that I don't think HQL can do this. There's some basic UNION support and some subquery support and some order/limit support, but not in the combination we need do a nice mixed-paged list. One thing that might work is to add an "ContainerType" superinterface to each container that should be queried at the same time. What will work is to drop down directly to SQL.

will-moore commented 8 years ago

Reading http://www.vinaysahni.com/best-practices-for-a-pragmatic-restful-api...

We should probably support projects/10/datasets/ for listing Datasets in a Project, because this is a very common format and would be expected by API users. This would be equivalent of datasets/?project=10.
Versioning: Do we include an API version number (independent of OMERO version number)? In the url itself webgateway/api/v1/projects/? Or in request header? Hopefully this API will be more stable than OMERO version numbers (as the existing webgateway json api has been).
Sorting: could consider a sort parameter projects/?sort=name although we already sort on name by default. Only other field we might want to sort on is creationDate or acquisitionDate for images.
We should use the format owner:{id: 123} instead of ownerId:123 to nest other objects, in case we want to include extra info later, such as name etc.
Maybe returned lists should be in the form of { data: [ ] } instead of { projects: [ ] } from /projects/ etc. Or even without the extra wrapping [...] (like github). However, I have found it useful to be able to include extra info, such as totalCount, E.g. { data: [ ], totalCount: 1234 }.Need to investigate this more.
We might want to support some embed options, maybe just on single items (not lists) E.g. project/10/?embed=owner, group.id would include

{
    owner: {firstName: "Will", lastName: "Moore", id:1},
    group: {id:2},
}

will-moore commented 8 years ago

@joshmoore @chris-allan I've done a fair bit of work on the json API https://gist.github.com/will-moore/2aa7d5eef6481dd3c777 and it's ready for a good review. And there's a fair list of points for discussion at the top of this design issue.

sbesson commented 8 years ago

New version of the gist looks good. A few comments added to the gist itself. On top of all items to be discussed, one of my main concerns is the renaming of plate_acquisitions to runs which could benefit from a larger discussion. Although it matches the client usage, this is only true in the context of the plate i.e. GET /plate/:plateId/runs/. On the other hand, GET /runs/ sounds too generic to me and interestingly the title of the gist section drops this ambiguity by prefixing it with List Plate Runs. Also the current proposal introduces another discrepancy between the API and the model. Are we effectively committing ourselves to renaming PlateAcquisition as PlateRun in future model changes?

atarkowska commented 8 years ago

what about /api/search ?

atarkowska commented 8 years ago

@will-moore that looks really nice. My only comment is that creation should be handled by PUT and update should be handled by POST, no?

atarkowska commented 8 years ago

one suggestion, could you add table showing HTTP responses for each?

joshmoore commented 8 years ago

Looking quite nice. Some questions reading from the top:

For the URL, $server/api/v3 and drop webgateway?
Error handling: perhaps next the security issues? Others I can think of are timeouts.
For the methods below, are the listed parameters all "optional" if "required" is not listed?
List groups parameters: owner in addition to member?
List experiments parameters: active, admin?
Get a single experimenter: plural in URL is possibly confusing.
Create a Project: the plural here is also odd.
List Images: additional orphaned flag is slightly duplicated with dataset/well (unlisted)
Deletes: might be good to return a count of deleted items to check for existence.

Probably highest level question from me is to the extent we can agree on omero-marshal now.

update should be handled by POST

Is it worth thinking about PATCH at all?

will-moore commented 8 years ago

@aleksandra-tarkowska I think that POST should handle creation of new objects, E.g. POST /projects/ and PUT is to update a specific item. E.g. github create an issue is POST: https://developer.github.com/v3/issues/#create-an-issue See also: http://blog.mwaysolutions.com/2014/06/05/10-best-practices-for-better-restful-api/ Discussion: http://stackoverflow.com/questions/630453/put-vs-post-in-rest And spec: https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.5

will-moore commented 8 years ago

@aleksandra-tarkowska api/search is certainly something to think about, but I think we need to start that discussion in a new gist & issue since we're already getting a bit too much with the scope we already have here.

will-moore commented 8 years ago

@joshmoore

I think a new api app at <server>/api/v1/ sounds like a good idea. Removing webgateway can happen independently.

Error handling: perhaps next the security issues? Others I can think of are timeouts

You mean add security issues (authentication / handling not-logged in?) and timeouts to the error handling section?

All parameters are optional for GET urls so far. Required parameters for POSTs are marked.

All the advice I've read is to use plurals even for listing single items. E.g. here and this page is https://github.com/openmicroscopy/design/issues/24

List images is all images by default. dataset and orphaned=true parameters are mutually exclusive, but if you don't have dataset parameter, that doesn't mean that you're just listing orphaned images. In practise you usually want to filter by dataset or orphaned but this is not the default behaviour. We could make /images/ with no parameters use orphaned=true by default, but filters are generally off by default and you probably wouldn't expect orphaned=false to list ALL images.

Deletes: You're suggesting that if I ask to delete 5 links, we should return 4 if only 4 of them were found? Maybe we'd need more info than that to be useful? E.g. which link wasn't found? Actually, we're currently returning remaining parent links for the children of deleted links - so that the webclient knows if they've become orphaned in the tree. Maybe that should only happen with a returnLinks=trueparameter.

I guess PATCH is probably better for updating since it doesn't require a complete replacement.

manics commented 8 years ago

However, I have found it useful to be able to include extra info, such as totalCount, E.g. { data: [ ], totalCount: 1234 }

This would be useful for knowing whether pagination is required or not.

atarkowska commented 8 years ago

@will-moore regarding previous https://github.com/openmicroscopy/design/issues/24#issuecomment-197225567, how this could handle non standard hierarchy? What about if object simply won't have ID but alternative type of unique filter, like string or a list of params? My concerns are about the technology we use that do not allow simply overriding url patterns. What is your suggestion?

will-moore commented 8 years ago

@aleksandra-tarkowska Sorry, which previous comment are you referring to?

atarkowska commented 8 years ago

@will-moore APIs with nested routes codify relationships we should not be confident of. Are we happy of making an assumption that child object will always belong to the single parent object? (our model doesn't state that)

GET     /project/:projectId/datasets/

To be fair that should be:

GET     /group/id/experimenter/id/project/id/datasets

I am much happier with just /datasets/?group_id&experimenter_id... https://github.com/openmicroscopy/design/issues/24#issuecomment-197295657 As it is much more flexible, easy to extend and do not limit us in the future. It basically simplify multiple API endpoints for the same type of objects

Also for a reverse hierarchy you would have to have

GET      /dataset/id/projects/

will-moore commented 8 years ago

@aleksandra-tarkowska I think if you are listing children of a specific object then you don't care about the "path" or parents of that object. There's no need to include the full path. See minimize-path-nesting.

I realise that GET /projects/:projectId/datasets/ could also be provided by GET /datasets/?project=:projectId but it seems such a common pattern among all json APIs that users would simply expect /projects/:projectId/datasets/ to be available. Providing a convenient url for a common query is what's suggested by "Aliases for common queries" here.

I haven't included listing of parent objects yet, but GET /datasets/:datasetId/projects/ could certainly make sense, since GET /datasets/:datasetId/ will give details on a single Dataset and GET /datasets/:datasetId/images/ will list it's children.

atarkowska commented 8 years ago

@will-moore but that doesn't answer my previous question, how to handle custom hierarchy?

joshmoore commented 8 years ago

@will-moore but that doesn't answer my previous question, how to handle custom hierarchy?

https://github.com/openmicroscopy/openmicroscopy/pull/4531 might allow some "path shortening".

atarkowska commented 8 years ago

One more thing that may be handled in a separate gist is functions, for example: /password (this assumes the user is logged in) vs /experimenters/:id/password (this allow admin to change password)

atarkowska commented 8 years ago

@will-moore are we going to support custom list of fields in a response, or that will be hardcoded?

/projects/?fields=id,name,permissions...

atarkowska commented 8 years ago

Maybe returned lists should be in the form of { data: [ ] } instead of { projects: [ ] } from /projects/ etc. Or even without the extra wrapping [...](like github). However, I have found it useful to be able to include extra info, such as totalCount, E.g. { data: [ ], totalCount: 1234 }.Need to investigate this more.

This is called envelope and very useful

{
  "data": [ ... ], 
  "paging": {
    "next": "https://example.local/web/api/v1.0/projects/?page=2"
  }, 
  "summary": {
    "total_count": 12345
  }
}

will-moore commented 8 years ago

@aleksandra-tarkowska Probably won't support /projects/?fields=id,name,permissions initially, since this is usually for restricting the number of fields returned and most of the objects don't have many fields so it's not a high priority. But certainly something to keep in mind.

will-moore commented 8 years ago

@joshmoore @chris-allan If we're planning a new "api" web app, should this go in a new github repo or just add it right into OmeroWeb code? (I think I know what you're going to say!). More repos ftw! A bigger question: Do we try to support multiple versions at once? E.g. api/v1/projects/ AND api/v3/projects/? I think this is going to be very tricky to do, since v1 will have to be maintained forever, even with newer versions of OMERO, and we will have to duplicate all the code for each new version, so that we can work on v3 without touching v1 and v2. If we don't support multiple versions, then do we need the version number in the url, or do we just return the version number in the header or provide a /version/ url that returns it?

mtbc commented 8 years ago

Not sure v1 would thus have to be maintained forever?

will-moore commented 8 years ago

Even if we only support two versions at a time, E.g. v1 & v2 then, v2 & v3 we will still have to have fully duplicated code in the same app. I wonder how feasible it would be for users to install multiple different versions of the same app, IF they want to use multiple versions. I can imagine lots of PYTHONPATH issues!

atarkowska commented 8 years ago

@will-moore this is what virtualenv is for

will-moore commented 8 years ago

Trying to make the query parameters and response for /images/ less customised: Instead of sizeX, sizeY, sizeZ fields on the image itself, it would make more sense for them to be in their own pixels object, which is optionally included with a ?embed=pixels parameter. BUT, then we should probably include a bunch of other pixels fields by default E.g. physicalSizes (and units) etc.

  pixels: {
    sizeX: 512,
    sizeY: 512,
    sizeZ: 29,
    sizeC: 3,
    sizeT: 1,
    physicalSizeX: {value: 0.15, unit: MICROMETER, symbol: uM},
    physicalSizeY: {value: 0.15, unit: MICROMETER, symbol: uM},
    physicalSizeZ: {value: 0.01, unit: MICROMETER, symbol: uM},
  },

So if you want only sizeX and sizeY you'd do ?embed=pixels.sizeX,pixels.sizeY. Of course, coding and testing all this is a significant bit of work, but it would at least make this a bit more "future-proof" than the current /images/?sizeXYZ=true.

will-moore commented 8 years ago

@aleksandra-tarkowska I've added a data: {} envelope to all responses.

ome / design

Web: json api #24