gramps-project / gramps-web-api

A RESTful web API for Gramps
GNU Affero General Public License v3.0
77 stars 40 forks source link

Filter list-like endpoints with URL arguments #17

Closed DavidMStraub closed 3 years ago

DavidMStraub commented 3 years ago

For instance, /api/people/?tag=TODO or /api/events/?participant=I0001

cdhorn commented 3 years ago

Pulled this out of title for the PR as the code for the Gramps filter framework was reverted.

I'd like to return to this soon as that seemed to be working nicely for a first pass.

cdhorn commented 3 years ago

@DavidMStraub @Nick-Hall So I have a slightly cleaner pass at this doing what I had done before. To summarize it at a high level:

The rules query parm for a major object returns a list of the available built in filter rules for that object type. An example rule entry under person would be:

{"description": "Matches people with the particular tag", "labels": ["Tag:"], "name": "People with the ", "rule": "HasTag"}

The logic= keyword allows selection from the supported list of ["and", "or", "xor", "one"] The invert keyword inverts the results. The filter= keyword then takes a dictionary as a value, with the keyword being the rule and the value being the list of arguments for it. For example:

filter={"HasTag":["DNAMatch"],"IsMale":[],"PeoplePrivate":[]}

I think this is a decent solution and preferable over introducing a whole slew of query parms for all the different things to filter on and cluttering the namespace. The person class alone has 79 rules at the moment.

Note this was based on examination of the sidebar code. I certainly don't understand the filtering code very well as I did not dive in deeply, it seems like there is more functionality I am overlooking. Is that the case Nick? Can you point me where to look if so to get a better feel for it or does this cover most of it?

Note two other changes kind of naturally fell into place when I did this.

First I did refactor the object_extend as part of this to remove duplicate code from many of the major gramps handler classes. GrampsObjectResourceHelper now handles the default case.

Second I converted all of the boolean query parms like profile, extend, and strip that used to require the =1 to be tacked on into string types so you can just use "profile", "extend", and "strip" by themselves instead. I feel that is cleaner.

Let me know if this all sounds acceptable and I'll update the apispec.yaml for the parm changes/additions and submit a pull request.

Nick-Hall commented 3 years ago

@cdhorn It appears that you understand the basics, however there are filter rules that match other filters. It may be better to create a rule first and then apply it in a separate query.

Custom filters are stored in an XML file. Have a look at the parser and writer code for details.

Here is an example of a single filter with one rule:

<?xml version="1.0" encoding="utf-8"?>
<filters>
  <object type="Person">
    <filter name="Attr" function="and" invert="1" comment="Not a baker">
      <rule class="HasAttribute" use_regex="False">
        <arg value="Occupation"/>
        <arg value="Baker"/>
      </rule>
    </filter>
  </object>
</filters>

Perhaps we could use the http PUT method with an endpoint such as `api/person/filters/Attr' to define a named filter with a JSON representation of the filter similar to the XML?

Then it could be queried with /api/people/?filter=Attr.

Named filters would allow one filter to reference another.

cdhorn commented 3 years ago

Thanks @Nick-Hall boy did I overlook an awful lot. Using /api/{gramps_object}/filters would not be appropriate as that position in the path is for the handle.

I am refactoring things and now have a pass with a /api/filter/{gramps_object} endpoint. Get returns the list of custom filters and available rules, post adds or updates a custom filter and delete will delete a custom filter once I sort through that.

For /api/{gramps_object} there are then two query parms: filter and filters filter is as you suggested Nick and accepts the name of the custom filter to apply. filters works in a similar manner to what it did before building and applying one on the fly but with a slightly extended structure as follows allowing me to ditch the logic and invert query parms and adding support for the regex flag:

{    
     "n": name,            # Filter name
     "c": comment,      # Filter comment                                                                                                       
     "f": function,         # Logical operation 'and', 'or', 'xor', or 'one', default 'and'                                     
     "i": invert,             # Boolean indicator to invert result set, default False                               
     "r": [                     # List of rules to apply as part of the filter                                        
         {                                                                                                         
             "n": name,   # Class name of the rule                                                              
             "v": [],          # Values for the rule, default empty list                                             
             "r": regex     # Boolean indicator to treat as regex, default False                                  
         }                                                                                                         
     ]                                                                                                             
 }  

The same structure is used for the post but instead of shorthand keys normal words are used.

I see SearchFilter and ParamFilter and need to look closer at where and how those are used and how to handle them.

@DavidMStraub a question about put and post. I have worked with some REST API where put always creates and post always updates, others where put creates and post can both create and update, and others where there is no put and post both creates and updates. I'm not sure if there is a formal/proper way it should be done. What do you prefer from a design standpoint?

Does the overall approach I describe here sound okay? Suggestions for changes?

DavidMStraub commented 3 years ago

@DavidMStraub a question about put and post. I have worked with some REST API where put always creates and post always updates, others where put creates and post can both create and update, and others where there is no put and post both creates and updates. I'm not sure if there is a formal/proper way it should be done. What do you prefer from a design standpoint?

Good question, honestly I don't have a lot of experience with this, but this https://restfulapi.net/rest-put-vs-post/ makes sense to me, so we could PUT to /api/person/<handle> to update and POST to /api/person to create. For a PUT with a non-existent handle, I'd say we fail rather than create, since handles are supposed to be autogenerated.

cdhorn commented 3 years ago

Good resource, I should have done a little research like that before posing the question. Did you read https://restfulapi.net/resource-naming/ and https://restfulapi.net/hateoas/ which both seem to suggest the approach we are taking by trying to expose the Gramps schema as it is maybe merits some rethinking?

That suggests to me at a minimum the endpoints should not be singular since they represent collections, so at a minimum we should rename them.

The Gramps schema with the object lists sort of approximates the subcollections approach, but it is not as formalized. The keyword names do not align with the collections they represent, they all get returned in the response, and we do not include relevant hypermedia links.

The current approach certainly reduces the codebase and helps produce a quicker deliverable more familiar to Gramps developers. But longer term it might be more beneficial to step back and be more purist about it.

What are your and Nick's thoughts?

cdhorn commented 3 years ago

Then again there is something to be said for keeping it simple, and it feels simpler at the moment.

DavidMStraub commented 3 years ago

Interesting. Concerning the resource naming, I think we did a pretty good job. Concerning singular vs plural I don't have a preference, I think both is fine, as long as we're consistent. (Before your changes I had /api/people for the collection and /api/person/grampsid for the singleton, but that was a poor choice.)

Concerning HATEOAS, this is new to me; indeed I'm not sure this is actually a big benefit for us. But could you maybe give a simple example how this would look differently?

cdhorn commented 3 years ago

Take a look over https://restfulapi.net/rest-api-design-tutorial-with-example/ to get a feel for how it affects design.

I'm just trying to get my head around it, looking over that I think we would end up with something that would start to look like this if we wanted to be purists and head in that direction:

people people/{{handle}} people/{{handle}}/addresses people/{{handle}}/names people/{{handle}}/attributes people/{{handle}}/citations people/{{handle}}/participants (For event references) people/{{handle}}/families people/{{handle}}/media people/{{handle}}/notes people/{{handle}}/associations (For people references) people/{{handle}}/tags people/{{handle}}/urls

participants
participants/{{handle}} participants/{{handle}}/events participants/{{handle}}/attributes participants/{{handle}}/notes

associations
associations/{{handle}} associations/{{handle}}/people associations/{{handle}}/citations associations/{{handle}}/notes

families families/{{handle}} families/{{handle}}/attributes families/{{handle}}/members (Child and parent references) families/{{handle}}/participants
families/{{handle}}/media families/{{handle}}/notes families/{{handle}}/tags

members
members/{{handle}} members/{{handle}}/people members/{{handle}}/citations members/{{handle}}/notes

events events/{{handle}} events/{{handle}}/participants events/{{handle}}/attributes events/{{handle}}/citations events/{{handle}}/media events/{{handle}}/notes events/{{handle}}/tags

places
places/{{handle}} places/{{handle}}/names places/{{handle}}/locations places/{{handle}}/citations places/{{handle}}/media places/{{handle}}/notes places/{{handle}}/tags places/{{handle}}/urls

names names/people names/people/{{handle}} names/places names/places/{{handle}}

citations citations/{{handle}} citations/{{handle}}/source citations/{{handle}}/attributes citations/{{handle}}/media citations/{{handle}}/notes citations/{{handle}}/tags

sources sources/{{handle}} sources/{{handle}}/attributes sources/{{handle}}/media sources/{{handle}}/notes sources/{{handle}}/repositories sources/{{handle}}/tags

locations locations/places locations/places/{{handle}} locations/repositories locations/repositories/{{handle}}

repositories repositories/{{handle}} repositories/{{handle}}/notes repositories/{{handle}}/tags repositories/{{handle}}/urls

media media/{{handle}} media/{{handle}}/attributes media/{{handle}}/citations media/{{handle}}/notes media/{{handle}}/tags

notes notes/{{handle}} notes/{{handle}}/tags

tags tags/{{handle}}

I kind of get it but it seems like academic overkill. It reminds me of the whole normalize/denormalize argument with database models. So perhaps we just rename the endpoints and leave it there.

DavidMStraub commented 3 years ago

Yes, I see. I mean, in principle I think all those endpoints would make sense in addition to what we have. But only having the linked information in all those endpoints rather than in one big dict seems like a big drawback to me. I feel that the genealogical data is special because there are so many connections between objects: a person can have dozens of events, notes, tags, and all that. We wouldn't want a client to have to make dozens of HTTP requests just to display a person detail page. So I think what we (you, mostly :wink:) have done so far is actually more useful.

cdhorn commented 3 years ago

Yeah I agree, it would be crazy to have to issue 100 or more gets to render a person page.

I have not really thought at all about create/update/delete though.

Nick-Hall commented 3 years ago

so we could PUT to /api/person/ to update

Using PATCH would be an alternative for primary objects, PUT is probably sufficient for filters though.

Nick-Hall commented 3 years ago

What are your and Nick's thoughts?

Some extra endpoints may be useful, but probably only for primary objects. For example, /api/people/{handle}/events could list the events for a given person. However /api/events/?participant={handle} would provide the same functionality.