keenlabs / KeenClient-Python

Official Python client for the Keen IO API. Build analytics features directly into your Python apps.
https://keen.io/docs
MIT License
133 stars 58 forks source link

Add support for order_by and limit_by #135

Closed BlackVegetable closed 6 years ago

BlackVegetable commented 7 years ago

Added support for order_by and limit.

BlackVegetable commented 7 years ago

That took an embarrassing number of commits. Porting code between python 2 and 3 is somewhat error prone sometimes, it turns out. The python 2.6 test still fails but that is fixed in another PR.

BlackVegetable commented 7 years ago

Thanks for the feedback. I'll address this stuff tomorrow!

On Oct 5, 2017 4:44 PM, "Justin Mason" notifications@github.com wrote:

@masojus requested changes on this pull request.

Made a few comments. I'm not adamant about fixing all of them, but there are definitely a few that need addressing, so I'll be on the lookout for your responses. Thanks!

In README.rst https://github.com/keenlabs/KeenClient-Python/pull/135#discussion_r143069881 :

@@ -101,6 +101,21 @@ For more code samples, take a look at Keen's `docs <https://keen.io/docs/api/?py keen.count_unique("purchases", target_property="user.id", timeframe="this_14_days") # => 3 keen.select_unique("purchases", target_property="user.email", timeframe="this_14_days") # => ["bob@aol.com", "joe@yahoo.biz"]

  • Alpha support for ordering your results and limiting what is returned is now supported in the Python SDK.

  • Keep in mind that even if you limit your results with the 'limit' keyword, you are still querying over the

  • normal amount of data, and thus your compute costs will not change. Limit only changes what is displayed.

  • The keyword "limit" must be a positive integer. The keyword "order_by" must be a dictionary with a required

  • "property_name" specified and optionally a "direction". The "direction" may be either "DESC" (descending) or

  • "ASC" (ascending). No other keywords may be used in the "order_by" dictionary.

  • You may only use "order_by" if you supply a "group_by". You may only use "limit" if you supply an "order_by".

  • This will run a count query with results grouped by zip code.

  • It will display only the top ten zip code results based upon how many times users in those zip codes logged in.

Why is this based on "how many times users....logged in?" Does the "result" property have some special meaning or am I missing something? Should it maybe be something like ..."property_name": "user.login_attempts"... instead?

In README.rst https://github.com/keenlabs/KeenClient-Python/pull/135#discussion_r143069998 :

@@ -101,6 +101,21 @@ For more code samples, take a look at Keen's `docs <https://keen.io/docs/api/?py keen.count_unique("purchases", target_property="user.id", timeframe="this_14_days") # => 3 keen.select_unique("purchases", target_property="user.email", timeframe="this_14_days") # => ["bob@aol.com", "joe@yahoo.biz"]

  • Alpha support for ordering your results and limiting what is returned is now supported in the Python SDK.

  • Keep in mind that even if you limit your results with the 'limit' keyword, you are still querying over the

  • normal amount of data, and thus your compute costs will not change. Limit only changes what is displayed.

  • The keyword "limit" must be a positive integer. The keyword "order_by" must be a dictionary with a required

  • "property_name" specified and optionally a "direction". The "direction" may be either "DESC" (descending) or

  • "ASC" (ascending). No other keywords may be used in the "order_by" dictionary.

  • You may only use "order_by" if you supply a "group_by". You may only use "limit" if you supply an "order_by".

  • This will run a count query with results grouped by zip code.

  • It will display only the top ten zip code results based upon how many times users in those zip codes logged in.

  • keen.count("purchases", group_by="zip_code", timeframe="this_14_days", "limit"=10,

The limit keyword param name needn't be in quotes.

In keen/init.py https://github.com/keenlabs/KeenClient-Python/pull/135#discussion_r143070854 :

@@ -90,17 +91,21 @@ def count(event_collection, timeframe=None, timezone=None, interval=None, filter example: [{"property_name":"device", "operator":"eq", "property_value":"iPhone"}] :param group_by: string or array of strings, the name(s) of the properties you would like to group you results by. example: "customer.id" or ["browser","operating_system"]

  • :param order_by: dictionary object containing the property_name to order by and the
  • desired direction of sorting. Example: {"property_name":"result", "direction":"DESC"}
  • :param limit: positive integer limiting the displayed results of a query using order_by

Should this comment that's copy/pasted everywhere mention that order_by can only be there if group_by is not None and not empty? Similarly, should it indicate that limit only applies if there is an order_by param provided too? Should those three params really be an instance of some new GroupByOptions class that upon creation validates those constraints instead? You already chose to pack the direction option into the dict for order_by so at the least the limit option could be right there in that dict so it's clear that it belongs with order_by, no?

Really in general the fact that adding a single param means copy/pasting this same stuff all over place maybe indicates that this parameter list really at some point should have been moved to an Options property bag instead, maybe? That'd be a bigger, super breaking change, though.

In keen/api.py https://github.com/keenlabs/KeenClient-Python/pull/135#discussion_r143071292 :

@@ -1,3 +1,4 @@ +from future import print_function

You got rid of the print() calls at some point, I believe.

In keen/api.py https://github.com/keenlabs/KeenClient-Python/pull/135#discussion_r143072217 :

  • if "direction" in d and not (d["direction"] == "ASC" or d["direction"] == "DESC"):
  • Bad direction provided.

  • return True
  • for k in d:
  • if k != "property_name" and k != "direction":
  • Unexpected key.

  • return True
  • Everything looks good!

  • return False
  • Missing required key.

  • return True
  • order_by is converted to a list before this point if it wasn't one before.

  • order_by_list = json.loads(params["order_by"])
  • for order_by in order_by_list:

Should we explicitly make sure this list is of length 1? Or is there a legitimate reason it could have multiple entries? At this point it also shouldn't be empty, so just grabbing order_by_list[0] should be fine, right?

In keen/api.py https://github.com/keenlabs/KeenClient-Python/pull/135#discussion_r143072864 :

  • if "direction" in d and not (d["direction"] == "ASC" or d["direction"] == "DESC"):
  • Bad direction provided.

  • return True
  • for k in d:
  • if k != "property_name" and k != "direction":
  • Unexpected key.

  • return True
  • Everything looks good!

  • return False
  • Missing required key.

  • return True
  • order_by is converted to a list before this point if it wasn't one before.

  • order_by_list = json.loads(params["order_by"])
  • for order_by in order_by_list:

Oh wait, now I see that you can have multiple orderings, so really order_by can be a list or a dict, but the comments don't seem to indicate that...so I'd say we have to change the comment everywhere it's pasted so that it makes that clear :/

In keen/tests/client_tests.py https://github.com/keenlabs/KeenClient-Python/pull/135#discussion_r143073381 :

@@ -492,6 +495,25 @@ def test_interval(self, get): resp = keen.count("query test", timeframe="this_2_days", interval="daily") self.assertEqual(type(resp), list)

  • def test_order_by(self, get):

Do you plan to add more tests, like making sure it fails for invalid combinations of limit, group_by and order_by or malformed/invalid limit or order_by params?

In keen/api.py https://github.com/keenlabs/KeenClient-Python/pull/135#discussion_r143074457 :

  • return True
  • if "property_name" in d and d["property_name"]:
  • if "direction" in d and not (d["direction"] == "ASC" or d["direction"] == "DESC"):
  • Bad direction provided.

  • return True
  • for k in d:
  • if k != "property_name" and k != "direction":
  • Unexpected key.

  • return True
  • Everything looks good!

  • return False
  • Missing required key.

  • return True
  • order_by is converted to a list before this point if it wasn't one before.

  • order_by_list = json.loads(params["order_by"])

It's unfortunate that by the time we validate this, we've already serialized to JSON only to immediately deserialize back from JSON just to sanitize this input :/ It seems like a waste of cycles, though I realize to avoid that, this check would have to be higher up the call chain before get_params() is called and so would have to be called in several places instead of just in query(). Just something to think about.

In keen/client.py https://github.com/keenlabs/KeenClient-Python/pull/135#discussion_r143075271 :

@@ -229,16 +229,20 @@ def count(self, event_collection, timeframe=None, timezone=None, interval=None, example: [{"property_name":"device", "operator":"eq", "property_value":"iPhone"}] :param group_by: string or array of strings, the name(s) of the properties you would like to group your results by. example: "customer.id" or ["browser","operating_system"]

  • :param order_by: dictionary object containing the property_name to order by and the

Here and everywhere order_by is a param, I'd generally prefer to see a Direction.DESC enum constant (and Direction.ASC of course) so there's no room for typos and the code is self-documenting. Even just typing this comment I had to scroll up and check if it was ASC or some other substring of "ascending" which isn't great. I know enums don't exist in Python until 3.4, so that would imply pulling in the backport which is python package enum34 or something like that, but maybe it's worth it. Not sure if the six package does anything about automatically normalizing enums, but we already pull in six so maybe look at that? Another way to do it would be to just follow something like the typesafe enum pattern or just have string constants in a Direction class or anything else that makes this easier to use and less error-prone.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/keenlabs/KeenClient-Python/pull/135#pullrequestreview-67529100, or mute the thread https://github.com/notifications/unsubscribe-auth/ACitb6JJvfiJb38OME3BBGLdP1C1_Aimks5spVvHgaJpZM4PnhMl .

BlackVegetable commented 7 years ago

@masojus I've addressed (or at least commented on) each review message. Take a look sometime?

masojus commented 6 years ago

I added a few more comments, but this looks pretty good. Thanks @BlackVegetable!