keenlabs / KeenClient-Python

Official Python client for the Keen IO API. Build analytics features directly into your Python apps.
https://keen.io/docs
MIT License
133 stars, 58 forks

Asynchronous API #9

Open daredevildave opened 10 years ago

daredevildave commented 10 years ago

Given the recent outage, making blocking calls to the Keen API from Python is not an ideal solution. Even when we handle the exceptions, requests on our end can take minutes to complete while they wait for a timeout.

Any progress on an async API?

dkador commented 10 years ago

Agreed, using the Python SDK in a blocking fashion in your web server is a bad idea.

Do you have any preferences for an async API? Threads, processes, Tornado, Twisted, gevent, etc.? We're curious what our users would like to see.

daredevildave commented 10 years ago

My only preference really is that it has minimal dependencies. We don't use Tornado or Twisted, so I'd rather not bring those in as dependencies.

dkador commented 10 years ago

Fair enough. Do you have any examples of other libraries/SDKs that solve this problem in a way you like?

We've got plenty of ideas here but always happy to look to others for inspiration.

daredevildave commented 10 years ago

I'm not sure I know of any others I'm afraid.

thedrow commented 10 years ago

You can use asyncio (https://docs.python.org/3/library/asyncio.html); it has been backported to Python 3.3 and, as Trollius, to 2.x.
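A minimal sketch of the asyncio route on modern Python (3.7+, for `asyncio.run` and `asyncio.get_running_loop`), not the 2.x backports. It assumes the existing blocking SDK call is wrapped in a hypothetical `send_event` function; `loop.run_in_executor` hands that call to a thread pool so the event loop keeps serving other coroutines while the HTTP request is in flight.

```python
import asyncio
import time


def send_event(collection, event):
    # Hypothetical stand-in for a blocking SDK call such as client.add_event().
    time.sleep(0.1)  # simulate network latency
    return {"collection": collection, "created": True}


async def add_event_async(collection, event):
    # Run the blocking call in the loop's default ThreadPoolExecutor so the
    # event loop is not blocked while the request is in flight.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, send_event, collection, event)


async def main():
    # The three events are sent concurrently rather than one after another.
    return await asyncio.gather(
        add_event_async("sign_ups", {"username": "lloyd"}),
        add_event_async("sign_ups", {"username": "harry"}),
        add_event_async("purchases", {"item": "golden ticket"}),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note that this only helps callers that already run inside an asyncio event loop; a synchronous web framework would still need one of the thread- or process-based approaches discussed below.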

leonsas commented 10 years ago

+1

Most use of Keen would be in request-response cycles, where blocking calls are less than ideal. I've been deferring calls to my async workers, but it is more complicated than it should be. Ideally it wouldn't require any more dependencies.

thedrow commented 10 years ago

Since we're using requests we could just attach an async adapter to the session. Is the session exposed by the API?

thedrow commented 10 years ago

Now I see that a session is created for each request, which is bad practice: https://github.com/keenlabs/KeenClient-Python/blob/master/keen/api.py#L39

A session should be created whenever a KeenAPI object is created and it should be reused.

dkador commented 10 years ago

I haven't addressed the async issue yet. That's a tricky one since once we go async there's no way to guarantee delivery of events. But I just pushed version 0.3.2 which includes re-using the session object on an instance of KeenApi and now the session is exposed so you can attach an async adapter.

leonsas commented 10 years ago

How about making it optional for the user? Perhaps with an async kwarg to .add_event (and other blocking methods), or even by initializing KeenClient with an optional async kwarg that is used unless overridden explicitly per call. Of course there is no guarantee of event delivery, but it's a tradeoff I'm sure many are willing to make.

e.g.:

    # Proposed API. Note: `async` became a reserved keyword in Python 3.7,
    # so on modern Python the flag needs another name, e.g. `async_`.
    client = KeenClient(
        project_id="xxxx",
        write_key="yyyy",
        read_key="zzzz",
        async_=True,  # would default to False, maintaining current behavior
    )

    # Async, since the client was initialized with async_=True
    client.add_event("sign_ups", {"username": "lloyd"})

    # Override the client's async setting: a blocking call, so delivery can be confirmed
    client.add_event("sign_ups", {"username": "lloyd"}, async_=False)

Additionally, because the async kwarg would default to False, the change would be backwards compatible.
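One way the proposed flag could work, sketched with a standard-library thread pool. `AsyncKeenClient`, `send_event`, and the `async_` keyword are all hypothetical (the flag is renamed because `async` is reserved in Python 3.7+); this is an illustration of the client-default-plus-per-call-override idea, not the SDK's API.

```python
from concurrent.futures import ThreadPoolExecutor


def send_event(collection, event):
    # Hypothetical blocking HTTP call to the Keen API.
    return {"collection": collection, "created": True}


class AsyncKeenClient:
    """Hypothetical wrapper: a client-level async default plus a per-call override."""

    def __init__(self, async_=False, max_workers=4):
        # Defaults to False, so existing callers keep the blocking behavior.
        self.async_ = async_
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def add_event(self, collection, event, async_=None):
        # An explicit per-call value wins; None means "use the client default".
        use_async = self.async_ if async_ is None else async_
        if use_async:
            # Returns a Future immediately; delivery is not yet confirmed.
            return self._pool.submit(send_event, collection, event)
        # Blocking path: the caller gets the result (or the exception) directly.
        return send_event(collection, event)

    def close(self):
        # Wait for any queued events to be sent before shutting down.
        self._pool.shutdown(wait=True)
```

Returning a Future on the async path keeps the option open for callers who later decide they do want confirmation, while fire-and-forget callers can simply ignore it.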

dkador commented 10 years ago

Yeah, something like that could definitely work. I'd love to see a PR for this if you have time. Realistically speaking, it will take us some time to get to this.

leonsas commented 10 years ago

Hmm, we'll see. I doubt I have time for this! I haven't looked at the code, but it shouldn't take much effort.

leonsas commented 10 years ago

How about grequests, as an optional dependency used only if async functionality is desired? It adds gevent as a dependency, but would make this super simple to implement.

dkador commented 10 years ago

I'd consider it but likely would prefer a model that uses another process so we can keep external dependencies to a minimum. Curious if @thedrow has insight into how to easily do this with Requests.
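A rough sketch of the separate-process model using only the standard library's `multiprocessing`: a child process drains a queue and makes the blocking calls, so the web process only pays for a queue put. `send_event`, `sender_loop`, `start_sender`, and `stop_sender` are hypothetical names. Because `send_event` is handed to another process it must be a picklable top-level function, and on platforms that use the spawn start method the calling code needs the usual `if __name__ == "__main__":` guard.

```python
import multiprocessing

_STOP = None  # sentinel telling the sender loop to exit


def send_event(collection, event):
    # Hypothetical blocking HTTP call to the Keen API.
    print("sent", collection, event)


def sender_loop(q, send=send_event):
    # Runs in the child process: pull events until the sentinel arrives.
    while True:
        item = q.get()
        if item is _STOP:
            break
        collection, event = item
        try:
            send(collection, event)
        except Exception as exc:
            # A failure here cannot propagate to the request that queued the event.
            print("failed to send", collection, exc)


def start_sender():
    q = multiprocessing.Queue()
    # daemon=True: the child is terminated when the parent exits, so call
    # stop_sender() on shutdown to flush whatever is still queued.
    proc = multiprocessing.Process(target=sender_loop, args=(q,), daemon=True)
    proc.start()
    return q, proc


def stop_sender(q, proc):
    q.put(_STOP)
    proc.join()
```

Usage would be `q, proc = start_sender()`, then `q.put(("sign_ups", {"username": "lloyd"}))` per event, and `stop_sender(q, proc)` at shutdown.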

jdunck commented 9 years ago

How about this: if KeenClient is constructed with async=True, a daemon thread is started and all requests are queued to that thread, which actually makes the requests. Each .add_event call would return a handle that could be .wait()ed upon, and KeenClient would grow a .drain or .await method for .join()ing the daemon on container termination.
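A minimal sketch of that design with only the standard library. `QueuedKeenClient`, `EventHandle`, and the injected `send` function are hypothetical. Since the daemon thread loops forever, `drain()` here waits on `Queue.join()` (every queued event attempted) rather than joining the thread itself, and the name `.await` is avoided because `await` is a reserved keyword in Python 3.7+.

```python
import queue
import threading


class EventHandle:
    """Hypothetical handle returned by add_event(); wait() blocks until the send finishes."""

    def __init__(self):
        self._done = threading.Event()
        self.error = None  # set to the exception if the send failed

    def wait(self, timeout=None):
        # True once the send has finished (successfully or not) within timeout.
        return self._done.wait(timeout)

    def _finish(self, error=None):
        self.error = error
        self._done.set()


class QueuedKeenClient:
    """Hypothetical client: a single daemon thread makes every request, in order."""

    def __init__(self, send):
        self._send = send  # blocking function(collection, event), e.g. an HTTP POST
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def add_event(self, collection, event):
        handle = EventHandle()
        self._queue.put((collection, event, handle))
        return handle  # callers may wait() on it or ignore it

    def drain(self):
        # Block until every queued event has been attempted (call before shutdown).
        self._queue.join()

    def _worker(self):
        while True:
            collection, event, handle = self._queue.get()
            error = None
            try:
                self._send(collection, event)
            except Exception as exc:
                error = exc
            handle._finish(error)
            self._queue.task_done()
```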

k2xl commented 8 years ago

Any progress or work arounds to get async event publishing?

dkador commented 8 years ago

Nothing yet. We're always looking for customer input - what would be your preference in terms of how this would be implemented?

k2xl commented 8 years ago

I'm not very familiar with how things work on Keen's backend (are events sent over a persistent TCP connection or just regular HTTP?).

My recommendation would be the natural one: just spawn a thread whenever an async publish is requested.


import threading

def async_publish(args):
    # publish_event is your existing blocking call; args is a tuple of its arguments.
    t = threading.Thread(target=publish_event, args=args)
    t.start()

If there's a persistent TCP socket that needs to be established, then just check if it exists on the first call, and if it doesn't then create it there and keep it around in some static variable...

dkador commented 8 years ago

Thanks for the response.

Events are sent to us over regular HTTP (well, HTTPS, hopefully).

Do you care about knowing if your event was safely persisted? Is fire-and-forget good enough? If not, would you want a callback? A future-based interface?

jdunck commented 8 years ago

In my experience, a safer approach is to start a single thread which fetches from a queue and makes the reporting request. The main thread just adds events to the queue. This way you don't have a concurrency problem around shared state, and if a lot of events are generated all at once, it gets smoothed out rather than creating a ton of concurrent requests to upstream.
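A single consumer thread also makes batching easy, which smooths bursts further: one request can carry many queued events. A sketch, assuming a hypothetical `send_batch` callable that posts a list of events in one request; `batching_worker` and `start_worker` are likewise illustrative names, and `None` is used as the shutdown sentinel.

```python
import queue
import threading


def batching_worker(q, send_batch, max_batch=50):
    """Pull events from q and send them in batches of up to max_batch.

    Blocks for the first event, then takes whatever else is already queued,
    so a burst of N events costs roughly N / max_batch requests instead of N.
    A None item is a shutdown sentinel; anything queued before it is sent.
    """
    while True:
        first = q.get()
        if first is None:
            return
        batch = [first]
        while len(batch) < max_batch:
            try:
                item = q.get_nowait()
            except queue.Empty:
                break  # nothing else waiting: send what we have
            if item is None:
                send_batch(batch)
                return
            batch.append(item)
        send_batch(batch)


def start_worker(send_batch, max_batch=50):
    # The main thread only ever calls q.put(event); the worker owns all network I/O.
    q = queue.Queue()
    t = threading.Thread(target=batching_worker, args=(q, send_batch, max_batch), daemon=True)
    t.start()
    return q, t
```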

k2xl commented 8 years ago

^ What @jdunck said sounds better than what I was suggesting. It's usually how I would code these things in Golang. In Python you could probably use a generator in the thread.

Fire-and-forget is good enough for me. Being able to pass a callback could be helpful.
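The generator idea maps neatly onto the two-argument form of `iter()`, which calls `q.get()` repeatedly and stops at a sentinel; an optional callback then covers the "pass a callback" wish. A sketch under those assumptions, where `events_from`, `run_sender`, and the `on_sent(collection, event, ok, error)` signature are hypothetical.

```python
import queue
import threading

_SENTINEL = object()  # unique marker that ends the stream


def events_from(q):
    # Generator over queued events: iter(callable, sentinel) keeps calling
    # q.get() and stops as soon as it returns the sentinel.
    yield from iter(q.get, _SENTINEL)


def run_sender(q, send, on_sent=None):
    # Intended as the target of a single background thread.
    for collection, event in events_from(q):
        try:
            send(collection, event)
            ok, error = True, None
        except Exception as exc:
            ok, error = False, exc
        if on_sent is not None:
            # Optional callback; fire-and-forget users simply omit it.
            on_sent(collection, event, ok, error)
```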

hpgmiskin commented 6 years ago

@dkador, in response to a comment from 4 years ago: I think the Raven client for Sentry does a very good job of allowing different transport mechanisms.

I can see from the master branch that Keen's approach is to use a persistence handler, though all handlers other than the default remain unimplemented.

What is the current status on adding asynchronous event creation to the python client?
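A minimal sketch of a pluggable-transport design in the spirit of the Raven comparison. The `Transport` interface, both transport classes, and `Client` are hypothetical and do not come from the Keen or Raven codebases. The client depends only on `send()`, so a blocking default keeps today's behavior, a thread-based transport adds async with no new dependencies, and a gevent- or Twisted-backed class could implement the same method while importing its library only where it is used.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class Transport(ABC):
    """Hypothetical interface: the client only ever calls send()."""

    @abstractmethod
    def send(self, collection, event):
        """Deliver one event. Implementations decide whether this blocks."""


class BlockingTransport(Transport):
    def __init__(self, post):
        self._post = post  # hypothetical blocking function(collection, event)

    def send(self, collection, event):
        return self._post(collection, event)


class ThreadedTransport(Transport):
    def __init__(self, post):
        self._post = post
        # A single worker keeps events ordered and caps upstream concurrency.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def send(self, collection, event):
        return self._pool.submit(self._post, collection, event)  # returns a Future

    def close(self):
        self._pool.shutdown(wait=True)  # flush pending events


class Client:
    def __init__(self, transport):
        self.transport = transport

    def add_event(self, collection, event):
        return self.transport.send(collection, event)
```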

tbarn commented 6 years ago

@hpgmiskin and all -

I just wanted to give a bit of an update on some behind-the-scenes work on the Python SDK. A few weeks ago, a Keen engineer started working on the SDK on a semi-regular basis to get it into a better state. We have prioritized bringing it to API parity on Access Keys and Datasets (based on customer requests), along with any immediate deprecation fixes that needed to be put in place. Testing will also be in a better place soon.

Once that is done in the near future, we will return to this and seriously consider making it the next issue we work on.

masojus commented 6 years ago

Yeah, giving the option to use gevent, greenlets, Twisted, Tornado, or whatever as a pluggable mechanism would be nice.