src-d / lookout-sdk

SDK for lookout analyzers
Apache License 2.0

[Python SDK] Propagate log fields for python analyzers #78

Closed se7entyse7en closed 5 years ago

se7entyse7en commented 5 years ago

This PR is part of #71. This doesn't include logger interceptors yet.

The first thing I'd like to point out is that gRPC for Python seems to be a big mess: almost everything I used is marked as an experimental API, there's almost no documentation at all, and at a glance the source code doesn't seem very pythonic.

This first part solves the problem of propagating the log fields round trip from lookout. Objectives:

  1. keep a simple API by changing it as little as possible,
  2. make the Python API a bit more pythonic :)

How it works

This works differently from the go sdk. The first idea was to use an approach similar to the go sdk's, that is, to add a server interceptor that reads the metadata and puts it in the context. The problem is that the server interceptor only receives a continuation function that must be called to continue the request, plus the invocation metadata. There's no context to which to attach the log fields extracted from the metadata.

The context that is passed to the servicer methods (NotifyReviewEvent and NotifyPushEvent) is created internally, after the server interceptor runs. This context is a data structure that already contains the log fields (in contrast to what happens with go). In order to keep a simple API for adding and reading log fields, I've created a WrappedContext. This wrapped context is automatically passed to the servicer methods by the pb.wrap_context method decorator.
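A minimal sketch of what such a WrappedContext and wrap_context could look like. This is illustrative only: the metadata key, the JSON encoding, and the method names are assumptions, not the actual lookout-sdk implementation.

```python
import json


class WrappedContext:
    """Wraps a gRPC servicer context and exposes log-field helpers."""

    LOG_FIELDS_KEY = "log-fields"  # assumed metadata key

    def __init__(self, grpc_context):
        self._grpc_context = grpc_context
        # Assume the log fields arrive JSON-encoded in the invocation metadata.
        meta = dict(grpc_context.invocation_metadata())
        self._log_fields = json.loads(meta.get(self.LOG_FIELDS_KEY, "{}"))

    def add_log_fields(self, fields):
        self._log_fields.update(fields)

    @property
    def log_fields(self):
        return dict(self._log_fields)

    def packed_metadata(self):
        """Metadata tuple to forward on outgoing client calls."""
        return ((self.LOG_FIELDS_KEY, json.dumps(self._log_fields)),)


def wrap_context(method):
    """Decorator: hand the servicer method a WrappedContext."""
    def wrapper(self, request, context):
        return method(self, request, WrappedContext(context))
    return wrapper
```

The decorator is what lets the servicer methods receive a WrappedContext instead of the raw gRPC context, without the user touching metadata by hand.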

In order to make this as transparent as possible to the user, the AnalyzerServicerMetaclass automatically wraps NotifyReviewEvent and NotifyPushEvent with the pb.wrap_context decorator. The wrapped method then calls the user-defined notify_push_event and notify_review_event that must be implemented in the analyzer.
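The metaclass trick can be sketched roughly like this. The names and mapping are illustrative; the real SDK additionally applies pb.wrap_context to the entry points, which is elided here so the sketch stays self-contained.

```python
class AnalyzerServicerMetaclass(type):
    """Sketch: auto-bridge the gRPC CamelCase entry points to user-defined
    snake_case methods. Not the exact lookout-sdk code."""

    WRAPPED = {
        "NotifyReviewEvent": "notify_review_event",
        "NotifyPushEvent": "notify_push_event",
    }

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        for grpc_name, user_name in mcs.WRAPPED.items():
            setattr(cls, grpc_name, mcs._make_entry(user_name))
        return cls

    @staticmethod
    def _make_entry(user_name):
        def entry(self, request, context):
            # The real SDK would first wrap `context` (pb.wrap_context);
            # elided here to keep the sketch short.
            return getattr(self, user_name)(request, context)
        return entry
```

With this in place, gRPC keeps calling NotifyReviewEvent, while the analyzer author only ever implements notify_review_event.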

Now the problem is that the gRPC Python API doesn't enforce this context to be passed on client calls, but instead accepts metadata that has to be filled in manually. To make this as transparent as possible to the user, I changed pb.DataStub so that it exposes get_changes and get_files methods whose first argument is the WrappedContext.
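The client-side change might look roughly like this sketch, assuming the WrappedContext exposes something like a packed_metadata() helper (a name invented here) that serializes its log fields back into gRPC metadata:

```python
class DataStub:
    """Sketch of the changed client API: get_changes takes the WrappedContext
    first and forwards its log fields as gRPC metadata. Names are assumed."""

    def __init__(self, raw_stub):
        # `raw_stub` stands in for the grpc-generated DataStub, whose
        # call methods accept a `metadata` keyword argument.
        self._stub = raw_stub

    def get_changes(self, context, request):
        # Forward the log fields so lookout sees a consistent round trip.
        return self._stub.GetChanges(request, metadata=context.packed_metadata())

    def get_files(self, context, request):
        return self._stub.GetFiles(request, metadata=context.packed_metadata())
```

The user never touches the metadata argument; passing the context is enough.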

With these changes, the logs on lookout look the same for language-analyzer.go and language-analyzer.py.

API changes

From the user perspective, the API doesn't change much. The user has to add the pb.wrap_context decorator to the servicer methods, has to rename NotifyReviewEvent and NotifyPushEvent to the more pythonic snake_case names notify_review_event and notify_push_event respectively, and has to change the calls from stub.GetChanges(request) to stub.get_changes(context, request) (and analogously for GetFiles).
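To make the before/after concrete, here is an illustrative comparison, with a minimal fake standing in for the generated stub so the snippet is self-contained (the real classes live in the SDK's pb module):

```python
class FakeStub:
    """Stand-in for the data stub, showing both call shapes."""

    def GetChanges(self, request):               # old generated-style call
        return ["change"]

    def get_changes(self, context, request):     # new call: context first
        return ["change"]


stub = FakeStub()


class OldAnalyzer:
    def NotifyReviewEvent(self, request, context):    # old CamelCase hook
        return stub.GetChanges(request)


class NewAnalyzer:
    def notify_review_event(self, request, context):  # new snake_case hook
        return stub.get_changes(context, request)
```

The only visible differences are the method names and the extra context argument on data calls.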

If we wanted, we could even avoid the user having to wrap the servicer methods with the decorator by changing the class pointed to by pb.AnalyzerServicer, making it point to another class that has this wrapping built in.

smacker commented 5 years ago

The first thing I'd like to point out is that gRPC for Python seems to be a big mess: almost everything I used is marked as an experimental API, there's almost no documentation at all, and at a glance the source code doesn't seem very pythonic.

I feel you. It's a complete mess.

pb.wrap_context decorator to the servicer methods

maybe it should "just work" as it does for go, without any additional code/decorators required by the user. Returning a decorated class is imo ok.

This works differently from the go sdk.

Even though I quite like your solution (it looks simple enough and pythonic), I didn't get the problem with the context. For example, here: https://github.com/opentracing-contrib/python-grpc/blob/master/grpc_opentracing/_server.py#L146 they extend the context, populating it with spans created from the metadata. And then the user can use it as in https://github.com/opentracing-contrib/python-grpc/blob/master/examples/integration/integration_server.py#L26

se7entyse7en commented 5 years ago

@smacker if you're referring to this:

Now the problem is that the gRPC Python API doesn't enforce this context to be passed, but accepts metadata that has to be filled in manually.

what I mean is that currently, when we call the GetChanges method for example, the context is not supposed to be passed along.

smacker commented 5 years ago

I'm referring to the server: why did you decide to go with a decorator instead of an interceptor?

for the client, yes, there aren't many options except overriding it and introducing one more argument.

smacker commented 5 years ago

actually there is another option: inspect.stack. But... maybe it's not the best idea.

se7entyse7en commented 5 years ago

Oh ok. Reading again what I wrote, I think it's not that clear. Actually, there's no real need to add a server interceptor in this case, because even without doing anything the log fields are available in the context passed to NotifyReviewEvent. The reason I chose to have a wrapper class is just to expose utilities to read and add log fields.

actually there is another option: inspect.stack

I think this kind of magic should be our last resort 😂. But if your concern is about the user having to add that decorator, we can avoid that and hide the wrapping internally. That can easily be done using a metaclass.

se7entyse7en commented 5 years ago

The reason I chose to have a wrapper class is just to expose utilities to read and add log fields.

What I mean is that if we didn't need to expose utilities for adding and reading log fields, but only to propagate them, then the wrapping wouldn't actually be needed. I would just enforce passing the context and pass the metadata along.

smacker commented 5 years ago

Thanks for the explanation. As I mentioned before, I think the current solution is good. I just wanted to clarify the details :) Let's just export a decorated AnalyzerServicer so users don't need to decorate anything themselves if they stick to the defaults.

se7entyse7en commented 5 years ago

@smacker I've updated the code so that the user has an even simpler API. I've also updated the description of the PR.

se7entyse7en commented 5 years ago

Sure! BTW, what kind of update are you thinking of? One thing that is surely missing is a mention of the create_server function. Do you want to add a section explaining this new Python API? Or maybe it would be better to improve the docstrings in the example?

smacker commented 5 years ago

I'm not sure how to do it better. The problem is: before, we were distributing just the normal grpc-generated code plus helpers that had (hopefully) the same signature as the generated code.

Now we changed API quite a lot:

It makes sense to create our own API on top of grpc for sure, because grpc for Python is a mess. But it should be reflected in the documentation somehow.

se7entyse7en commented 5 years ago

@smacker sorry for the delay, but I was working on the logger interceptors. BTW, I was thinking of adding well-written docstrings, so that in the future we can eventually build proper documentation automatically using sphinx.

se7entyse7en commented 5 years ago

@smacker I've added the docstrings. I think they're detailed enough for the user to understand how to use the API. Then we can decide whether we want a more user-friendly doc, for example by building HTML pages through sphinx.