pydantic / logfire

Uncomplicated Observability for Python and beyond! 🪵🔥
https://docs.pydantic.dev/logfire/
MIT License
1.62k stars 46 forks source link

Bug fix: Logging arguments of a request to a FastAPI sub app #259

Closed sneakyPad closed 6 days ago

sneakyPad commented 2 weeks ago

This PR fixes #188 which described the issue that arguments of a request to a sub app of a FastAPI app are not recorded. In this case only the attributes were shown, but no arguments. A sub application is created like this:

subapp = FastAPI()
app.mount("/api", subapp)

The issue is that a mounted sub app is part of the routes of the original FastAPI app. Internally, logfire stores the references of the apps in the registry. When a request is made to the sub app, the reference is tried to find by registry.get(request.app). However, the sub app(s) reference(s) are not added to that registry.

What this PR does is to iterate through the routes of the original FastAPI app that is added by logfire.instrument_fastapi(app). If a route is a mounted app, it is added to mounted_apps. Each mounted app (along with the original app) is afterwards registered by creating the FastAPIInstrumentation object.

I'm aware that loops are never ideal. Happy to learn and discuss if there is a better way.

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

:loudspeaker: Thoughts on this report? Let us know!

alexmojaki commented 2 weeks ago

A loop is fine. The main things that concern me are:

  1. This needs tests, both in general and because I want to see what effect this has on the produced spans. It looks like it might create the OTEL request span twice.
  2. Ideally it would still work correctly if instrument_fastapi was called before app.mount. One option is to patch app.mount. @Kludex do you have any better ideas? I don't immediately see a way to get the 'parent' app from a subapp.
sneakyPad commented 1 week ago

Thanks! Regarding (2): Yes, I hadn't thought of that, but it would make sense. Right now it feels a little too deep for the first PR, or I might need some explanation on that. (1): I would appreciate it if you could give me some help with the testing approach. I think I have most of it figured out, but before I make any changes, I want to quickly explain what I intend to do.

A large percentage of the snapshot for test_subapp_fastapi_arguments overlaps with test_fastapi_arguments, but I think it is better to leave the existing one untouched and create one specifically for the subapp.

Does it make sense? Thanks!

test_fastapi.py

@pytest.fixture()
def app():
    # Don't define the endpoint functions in this fixture to prevent a qualname with <locals> in it
    # which won't be stripped out of the logfire msg, complicating things in different python versions.
    app = FastAPI()
    sub_app = FastAPI()
    app.mount('/api', sub_app)
    sub_app.get('/other', name='other_route_name', operation_id='other_route_operation_id')(other_route)

  ...

def test_subapp_fastapi_arguments(client: TestClient, exporter: TestExporter) -> None:
    response = client.get('/api/other?foo=foo_val&bar=bar_val')
    assert response.status_code == 422
    assert exporter.exported_spans_as_dict() == snapshot(
        [
            {
                'name': 'FastAPI arguments',
                'context': {'trace_id': 1, 'span_id': 3, 'is_remote': False},
                'parent': {'trace_id': 1, 'span_id': 1, 'is_remote': False},
                'start_time': 2000000000,
                'end_time': 3000000000,
                'attributes': {
                    'custom_attr': 'custom_value',
                    'logfire.span_type': 'span',
                    'logfire.level_num': 17,
                    'logfire.msg_template': 'FastAPI arguments',
                    'logfire.msg': 'FastAPI arguments',
                    'values': '{"foo":"foo_val"}',
                    'errors': '[{"type":"int_parsing","loc":["query","bar"],"msg":"Input should be a valid integer, unable to parse string as an integer","input":"bar_val"}]',
                    'http.method': 'GET',
                    'http.route': '/other',
                    'fastapi.route.name': 'other_route_name',
                    'fastapi.route.operation_id': 'other_route_operation_id',
                    'logfire.json_schema': IsJson(
                        {
                            'type': 'object',
                            'properties': {
                                'values': {'type': 'object'},
                                'errors': {
                                    'type': 'array',
                                    'items': {
                                        'type': 'object',
                                        'properties': {'loc': {'type': 'array', 'x-python-datatype': 'tuple'}},
                                    },
                                },
                                'http.method': {},
                                'http.route': {},
                                'fastapi.route.name': {},
                                'fastapi.route.operation_id': {},
                                'custom_attr': {},
                            },
                        }
                    ),
                    'logfire.tags': ('fastapi',),
                },
            },
            {
                'name': 'GET /api http send response.start',
                'context': {'trace_id': 1, 'span_id': 5, 'is_remote': False},
                'parent': {'trace_id': 1, 'span_id': 1, 'is_remote': False},
                'start_time': 4000000000,
                'end_time': 5000000000,
                'attributes': {
                    'logfire.span_type': 'span',
                    'logfire.msg': 'GET /api http send response.start',
                    'http.status_code': 422,
                    'asgi.event.type': 'http.response.start',
                    'logfire.level_num': 5,
                },
            },
            {
                'name': 'GET /api http send response.body',
                'context': {'trace_id': 1, 'span_id': 7, 'is_remote': False},
                'parent': {'trace_id': 1, 'span_id': 1, 'is_remote': False},
                'start_time': 6000000000,
                'end_time': 7000000000,
                'attributes': {
                    'logfire.span_type': 'span',
                    'logfire.msg': 'GET /api http send response.body',
                    'asgi.event.type': 'http.response.body',
                    'logfire.level_num': 5,
                },
            },
            {
                'name': 'GET /api',
                'context': {'trace_id': 1, 'span_id': 1, 'is_remote': False},
                'parent': None,
                'start_time': 1000000000,
                'end_time': 8000000000,
                'attributes': {
                    'logfire.span_type': 'span',
                    'logfire.msg': "GET /api/other ? bar='bar_val' & foo='foo_val'",
                    'http.scheme': 'http',
                    'http.host': 'testserver',
                    'net.host.port': 80,
                    'http.flavor': '1.1',
                    'http.target': '/api/other',
                    'http.url': 'http://testserver/api/other?foo=foo_val&bar=bar_val',
                    'http.method': 'GET',
                    'http.server_name': 'testserver',
                    'http.user_agent': 'testclient',
                    'net.peer.ip': 'testclient',
                    'net.peer.port': 50000,
                    'http.route': '/api',
                    'http.status_code': 422,
                },
            },
        ]
    )
alexmojaki commented 1 week ago

That sounds about right. There's no need to hesitate to commit and push something, it won't affect anything outside this PR and it can be reverted within the PR. Seeing the diff in GitHub will make it easier to understand.

Kludex commented 1 week ago

I'm unsure if we should handle the logic ourselves. This logic should either be in the OTel package, or they should recommend to instrument each mounted app. 🤔

sneakyPad commented 1 week ago

I just added the test I was talking about to sort of finish this PR, even though it doesn't seem clear how to proceed.

@Kludex I'm not familiar with the OTel package itself, so it's difficult for me to see the pros and cons. However, if you currently instrument the main app and the sub app, the OTEL request span is created twice for the sub app routes, which is reflected in the UI.

alexmojaki commented 1 week ago

I'm unsure if we should handle the logic ourselves. This logic should either be in the OTel package, or they should recommend to instrument each mounted app. 🤔

Logging arguments is in logfire, not OTel. The OTel part handles subapps fine because it uses middleware.

alexmojaki commented 1 week ago

@sneakyPad thanks for the test, it looks good. Please expand it so that the subapp has another subapp, I don't think that's being handled.

Also please set up pre-commit so that the lint job passes. make install should have ensured that, did you do something different?

sneakyPad commented 1 week ago

@alexmojaki great catch, thanks! Sub apps of sub apps weren't handled properly. They should now, and I added another test for that. I did not change anything for make install, but for the tests toml was missing. I've seen that the lint step fails if format hasn't been run before when I run it with make. For the pre-commit, the order is the same, format->lint. So that seemed like the issue for me.

alexmojaki commented 1 week ago

The tests aren't showing the spans produced by FastAPIInstrumentation.run_endpoint_function, please investigate that.

sneakyPad commented 1 week ago

I don't quite understand. Would you mind elaborating a bit or pointing me to some specific documentation where I can look it up?

alexmojaki commented 1 week ago

Search for {method} {http.route} ({code.function}) in both fastapi.py and test_fastapi.py.

sneakyPad commented 1 week ago

Thanks, I think I'm getting there. I assume that the reason why there are no spans produced (by FastAPIInstrumentation.run_endpoint_function) is that the tests have the same intention as test_fastapi_arguments, which doesn't make a valid request due to bar_val being a str instead of an int, resulting in 422 instead of 200. I assume this is intentional? When passing the right value I can see the spans produced by the run_endpoint_function. How did you see the spans so fast? Is there any additional setup I can take for debugging?

alexmojaki commented 1 week ago

Thanks, I think I'm getting there. I assume that the reason why there are no spans produced (by FastAPIInstrumentation.run_endpoint_function) is that the tests have the same intention as test_fastapi_arguments, which doesn't make a valid request due to bar_val being a str instead of an int, resulting in 422 instead of 200. I assume this is intentional? When passing the right value I can see the spans produced by the run_endpoint_function.

Right, that makes sense. Please change the new tests to make a valid request.

How did you see the spans so fast? Is there any additional setup I can take for debugging?

I'm just looking at what's in the snapshot.

sneakyPad commented 6 days ago

Done. While I was doing this I noticed that the logfire.msg field looks different than a normal GET request with arguments. For a sub app the query params are encoded in the logfire.msg field. For example: GET /first_lvl/other ? _id='55' And for the main app it looks like this:
 GET /other

-> That’s how it is displayed in the UI as well

I also added another test called test_get_fastapi_arguments for a successful GET on /other to verify this.

I was wondering why this happens and if it is intentional, so I found that in processor_wrapper, _tweak_http_spans adds the query params for sub apps. This functionality is triggered by this condition: if messages and (message := messages[-1]) != name: .

For example, messages[-1]) != name looks like this for a sub app: GET /first_lvl/other == GET /first_lvl .

So name is not the entire target which it is though for an endpoint at root level (GET /other) of the main fastAPI app.

alexmojaki commented 5 days ago

Thanks, this looks great!

  1. Ideally it would still work correctly if instrument_fastapi was called before app.mount. One option is to patch app.mount. @Kludex do you have any better ideas? I don't immediately see a way to get the 'parent' app from a subapp.

Yes, I hadn't thought of that, but it would make sense. Right now it feels a little too deep for the first PR, or I might need some explanation on that.

Do you want to give this a go in another PR?

As for your last comment: you've highlighted a bug that I hadn't noticed and that I'll fix now, because the query params should be in the message in test_get_fastapi_arguments.

That bug happens to be canceled out by another problem with subapps, which is that http.route is just /first_lvl in the outermost span instead of /first_lvl/other, and that is used in the span name. This probably needs to be fixed within OTEL.

Interestingly http.route has the opposite problem in the arguments span: it's just /other.

If you want to try fixing either of these problems that would also be appreciated.

sneakyPad commented 23 hours ago

Do you want to give this a go in another PR?

Yes, I'd like to try and see how far I can get. Thanks for the explanations! I think patching app.mount is closer to what I've already done, which makes it a bit easier for me.