Report counts/timings? - Githubissues

philbudne commented 9 months ago

The rss-fetcher API is instrumented (returns counts and timings of API calls to grafana): https://github.com/mediacloud/rss-fetcher/blob/main/server/util.py#L67

Perhaps we should do the same for news-search-api?

Alternatively looks like sentry.io has an integration for fastapi: https://blog.sentry.io/fastapi-and-starlette-sentry-integrations-have-arrived/

pgulley commented 7 months ago

Agree that this would be a useful thing to have. Looking over @philbudne's implementation in the rss-fetcher and story-indexer, I think the strategy of wrapping the fastapi call feels sensible- but we are missing all of the deployment namespace management stuff that makes our other stats implementations so slick on grafana, which includes the CI system Phil has running on both of the other services- I guess there's a short conversation to be had about how we want to implement that. Would a single stats.mc.news-search-api namespace in statsd suffice?

philbudne commented 7 months ago

Would a single stats.mc.news-search-api namespace in statsd suffice?

With news-search-api being deployed by the story-indexer docker-compose.yml, that would mean stats from different deployment realms (development, staging and production) would all end up in the same place, so mc.REALM.news-search-api is still preferable to be able to keep them separate.

There are already two places that compose stats namespaces:

story-indexer: https://github.com/mediacloud/story-indexer/blob/main/indexer/app.py#L239

In the rss-fetcher, the composed prefix is passed in as an environment var: https://github.com/mediacloud/rss-fetcher/blob/main/dokku-scripts/config.sh#L73

Since news-search-api is being launched from the story-indexer docker-compose.yml it would be easy to pass in a STATSD_PREFIX environment variable with a composed prefix (as I did in the rss-fetcher).

In both rss-fetcher and story-indexer, I've written pretty small wrappers around a PyPI statsd package.

statsd's native format for timings is to keep aggregate stats (upper, lower, mean/stdev). But if we wanted to report via histogram/heatmap, I'd argue for a common mcstats package...

pgulley commented 2 months ago

We've decided to implement our profiling via Sentry

mediacloud / news-search-api

Report counts/timings? #25