mediacloud / web-search

Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory.
https://search.mediacloud.org
Apache License 2.0

Log user-level usage statistics ("Business statistics") #831

Open pgulley opened 3 weeks ago

pgulley commented 3 weeks ago

We want more insight into what typical searches look like. In particular, I'd like more insight into date ranges: an ongoing log of queries with just the query time, the start date, and the end date would be very valuable. It would also be useful to distinguish usage at the admin vs. contributor vs. user level, and potentially to record the collections being used as well.

The ultimate output would ideally be some fun and useful histograms to gaze at thoughtfully in our downtime. These would help us prioritize collections in need of attention and inform our backend indexing architecture.
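As a rough illustration of the histogram idea, here is a minimal sketch that buckets searches by the length of their date range. It assumes a log of JSON lines with `query_time`, `start_date`, and `end_date` fields in ISO format; that log format and the function names `span_histogram` and `render` are hypothetical, not something that exists in web-search today.

```python
import json
from collections import Counter
from datetime import date


def span_histogram(log_lines, bucket_days=30):
    """Count searches by date-range length, grouped into fixed-width buckets.

    Each line is assumed (hypothetically) to be a JSON object with
    ISO-formatted "start_date" and "end_date" fields.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        start = date.fromisoformat(record["start_date"])
        end = date.fromisoformat(record["end_date"])
        span = (end - start).days
        # Key each bucket by its lower bound, e.g. 0, 30, 60, ...
        counts[(span // bucket_days) * bucket_days] += 1
    return counts


def render(counts, bucket_days=30):
    """Render the counts as a plain-text bar chart, one row per bucket."""
    rows = []
    for bucket in sorted(counts):
        label = f"{bucket}-{bucket + bucket_days - 1} days"
        rows.append(f"{label:>16} | {'#' * counts[bucket]}")
    return "\n".join(rows)


if __name__ == "__main__":
    sample = [
        '{"query_time": "2024-05-01T12:00:00", "start_date": "2024-01-01", "end_date": "2024-01-08"}',
        '{"query_time": "2024-05-01T12:05:00", "start_date": "2024-01-01", "end_date": "2024-03-01"}',
        '{"query_time": "2024-05-01T12:09:00", "start_date": "2023-01-01", "end_date": "2024-01-01"}',
    ]
    print(render(span_histogram(sample)))
```

The same loop could key on user role or collection instead of span length if those fields end up in the log.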

pgulley commented 1 week ago

Recent traffic-related slowdowns bump this up in priority. I think the plan here should be to install a new middleware that logs search requests. It should be off by default and toggleable within the admin interface without restarting. Extra points if we can then watch the logs go by inside the admin console as well, but just getting the logs in the first place would be swell. I'll add this to my to-do list for now, since the need is urgent.
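A minimal, framework-free sketch of that shape, for discussion only. It follows the common `get_response` middleware pattern but treats the request as a plain dict so the example stays self-contained. The class name `SearchLoggingMiddleware`, the `is_enabled` callable, and the request fields are all assumptions, not the code on the branch.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("search_requests")


class SearchLoggingMiddleware:
    """Log one JSON line per search request while the toggle is on.

    `is_enabled` is a callable that is consulted on every request, so
    flipping the underlying setting takes effect without a restart.
    """

    def __init__(self, get_response, is_enabled):
        self.get_response = get_response
        self.is_enabled = is_enabled

    def __call__(self, request):
        if self.is_enabled():
            # Hypothetical request shape: a dict with "params" and "user_role".
            params = request.get("params", {})
            logger.info(json.dumps({
                "query_time": datetime.now(timezone.utc).isoformat(),
                "start_date": params.get("start_date"),
                "end_date": params.get("end_date"),
                "user_role": request.get("user_role"),
                "collections": params.get("collections", []),
            }))
        return self.get_response(request)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    toggle = {"enabled": False}  # stand-in for a setting stored in the DB
    middleware = SearchLoggingMiddleware(
        get_response=lambda request: "200 OK",
        is_enabled=lambda: toggle["enabled"],
    )
    request = {
        "user_role": "contributor",
        "params": {"start_date": "2024-01-01", "end_date": "2024-02-01",
                   "collections": [1, 2]},
    }
    middleware(request)        # toggle is off: nothing is logged
    toggle["enabled"] = True
    middleware(request)        # toggle is on: one JSON line is logged
```

Reading the toggle on every request is the simplest way to get "no restart" behaviour. If that lookup turned out to be a database hit, a short-lived cache around `is_enabled` would be one option.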

pgulley commented 1 week ago

Just popping in to say that I've started to address this here: https://github.com/pgulley/web-search/tree/logging_middleware. I hope to have a PR up relatively soon.

pgulley commented 1 week ago

While debugging, I got tangled up in DB migration stuff, so this will have to wait until next week!

philbudne commented 1 week ago

A single-purpose SQL table is fine for an experiment, but for production, a general-purpose "properties" table would be better. There's one in the rss-fetcher; it's modeled on a config.ini file with sections:

https://github.com/mediacloud/rss-fetcher/blob/main/fetcher/database/models.py#L172
https://github.com/mediacloud/rss-fetcher/blob/main/fetcher/database/property.py
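To make the config.ini analogy concrete, here is a small sketch of a sectioned key/value table using the standard-library `sqlite3` module. This is not the rss-fetcher implementation linked above. The column names, the helper functions, and the `request_logging` section are assumptions chosen purely for illustration.

```python
import sqlite3


def open_properties(path=":memory:"):
    """Open (or create) a database with a sectioned key/value table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS properties ("
        " section TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (section, name))"
    )
    return conn


def set_property(conn, section, name, value):
    """Insert or overwrite one property; values are stored as text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO properties (section, name, value)"
            " VALUES (?, ?, ?)",
            (section, name, str(value)),
        )


def get_property(conn, section, name, default=None):
    """Return the stored text value, or `default` if the row is missing."""
    row = conn.execute(
        "SELECT value FROM properties WHERE section = ? AND name = ?",
        (section, name),
    ).fetchone()
    return row[0] if row else default


if __name__ == "__main__":
    conn = open_properties()
    # Hypothetical section and name for the logging toggle.
    set_property(conn, "request_logging", "enabled", "true")
    print(get_property(conn, "request_logging", "enabled", "false"))
```

With this layout, adding a new setting is just a new row, so no schema migration is needed per setting.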

pgulley commented 6 days ago

Right now I'm just planning on writing logs to a text file, which isn't a long-term solution, but it will work for now. I'm not sure what you mean by the properties table. Are you talking about using a properties table as a log store?

philbudne commented 6 days ago

> I'm not sure what you mean by the properties table- are you talking about using a properties table as a log-store?

I mean using it instead of the RequestLoggingConfig table.

pgulley commented 4 days ago

I think the advantage of implementing it with the singleton model is that we can hook it up to the admin console and just toggle it on and off there. I do see the advantage of a properties table from the perspective of minimizing migrations, and it would be a nice gadget to have around. If there were a nice way to interact with it, then I'd be down.