Open pgulley opened 3 weeks ago
Recent traffic-related slowdowns bump this up in priority. I think the plan here should be to install a new middleware to log search requests. It should be off by default and toggleable within the admin interface without restarting. Extra points if we can then watch the logs go by inside the admin console as well, but just getting the logs in the first place would be swell. I'll add this to my to-do list for now, since the need is urgent.
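To make the idea concrete, here is a minimal sketch of a Django-style middleware that logs search requests only while a flag is on. Everything named here is hypothetical: the `search_requests` logger name, the `/api/search` path prefix, and the in-memory `_config` flag, which stands in for whatever database row the admin interface would actually edit.

```python
import logging
import time

logger = logging.getLogger("search_requests")  # hypothetical logger name

# Hypothetical stand-in for a setting stored in the database and edited
# from the admin interface. Reading it on every request is what lets the
# toggle take effect without a restart.
_config = {"enabled": False}  # off by default


def set_logging_enabled(enabled):
    """Flip the flag at runtime."""
    _config["enabled"] = bool(enabled)


class SearchRequestLoggingMiddleware:
    """Django-style middleware: wraps get_response and logs search requests."""

    def __init__(self, get_response, is_enabled=None, path_prefix="/api/search"):
        self.get_response = get_response
        # is_enabled is a callable so the flag is re-checked per request.
        self.is_enabled = is_enabled or (lambda: _config["enabled"])
        self.path_prefix = path_prefix  # assumed URL prefix for search views

    def __call__(self, request):
        start = time.monotonic()
        response = self.get_response(request)
        if self.is_enabled() and request.path.startswith(self.path_prefix):
            elapsed_ms = (time.monotonic() - start) * 1000
            params = dict(getattr(request, "GET", {}))
            logger.info(
                "search path=%s params=%s elapsed_ms=%.1f",
                request.path, params, elapsed_ms,
            )
        return response
```

The logger still needs a handler and an INFO level configured somewhere in the logging settings; the sketch leaves that out.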
Just popping in to say that I've started to address this here: https://github.com/pgulley/web-search/tree/logging_middleware I'll have a PR up relatively soon, I hope.
While debugging, I ended up tangled in DB migration stuff, so it will have to wait until next week!
A single-purpose SQL table is fine for an experiment, but for production a general-purpose "properties" table would be better. There's one in the rss-fetcher; it's modeled on a config.ini file with sections:
https://github.com/mediacloud/rss-fetcher/blob/main/fetcher/database/models.py#L172 https://github.com/mediacloud/rss-fetcher/blob/main/fetcher/database/property.py
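As a rough illustration of the "config.ini with sections" idea, a properties table can be keyed on a (section, key) pair with a text value. This is a sketch using `sqlite3` for self-containment, not the rss-fetcher schema; the table name, column names, and helper functions are all assumptions.

```python
import sqlite3


def create_properties_table(conn):
    """Create a generic key/value table grouped by section, like an ini file."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS properties (
            section TEXT NOT NULL,
            key     TEXT NOT NULL,
            value   TEXT,
            PRIMARY KEY (section, key)
        )
        """
    )


def set_property(conn, section, key, value):
    """Insert or overwrite one property; adding a new setting needs no migration."""
    conn.execute(
        "INSERT OR REPLACE INTO properties (section, key, value) VALUES (?, ?, ?)",
        (section, key, value),
    )
    conn.commit()


def get_property(conn, section, key, default=None):
    """Return the stored value, or default if the property is unset."""
    row = conn.execute(
        "SELECT value FROM properties WHERE section = ? AND key = ?",
        (section, key),
    ).fetchone()
    return row[0] if row is not None else default
```

With something like this, a request-logging toggle would be one row, for example section `request_logging`, key `enabled`, rather than a dedicated table.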
Right now I'm just planning on writing logs to a text file, which isn't a super long-term solution, but it will work for now. I'm not sure what you mean by the properties table. Are you talking about using a properties table as a log store?
> I'm not sure what you mean by the properties table- are you talking about using a properties table as a log-store?

Instead of the RequestLoggingConfig table.
I think the advantage of implementing it with the singleton model is that we can hook it up to the admin console and just toggle it on and off there. I do see the advantage of a properties table from the perspective of minimizing migrations, and it would be a nice gadget to have around. If there were a nice way to interact with it, then I'd be down.
We want more insight into what typical searches look like. In particular I'd like more insight into date ranges: an ongoing log of queries with just the query time, the start date, and the end date would be very valuable. I can imagine it would also be useful to be able to distinguish usage at the admin vs contributor vs user level, and potentially the collections being used as well.
The ultimate output would ideally be some fun and useful histograms to gaze at thoughtfully in our downtime, which would help us prioritize collections in need of attention and inform our backend indexing architecture.
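A sketch of what one log record and a date-range histogram could look like, assuming one JSON object per log line. The field names, the user-level strings, and the 30-day bucket width are all hypothetical choices for illustration.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from datetime import date
from typing import List


@dataclass
class SearchLogRecord:
    query_time: str         # ISO 8601 timestamp of the request
    start_date: str         # ISO date, start of the searched range
    end_date: str           # ISO date, end of the searched range
    user_level: str         # assumed values: "admin", "contributor", "user"
    collections: List[int]  # collection ids used in the search


def to_log_line(record):
    """Serialize a record as one JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


def parse_log_line(line):
    """Parse one JSON line back into a record."""
    return SearchLogRecord(**json.loads(line))


def span_days(record):
    """Length of the searched date range in days."""
    start = date.fromisoformat(record.start_date)
    end = date.fromisoformat(record.end_date)
    return (end - start).days


def span_histogram(records, bucket_days=30):
    """Count searches per date-range bucket, keyed by the bucket's lower bound."""
    hist = Counter()
    for record in records:
        hist[(span_days(record) // bucket_days) * bucket_days] += 1
    return hist
```

The same pattern would extend to counting by `user_level` or by collection id to see which collections get the most use.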