Closed: abitrolly closed this issue 1 month ago.
Our current attempted call rate for the disabled search endpoint is roughly 100rps (yellow trace). All of these are receiving either a rate limit response (brown trace) or a disabled response (red trace). This call rate has not changed since we implemented rate limiting or disabled search.
The issue isn't solely one of provisioning resources to sustain the search volume; it is that we don't have any viable mechanism to communicate with the users of the very expensive XMLRPC API who abuse the endpoint. Architecturally, XMLRPC's reliance on POST requests, combined with the high cardinality of results (search queries are arbitrary), makes caching this at the CDN edge, or otherwise reducing the load imposed on our backends, untenable in the long run.
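For illustration, a search call looks roughly like this on the wire (a minimal sketch using Python's stdlib xmlrpc.client); the point is that the query travels in the XML body of a POST to a single URL, so a cache keyed on method and URL cannot tell one query from another:

```python
# Minimal sketch: XMLRPC search is an HTTP POST to one URL, with the actual
# query encoded in the XML request body rather than in the URL.
import xmlrpc.client

client = xmlrpc.client.ServerProxy("https://pypi.org/pypi")
# The query lives in the POST body, which defeats URL-keyed CDN caching.
# (With search disabled, this call now just gets a fault/disabled response.)
results = client.search({"name": "requests"})
```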
Our current search is based on Elasticsearch, which I'm not familiar enough with to determine whether such incremental syncs are viable.
@ewdurbin is it possible to publish stats on the popularity of these 150 rps without actually serving the requests? Without that, we can only state that optimization in the general sense is impossible.
popularity in what sense?
The structure of the request, which query it contains, and how popular such queries are. Then it will be possible to determine the overhead of certain query structures, set selective filters to cut expensive requests, and optimize the most popular ones further.
How do you propose to "set selective filters to cut expensive requests" and how would that be less expensive than the current response?
Filters can be set at the load balancer, the web server, the middleware, or the Django level. It might also be possible to set them at the SQL level, if SQL can explain that a query is too expensive to run. Whatever method is chosen, it depends on metrics. The best way is to add OpenTracing, of course. Maybe the "abusive" requests are just malformed XML that makes the parser choke.
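As a rough sketch of the Django-level option (the path, threshold, and is_expensive() heuristic below are made up for illustration, not Warehouse's actual code):

```python
# Illustrative middleware-level filter: reject requests that look expensive
# before they ever reach the search backend.
from django.http import HttpResponse

def is_expensive(body: bytes) -> bool:
    # Hypothetical heuristic: reject oversized or wildcard-heavy query bodies.
    return len(body) > 10_000 or b"*" in body

class SearchFilterMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if (request.method == "POST"
                and request.path == "/pypi"          # assumed XMLRPC endpoint path
                and is_expensive(request.body)):
            return HttpResponse("search query rejected", status=429)
        return self.get_response(request)
```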
XMLRPC search has now been disabled for over three years and is not going to be reinstated. We have further disabled additional endpoints via the efforts of #16642. Given this, I am going to close this issue, as our path forward is less about determining/mitigating specific patterns and more about establishing new endpoints that are more readily cacheable and deprecating/disabling the remaining endpoints.
What's the problem this feature will solve?
An ongoing two-month outage of XMLRPC search, reported at https://status.python.org/incidents/grk0k7sz6zkp, can be solved by optimizing or caching popular queries.
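One way caching could work despite the POST-only transport is to key a cache on a hash of the normalized request body; the sketch below is illustrative only, with an in-memory dict standing in for whatever cache backend would actually be used:

```python
# Illustrative only: cache search responses keyed by a hash of the normalized
# POST body, since the URL alone cannot identify the query.
import hashlib

cache: dict[str, bytes] = {}  # stand-in for Redis/memcached/a CDN shield

def cached_search(body: bytes, run_query) -> bytes:
    key = hashlib.sha256(b" ".join(body.split())).hexdigest()  # collapse whitespace
    if key not in cache:
        cache[key] = run_query(body)  # only cache misses hit the search backend
    return cache[key]
```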
Describe the solution you'd like
I'd like to see the volume and contents of the:
Additional context
Depending on the statistics, it will be possible to provision additional index servers to offload API requests, or to provide a way for organizations to incrementally sync the database. Sync can be done either using global event notifications, similar to the Fedora Messaging system, or using the standard P2P Merkle-tree lookup mechanism employed by blockchains. A toy sketch of the Merkle-tree idea follows.
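As a toy illustration, mirrors could compare root hashes of their metadata shards and re-fetch only the subtrees that differ; the shard layout and hashing below are assumptions for illustration, not an existing PyPI mechanism:

```python
# Toy Merkle-tree change detection for incremental sync between mirrors.
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaf_hashes: list[str]) -> str:
    level = leaf_hashes
    while len(level) > 1:
        if len(level) % 2:                      # duplicate last node on odd levels
            level.append(level[-1])
        level = [h((a + b).encode()) for a, b in zip(level[0::2], level[1::2])]
    return level[0]

# Mirrors compare root hashes; only subtrees whose hashes differ need re-syncing.
local  = merkle_root([h(b"pkg-a v1"), h(b"pkg-b v2")])
remote = merkle_root([h(b"pkg-a v1"), h(b"pkg-b v3")])
print("in sync" if local == remote else "resync changed shards")
```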