Open graus opened 10 years ago
Did you try line_profiler on the function? That should show the problem, if it is CPU-bound.
I haven't yet, will take a look, thanks! I did experiment with setting a hard limit on the number of links to retrieve, and that does make a difference (i.e., with a limit of 500 items, I get a speedup of around a factor of two).
(which leads me to believe redis doesn't handle the large values well)
@bartsidee got some nice speed improvement with Redis pipelines:
By default, pipelines in redis python are atomic, i.e. a blocking request. I set the pipelines to non-atomic mode, pipeline(transaction=False), and this helped improve performance on redis (where multiple requests are fired off) quite a lot.
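For reference, a minimal sketch of what a non-transactional pipeline for the link lookups could look like. This is not the actual change: the helper name `fetch_links`, the `inlinks:` key prefix, and the assumption that links are stored as Redis sets are all hypothetical. Only `pipeline(transaction=False)`, `smembers`, and `execute()` are real redis-py calls.

```python
def fetch_links(client, pids, prefix="inlinks:"):
    """Fetch the link set for every page id in a single batch.

    `client` is assumed to be a redis-py client, e.g. redis.Redis().
    The key layout (prefix + pid) and the use of sets are hypothetical;
    adapt them to how the links are actually stored.
    """
    # transaction=False drops the MULTI/EXEC wrapper: the commands are
    # still sent together, but they are not executed atomically.
    pipe = client.pipeline(transaction=False)
    for pid in pids:
        # Commands are only queued here; nothing is sent yet.
        pipe.smembers(prefix + str(pid))
    # execute() sends the queued commands and returns the replies
    # in the same order they were queued.
    results = pipe.execute()
    return dict(zip(pids, results))
```

Usage would be something like `fetch_links(redis.Redis(), pids)` from inside get_articles. Whether this explains the whole speedup probably still depends on how large the individual values are.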
Retrieving inlinks & outlinks of Wikipedia pages is very slow: a single query with a couple of hundred ids can easily exceed 20 seconds (I'm timing everything inside get_articles(self, *pids)).
I don't understand why: everything is in redis, requests seem to be quick from redis-client, and the requests are similar to fetching the IDs' labels (and the label requests are fast, < 0.3 seconds for the same number of ids). How could this be?