hey - this is good!
does the recursive one always go ... until the end? That's awesome. Do you wanna support a max depth parameter?
I'm thinking about doing Category:Person
or something. how far do you think it goes?
Maybe it would be fun to build a variant that returns the nested json.
Lemme know when I can merge and release this thing. Thank you!
> does the recursive one always go ... until the end? That's awesome. Do you wanna support a max depth parameter?
Yeah, it'll go right to the bottom with two exceptions:
I hadn't added a depth param because the depth of the recursion doesn't really mean anything semantically in terms of what you're asking it to bring back. I guess it would be useful as a safety net for any really big categories, though I suspect you wouldn't think to set the depth until one day when one of your category requests exploded - at which point you'd probably change your request to ask for more specific categories anyway. So I'm slightly in two minds about how useful such a param would be, but it'll be pretty simple to add while I'm in the code already so I'm happy to do that.
Question: error when depth exceeded, or just silently stop the recursion when you hit the max depth? I can see dis/advantages of both.
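For illustration, a minimal sketch of what the two behaviours could look like (names are hypothetical, not the actual wtf_wikipedia API; `fetchMembers` is assumed to return one category's direct members):

```js
// Sketch only: `fetchMembers(category)` is a hypothetical function
// returning { pages: [...titles], subCategories: [...titles] }.
async function collectPages(category, fetchMembers, opts = {}, depth = 0) {
  const { maxDepth = Infinity, errorOnMaxDepth = false } = opts;
  if (depth > maxDepth) {
    if (errorOnMaxDepth) {
      throw new Error(`max depth ${maxDepth} exceeded at ${category}`);
    }
    return []; // the silent option: just stop recursing
  }
  const { pages, subCategories } = await fetchMembers(category);
  for (const sub of subCategories) {
    pages.push(...(await collectPages(sub, fetchMembers, opts, depth + 1)));
  }
  return pages;
}
```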
> Maybe it would be fun to build a variant that returns the nested json.
You mean it returns all the sub-categories and their members in something like a tree structure? That would be pretty simple if Wikipedia were a tree structure - but it isn't: it's a graph, and not even an acyclic graph. So an article could appear in multiple places in your tree of categories. Equally, where there's a loop in a category hierarchy, I'm not sure how you'd express that. I guess you could return a graph of all the categories and their members (somehow - the data structure would take some thought) but it might be difficult to process into something useful once you've got it.
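For what it's worth, the loop-guard half of that is cheap: thread a visited set through the recursion (same hypothetical `fetchMembers` as above):

```js
// Sketch of cycle-guarding: skip any category we've already expanded,
// so a loop (A contains B, B contains A) terminates.
async function collectPagesSafe(category, fetchMembers, visited = new Set()) {
  if (visited.has(category)) {
    return []; // already expanded elsewhere in the graph
  }
  visited.add(category);
  const { pages, subCategories } = await fetchMembers(category);
  for (const sub of subCategories) {
    pages.push(...(await collectPagesSafe(sub, fetchMembers, visited)));
  }
  return pages;
}
```

Note this guards categories, not articles: an article reachable through two different parent categories would still come back twice, so you'd probably de-dupe the result titles as well.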
Regarding merging: I think the current changes are backwards-compatible, so mergeable now if you want, but I can probably add the max depth thing fairly quickly - so it's up to you whether you want a second PR for that or to include it all here.
good point!
I'm happy to add the max depth thing if you want, just let me know. If not, then I think I'm done with my changes for now.
yeah, okay let's do it. just a safety brake that you can put on, but off by default? no error. like - 'grab this list of astronauts, but don't accidentally DDoS for 18 hours'. I can hit publish this afternoon. Big relief to have this done.
Added the depth thing in #525
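Assuming it ended up shaped like the sketches above (option and function names hypothetical), using it as a safety brake would look something like:

```js
// maxDepth is off by default; set it to cap the recursion. No error
// is thrown when the cap is hit - the traversal just stops.
const pages = await collectPages('Category:Astronauts', fetchMembers, {
  maxDepth: 3,
});
```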
Putting this up as in-progress work for review.
I previously had some code which wrapped the `getCategory` function and recursively went down through all categories and sub-categories, and returned all articles that were under any of those categories or sub-categories. I don't see anything in the wikimedia api that would help you do that server-side, hence doing it in code. I can't be the only person who wants to do this, so it seems sensible to maintain my code as part of wtf_wikipedia rather than just in my one project.
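For context on what the API does and doesn't give you: one level of members is a single `list=categorymembers` request, and sub-categories are just members in namespace 14, so the recursion has to happen client-side. A rough sketch against the live API (not the PR's actual code; pagination via `cmcontinue` omitted):

```js
// One level of category members from the MediaWiki API.
// Members with ns === 14 are sub-categories; everything else is a page.
async function fetchMembers(category) {
  const url =
    'https://en.wikipedia.org/w/api.php?action=query&list=categorymembers' +
    '&cmtitle=' + encodeURIComponent(category) +
    '&cmlimit=500&format=json&origin=*';
  const res = await fetch(url);
  const data = await res.json();
  const members = data.query.categorymembers;
  return {
    pages: members.filter((m) => m.ns !== 14).map((m) => m.title),
    subCategories: members.filter((m) => m.ns === 14).map((m) => m.title),
  };
}
```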
Outstanding work:

- Guard against loops (e.g. a category contained within one of its sub-categories would currently cause us to loop indefinitely)
- Throttle requests and/or batch up requests more cleverly. Looking at `fetchList`, all that actually does is batch up the requests into batches of small numbers; it doesn't (at least afaics) apply any waits or backoff at all. Given that `categorymembers` only supports a single category per request, there isn't any batching we can do, and we're already making the requests serially (i.e. none should be running at the same time), so I don't think there's anything we need to do here (see the sketch after this list).
- Doc
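If throttling ever did turn out to be needed, the simplest shape - given the requests are already serial - would be a fixed delay between them (a sketch only; this is not what `fetchList` does):

```js
// Sketch: serial requests with a polite fixed delay between them.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchSerially(categories, fetchMembers, delayMs = 250) {
  const results = [];
  for (const category of categories) {
    results.push(await fetchMembers(category)); // one at a time, never parallel
    await sleep(delayMs); // a real backoff would adjust this on errors
  }
  return results;
}
```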
Notes about the effectiveness of this code:

- Full pages (as opposed to just titles) aren't returned by the `categorymembers` call, thus returning them would require `getCategories` to fetch all the pages as well, which it doesn't currently do. I'm not planning to change that just now, though it's perhaps a candidate for a future major version release (as returning full pages instead of just titles would be a significant change to the api).

Closes #521.