Open paulolimac opened 3 years ago
Not at the moment.
Ok then. Thanks for the reply :)
After looking into it a bit: Wikipedia (and any MediaWiki website, in general) has an API that can be used to retrieve the images from an article (and surely from other pages too).
An example:
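As a rough sketch of what such a query could look like (not necessarily the one originally posted here): the MediaWiki Action API can list the files used on a page via `generator=images` combined with `prop=imageinfo&iiprop=url`. The function names, the `api_root` value and the User-Agent string below are made up for illustration, and result continuation for pages with many images is not handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_image_query(api_root, title, limit=50):
    """Build an Action API URL that lists the files used on one page."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "images",  # iterate over the files embedded in the page
        "titles": title,
        "gimlimit": limit,      # maximum number of files per request
        "prop": "imageinfo",
        "iiprop": "url",        # ask for the direct file URL
    }
    return api_root + "?" + urlencode(params)


def extract_image_urls(response):
    """Collect file URLs from a decoded JSON response (default format)."""
    pages = response.get("query", {}).get("pages", {})
    urls = []
    for page in pages.values():
        # Entries without "imageinfo" (e.g. missing files) are skipped.
        for info in page.get("imageinfo", []):
            if "url" in info:
                urls.append(info["url"])
    return urls


def fetch_image_urls(api_root, title):
    """Perform the request; needs network access, so it is not called here."""
    request = Request(
        build_image_query(api_root, title),
        headers={"User-Agent": "example-sketch/0.1"},  # placeholder value
    )
    with urlopen(request) as resp:
        return extract_image_urls(json.load(resp))
```

For English Wikipedia this would presumably be called as `fetch_image_urls("https://en.wikipedia.org/w/api.php", "Some article")`; other MediaWiki sites may expose `api.php` under a different path.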
I guess I could try to implement an extractor if I someday find the time for it 0:)
I wonder if there's any public documentation, outside the source code, on the MediaWiki URL syntax ... I couldn't find any with an extremely quick search, and I don't feel like checking the source code.
At 1st I was thinking "Match until a question mark after `/wiki/`" cuz I knew Mediawiki supports sub-articles which show up as `/wiki/ORIGINAL_ARTICLE/SUB_ARTICLE` (repeating the `/SUB_ARTICLE` part), but then I started thinking maybe matching until a question mark would exclude some articles.
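To make the "match until a question mark" idea concrete, here is a minimal sketch. The pattern and the `parse_wiki_url` helper are hypothetical, not taken from any existing extractor, and they assume the wiki serves articles under `/wiki/`. Since `?` starts the query string in a URL, a question mark that belongs to a title has to appear percent-encoded (`%3F`) in the path, so stopping at the first literal `?` should still keep such titles; the sketch decodes them afterwards.

```python
import re
from urllib.parse import unquote

# Everything after /wiki/ up to the first "?" or "#" is taken as the title,
# so sub-articles (extra "/SUB_ARTICLE" segments) stay part of the match.
WIKI_URL = re.compile(r"https?://(?P<host>[^/?#]+)/wiki/(?P<title>[^?#]+)")


def parse_wiki_url(url):
    """Return (host, decoded title), or None if the URL does not match."""
    match = WIKI_URL.match(url)
    if match is None:
        return None
    return match.group("host"), unquote(match.group("title"))
```

With this, `https://en.wikipedia.org/wiki/ORIGINAL_ARTICLE/SUB_ARTICLE?action=history` yields the title `ORIGINAL_ARTICLE/SUB_ARTICLE`. Wikis configured with a different article path would need a different pattern.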
Random question for @mikf (it is slightly related to this issue, but I do not see any better place to post it): is there any documentation that specifies how to write an extractor? By that, I mean how to use the `Extractor` class and which methods are to be used depending on context.
I have done this in my own repository: download.py. It may serve as inspiration.
Is there any way to download from Wikipedia and Wikimedia domains? I tried the following commands without success: