[Site Support Request] Wikipedia and Wikimedia

mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites

GNU General Public License v2.0

[Site Support Request] Wikipedia and Wikimedia #1443

Open paulolimac opened 3 years ago

paulolimac commented 3 years ago

Is there any way to download from Wikipedia and Wikimedia domains? My attempts so far have failed:

$ gallery-dl https://commons.wikimedia.org/wiki/Category:1st_Horseman_of_the_Apocalypse
[gallery-dl][error] No suitable extractor found for 'https://commons.wikimedia.org/wiki/Category:1st_Horseman_of_the_Apocalypse'

$ gallery-dl https://en.wikipedia.org/wiki/Gustave_Dor%C3%A9
[gallery-dl][error] No suitable extractor found for 'https://en.wikipedia.org/wiki/Gustave_Dor%C3%A9'

mikf commented 3 years ago

Not at the moment.

paulolimac commented 3 years ago

OK then. Thanks for the reply :)

Ailothaen commented 3 years ago

After looking into it a bit: Wikipedia (and any MediaWiki site in general) has an API that can be used to retrieve the images from an article (and presumably from other kinds of pages as well).

An example:

https://en.wikipedia.org/w/api.php?action=parse&page=Pet_door&prop=images&format=json to retrieve all image names from an article
https://en.wikipedia.org/w/api.php?action=query&titles=File:Gatera_de_ademuz.jpg&prop=imageinfo&iiprop=url to retrieve the full URL for an image name (since the exact path can change depending on the language version)
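
The two calls above can be sketched in Python roughly as follows. This is only an illustration, not gallery-dl code: the helper names (`images_url`, `imageinfo_url`, `image_names`, `file_urls`, `fetch_json`) are made up for this example, and the JSON layout the parsers expect (`parse.images` and `query.pages.*.imageinfo[].url`) is an assumption about the default `format=json` output that should be checked against a live response. Note that `format=json` is added to the second request as well.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# API root of the English Wikipedia; other MediaWiki sites have their own.
API = "https://en.wikipedia.org/w/api.php"


def api_url(params, root=API):
    """Build an API URL from a dict of query parameters."""
    return root + "?" + urlencode(params)


def images_url(page):
    """URL listing all image names used on an article."""
    return api_url({"action": "parse", "page": page,
                    "prop": "images", "format": "json"})


def imageinfo_url(filename):
    """URL resolving an image name to its full file URL."""
    return api_url({"action": "query", "titles": "File:" + filename,
                    "prop": "imageinfo", "iiprop": "url", "format": "json"})


def image_names(parse_response):
    """Extract image names (assumed layout: parse -> images)."""
    return parse_response["parse"]["images"]


def file_urls(query_response):
    """Extract file URLs (assumed layout: query -> pages -> imageinfo)."""
    pages = query_response["query"]["pages"]
    return [info["url"]
            for page in pages.values()
            for info in page.get("imageinfo", [])]


def fetch_json(url):
    """Fetch and decode one API response (performs a network request)."""
    # The User-Agent string is a placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "mediawiki-sketch/0.1"})
    with urlopen(request) as response:
        return json.load(response)
```

A caller would chain these as `image_names(fetch_json(images_url("Pet_door")))` and then pass each returned name through `imageinfo_url` and `file_urls`.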

I guess I could try to implement an extractor if I someday find the time for it 0:)

rautamiekka commented 3 years ago

I wonder if the MediaWiki URL syntax is documented publicly anywhere outside the source code ... I couldn't find anything in a very quick search, and I don't feel like checking the source code.

At first I was thinking "match everything after /wiki/ up to a question mark", because I knew MediaWiki supports sub-articles, which show up as /wiki/ORIGINAL_ARTICLE/SUB_ARTICLE (with the /SUB_ARTICLE part repeating for deeper levels). But then I started wondering whether stopping at a question mark would exclude some articles.
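
The "match until a question mark" idea could look something like the sketch below. It assumes the /wiki/TITLE URL layout that Wikipedia and Wikimedia Commons use; other MediaWiki installations can be configured with a different article path, so this is not a general rule. The names `WIKI_URL` and `split_wiki_url` are hypothetical. The pattern also stops at `#`, and it relies on a literal question mark inside a title being percent-encoded as `%3F` in the URL.

```python
import re
from urllib.parse import unquote

# Host, then "/wiki/", then everything up to the query string or fragment.
# Slashes are allowed in the title so that sub-articles still match.
WIKI_URL = re.compile(r"(?:https?://)?([^/?#]+)/wiki/([^?#]+)")


def split_wiki_url(url):
    """Return (host, decoded title), or None if the URL does not match."""
    match = WIKI_URL.match(url)
    if match is None:
        return None
    host, title = match.groups()
    return host, unquote(title)
```

For the two URLs from the original report this yields the host plus `Category:1st_Horseman_of_the_Apocalypse` and the decoded title `Gustave_Doré`; an API URL under /w/ returns `None`.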

Ailothaen commented 3 years ago

Random question for @mikf (it is only loosely related to this issue, but I do not see a better place to post it): is there any documentation that explains how to write an extractor? By that, I mean how to use the Extractor class and which methods to use in which context.

GrimPixel commented 9 months ago

I have implemented this in my own repository: download.py. It might serve as inspiration.