branliu0 closed this issue 10 years ago.
Actually, that functionality is available in the API through the rvsection parameter of the Query - Revisions call:
http://www.mediawiki.org/wiki/API:Query_-_Properties#revisions_.2F_rv
So you're more than welcome to extend the get method or write a new get_section method to handle this.
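As a rough idea of what a get_section method might send, here is a minimal sketch that builds the Query - Revisions request with Ruby's standard library only, independent of the gem. The helper name section_query_uri is hypothetical; action=query, prop=revisions, rvprop=content and rvsection come from the API page linked above, while format=json and formatversion=2 are assumptions about the response shape you would want.

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): build a Query - Revisions URL
# that asks for the wikitext of a single section of one page via rvsection.
def section_query_uri(title, section, endpoint: 'https://en.wikipedia.org/w/api.php')
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(
    action: 'query',
    prop: 'revisions',
    rvprop: 'content',
    rvsection: section,  # 0 is the lead; later sections are numbered in heading order
    titles: title,
    format: 'json',      # assumption: JSON output
    formatversion: 2     # assumption: the newer, simpler JSON layout
  )
  uri
end

puts section_query_uri('Banana', 0)
```

Fetching the resulting URI (for example with Net::HTTP.get after require 'net/http') and pulling the revision content out of the JSON would complete the sketch.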
Hmm, thanks for the response! I shifted to working on a different part of my project, so I'll come back to this when I need to do some Wikipedia scraping again. I didn't get a chance to look into that API call in depth.
This should be possible now by passing the number of the section to retrieve as rvsection (see #61):
require 'media_wiki'
MediaWiki::Gateway.new('https://en.wikipedia.org/w/api.php').get('Banana', 'rvsection' => 0)
Closing for now, although pull requests to package this up more cleanly are welcome.
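The call above takes a section number, but the original goal was to grab a section by name (e.g. Taxonomy). One possible way to find the number, sketched under assumptions: the action=parse API with prop=sections lists a page's sections, each with an index field that can be passed as rvsection. section_index_for is a hypothetical helper, and the sample JSON used with it is a trimmed, made-up response, not Banana's actual section list.

```ruby
require 'json'

# Hypothetical helper: given the JSON body of an
#   action=parse&page=TITLE&prop=sections&format=json
# response, return the index (as a string, usable as rvsection) of the
# section whose heading text equals +heading+, or nil if none matches.
def section_index_for(parse_json, heading)
  sections = JSON.parse(parse_json).dig('parse', 'sections') || []
  match = sections.find { |s| s['line'] == heading }
  match && match['index']
end

# Illustrative, hand-written response (indices are not real).
sample = '{"parse":{"title":"Banana","sections":[' \
         '{"toclevel":1,"level":"2","line":"Etymology","number":"1","index":"1"},' \
         '{"toclevel":1,"level":"2","line":"Taxonomy","number":"2","index":"2"}]}}'

puts section_index_for(sample, 'Taxonomy').inspect
```

The index is kept as a string rather than converted to an integer because not every index is purely numeric. The line field holds the rendered heading text, so headings containing markup may not compare equal to a plain string.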
Hi,
For my own project, I'm currently writing a Ruby script, built on top of this gem and Nokogiri, that extracts content from a single section of a Wikipedia article. For example, from the article on bananas (http://en.wikipedia.org/wiki/Banana) I might want only the Taxonomy section and nothing else. With my script, you would only need to specify the page title and the section number.
I'm interested in contributing this feature to this project, but I'm not sure whether it's appropriate. The functionality isn't supported by the API, and I get it to work by parsing the HTML, so Wikimedia provides no guarantee that it will keep working. The feature also wouldn't work on all Wikimedia projects, since not all of their pages have a table of contents or are divided into sections. For example, it works on Wikipedia and Wiktionary, but would not work on Wikisource.
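For comparison, here is a sketch of a different technique from the Nokogiri/HTML parsing described above: cutting a section out of the page's raw wikitext by its heading line. extract_wikitext_section is a hypothetical name; it assumes headings sit on their own line in the standard == Title == form, and, like the HTML approach, it has nothing to find on pages without section headings.

```ruby
# Hypothetical helper: return the wikitext of the section titled +heading+
# (including its subsections), or nil if no heading with that title exists.
HEADING = /\A(={2,6})\s*(.+?)\s*\1\s*\z/

def extract_wikitext_section(wikitext, heading)
  lines = wikitext.lines
  start = nil
  level = nil
  lines.each_with_index do |line, i|
    m = line.match(HEADING)
    next unless m
    if start.nil?
      # Still searching: remember where the requested heading starts.
      if m[2] == heading
        start = i
        level = m[1].length
      end
    elsif m[1].length <= level
      # A heading at the same or a higher level ends the section.
      return lines[start...i].join
    end
  end
  # Section runs to the end of the page (or was never found).
  start ? lines[start..].join : nil
end

text = "Lead.\n== Taxonomy ==\nTax text.\n=== Sub ===\nSub text.\n== Uses ==\nUse text.\n"
puts extract_wikitext_section(text, 'Taxonomy')
```

Deeper headings (=== Sub ===) stay inside the extracted section; the next heading at the same or a higher level ends it.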
What do you think?
Best, Brandon