karlosjota / gwtwiki

Automatically exported from code.google.com/p/gwtwiki

possibility to fetch and process data in parallel #112

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I don't know how complex this is, but I'd like to file it as a 
feature request.

Pages like this one
https://en.wikipedia.org/wiki/List_of_sovereign_states
use hundreds of templates. The current workflow parses the wiki text 
sequentially and stops at each template occurrence (or transcluded page) to 
fetch its contents. 

If the database connection suffers from high latency, each of these lookups 
costs a full round trip, so rendering the whole page can be rather slow.

It would be nice if we could fetch all "top-level" templates, i.e. those whose 
title is immediately known, in one go, then all second-level templates (the now 
newly known template titles) etc.
Once all transcluded content is known and the combined wiki text is created, 
parsing and transformation would begin.
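
A minimal sketch of the level-by-level idea, not code from the project. The class name, the `batchFetch` callback and the regular expression are all assumptions made for illustration; the regex only recognises simple `{{Name}}` and `{{Name|...}}` calls and ignores parser functions and namespaced titles, which a real parser would have to handle.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the existing code base.
final class LevelwiseTemplateFetcher {
    // Simplified template matcher: {{Name}} or {{Name|args}}.
    // Names containing '#' or ':' (parser functions, namespaces) are skipped.
    private static final Pattern TEMPLATE =
            Pattern.compile("\\{\\{\\s*([^{}|#:]+?)\\s*(?:\\||\\}\\})");

    // Collects the template titles that are immediately visible in the text.
    static Set<String> templateTitles(String wikiText) {
        Set<String> titles = new LinkedHashSet<>();
        Matcher m = TEMPLATE.matcher(wikiText);
        while (m.find()) {
            titles.add(m.group(1));
        }
        return titles;
    }

    // Fetches one whole level per call to batchFetch (assumed to return
    // title -> wiki text), then scans the results for the next level.
    static Map<String, String> fetchAllLevels(
            String rootText,
            Function<Set<String>, Map<String, String>> batchFetch) {
        Map<String, String> known = new LinkedHashMap<>();
        Set<String> pending = templateTitles(rootText);
        while (!pending.isEmpty()) {
            Map<String, String> fetched = batchFetch.apply(pending);
            Set<String> next = new LinkedHashSet<>();
            for (String title : pending) {
                // Missing templates are stored as empty text in this sketch.
                String text = fetched.getOrDefault(title, "");
                known.put(title, text);
                next.addAll(templateTitles(text));
            }
            // Never fetch a title twice; this also ends recursive templates.
            next.removeAll(known.keySet());
            pending = next;
        }
        return known;
    }
}
```

With this shape the number of round trips is bounded by the nesting depth of the templates rather than by their count. Template names that are only known after parameter expansion cannot be found this way and would still need the sequential path.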

Original issue reported on code.google.com by nico.kru...@googlemail.com on 30 Aug 2012 at 1:22

GoogleCodeExporter commented 9 years ago
FYI: the parallel data fetching would then happen within each level of 
templates, e.g. instead of calling getRawWikiContent for each article, there 
could be a method to which you supply _all_ templates to fetch and which 
returns once all of them are available.
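
Such a method could look roughly like the sketch below. `ParallelBatchFetcher`, `fetchAll` and the `fetchOne` callback are hypothetical names; `fetchOne` stands in for whatever single-article lookup exists today (for example a wrapper around getRawWikiContent, whose real signature is not assumed here).

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

// Hypothetical helper, not part of the existing code base.
final class ParallelBatchFetcher {
    // Starts one lookup per distinct title on the given pool and blocks
    // until every lookup has finished.
    static Map<String, String> fetchAll(
            Collection<String> titles,
            Function<String, String> fetchOne,
            ExecutorService pool) {
        Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
        for (String title : titles) {
            // computeIfAbsent avoids issuing the same lookup twice.
            futures.computeIfAbsent(title,
                    t -> CompletableFuture.supplyAsync(() -> fetchOne.apply(t), pool));
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<String>> e : futures.entrySet()) {
            // join() rethrows a failed lookup as CompletionException.
            result.put(e.getKey(), e.getValue().join());
        }
        return result;
    }
}
```

The total wait is then roughly the slowest single lookup per level instead of the sum of all lookups. A data store that supports real multi-key queries could replace the thread pool with a single request.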

Original comment by nico.kru...@googlemail.com on 30 Aug 2012 at 1:25

GoogleCodeExporter commented 9 years ago

Original comment by axelclk@gmail.com on 2 Sep 2012 at 8:41

GoogleCodeExporter commented 9 years ago
I think we should determine all templates through the Wikipedia API.
Example:
http://en.wikipedia.org/w/api.php?action=query&prop=templates&tllimit=500&titles=Tom_Hanks

and then update all templates that are not yet in the Derby database.
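
A rough sketch of the two pieces this needs: building the query URL for a page and working out which of the reported templates are still missing locally. `TemplateSync` and both method names are hypothetical; the HTTP call, the response parsing and the actual Derby lookup are left out, and the set of stored titles is simply passed in.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the existing code base.
final class TemplateSync {
    // Builds the prop=templates query shown above for a single page title.
    static String templatesQueryUrl(String pageTitle) {
        try {
            return "http://en.wikipedia.org/w/api.php"
                    + "?action=query&prop=templates&tllimit=500&titles="
                    + URLEncoder.encode(pageTitle, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Returns the reported titles that are not in the local store yet.
    static Set<String> missingTemplates(Collection<String> reported,
                                        Set<String> stored) {
        Set<String> missing = new LinkedHashSet<>(reported);
        missing.removeAll(stored);
        return missing;
    }
}
```

Pages with more templates than tllimit allows would additionally need the tlcontinue parameter from the documentation quoted below.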

Quote from the MediaWiki API documentation
http://en.wikipedia.org/w/api.php

>>>>>
* prop=templates (tl) *
  Returns all templates from the given page(s)

This module requires read rights
Parameters:
  tlnamespace         - Show templates in this namespace(s) only
                        Values (separate with '|'): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 100, 101, 108, 109
                        Maximum number of values 50 (500 for bots)
  tllimit             - How many templates to return
                        No more than 500 (5000 for bots) allowed
                        Default: 10
  tlcontinue          - When more results are available, use this to continue
  tltemplates         - Only list these templates. Useful for checking whether a certain page uses a certain template.
                        Separate values with '|'
                        Maximum number of values 50 (500 for bots)
  tldir               - The direction in which to list
                        One value: ascending, descending
                        Default: ascending
<<<<<

Original comment by axelclk@gmail.com on 2 Sep 2012 at 8:52

GoogleCodeExporter commented 9 years ago
I'd rather have an offline solution, i.e. an alternative wiki renderer should 
not depend on the Wikipedia API.

Original comment by nico.kru...@googlemail.com on 3 Sep 2012 at 8:08