lambdafu / smc.mw

A MediaWiki parser for Python.

How to process templates. #8

Open miki725 opened 9 years ago

miki725 commented 9 years ago

I am trying to process a Wikimedia database dump, and this is the only library I have found that actually does what I need. However, I can't figure out how to process wiki templates. Any suggestions?

Here is the page on the wiki I am processing from the dump: http://en.wikivoyage.org/wiki/%27s-Hertogenbosch

The page in MediaWiki format: https://gist.github.com/miki725/991793c24b4fc2bf41b2

The processed HTML: https://gist.github.com/miki725/f3df910f63525f9fc47d

As you can see, several parts are left unprocessed, such as the Sleep section: https://gist.github.com/miki725/f3df910f63525f9fc47d#file-processed-L115-L123

lambdafu commented 9 years ago

Hi, thanks for your input! I haven't looked at your example yet, but in general, the way to load templates is to inherit from mw.Preprocessor and override the get_template method. There is an example here: https://github.com/lambdafu/smc.mw/blob/master/tests/mwtests.py#L71 Is this adequate?
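
To make the suggestion above concrete, here is a minimal sketch of such a subclass that serves templates from an in-memory dictionary. It is not taken from the library: the `Preprocessor` class below is a stand-in so that the snippet runs without smc.mw installed (in real use you would inherit from `mw.Preprocessor`), and `DictPreprocessor`, its `templates` argument, and the idea that `get_template` returns the template's wikitext as a string (or `None` when nothing is found) are all assumptions. The linked test file is the place to check what the library really expects.

```python
class Preprocessor:
    """Stand-in for mw.Preprocessor, only so this sketch is self-contained.

    Assumption: the real base class exposes get_template(namespace, pagename).
    """

    def get_template(self, namespace, pagename):
        # Assumed default: no template available.
        return None


class DictPreprocessor(Preprocessor):
    """Hypothetical subclass that looks templates up in a dictionary."""

    def __init__(self, templates, *args, **kwargs):
        # Pass any other arguments through to the base class untouched.
        super().__init__(*args, **kwargs)
        # Store names lower-cased so that lookups ignore case.
        self.templates = {name.lower(): text for name, text in templates.items()}

    def get_template(self, namespace, pagename):
        text = self.templates.get(pagename.lower())
        if text is not None:
            # Assumption: the caller wants raw wikitext as a string.
            return text
        # Unknown template: defer to the base class.
        return super().get_template(namespace, pagename)


pp = DictPreprocessor({"Pagebanner": "hello world"})
banner = pp.get_template("Template", "pagebanner")
```

For a database dump, the dictionary could be filled from the pages in the Template namespace before any article is processed, so every lookup stays in memory.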

miki725 commented 9 years ago

What am I supposed to return there? This did not work for me:

class MWPreprocessor(Preprocessor):
    def get_template(self, namespace, pagename):
        if pagename.lower() == 'pagebanner':
            return 'hello world'
        return super(MWPreprocessor, self).get_template(namespace, pagename)

lambdafu commented 9 years ago

Hmm, I don't know what went wrong for you, but as of commit 6017eb5f there is a simpler example:

 $ echo '{{NEWS}}' | PYTHONPATH=. python smc/mw/tool.py -T . 
 <html><body><div id="toc" class="toc"><div class="toctitle"><h2>Contents</h2></div><ul>
 ...