Open IsaacHaze opened 10 years ago
There are no links found for the sample pages "Andre Agassi" or "Groen (partij)".
    In [1]: from semanticizest.parse_wikidump import clean_text

    In [2]: clean_text("""{{ def }}abc""")
    Out[2]: 'abc'

    In [3]: clean_text("""{{ def {{123}} }}abc""")
    Out[3]: ' }}abc'

    In [4]: clean_text("""{{ def
       ...:
       ...: | asd = [[34]]
       ...:
       ...: | wqe = {{be|blaat}}
       ...:
       ...: | vrouwen =
       ...:
       ...: }}
       ...:
       ...: [[nep:perd|0px]]
       ...:
       ...: abc
       ...:
       ...: """)
    Out[4]: '\n'
The _UNWANTED regex needs tweaking...
Meh... stupid nested wikisyntax...
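One possible direction, as a rough sketch only: a single regex cannot match arbitrarily nested `{{ ... }}` pairs, so the templates could be stripped with a small depth counter before the remaining cleanup runs. The function name `strip_templates` is hypothetical and is not part of semanticizest; how it would be wired into `clean_text` or `_UNWANTED` is left open.

```python
def strip_templates(text):
    """Drop {{...}} templates, including nested ones, by tracking brace depth.

    Hypothetical helper for illustration; not semanticizest's actual code.
    """
    out = []
    depth = 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            # Entering a template (possibly nested inside another one).
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            # Leaving the innermost open template.
            depth -= 1
            i += 2
        else:
            # Keep characters only when outside every template.
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


print(repr(strip_templates("{{ def }}abc")))
print(repr(strip_templates("{{ def {{123}} }}abc")))
```

With this approach the nested case from `In [3]` no longer leaves a stray ` }}` behind, and text after a multi-line infobox survives. An unmatched `}}` outside any template is left in place rather than guessed at.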