spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser
https://observablehq.com/@spencermountain/wtf_wikipedia
MIT License

Add support for Norwegian redirects #570

Closed: dagingaa closed this 9 months ago.

dagingaa commented 9 months ago

This is useful when parsing the Norwegian Wikipedia dumps, because it lets redirect pages be detected and skipped when processing the data dump.
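A dump consumer that only wants articles can drop redirects before any wikitext parsing, because MediaWiki XML export files mark redirect pages with a `<redirect title="..." />` child of `<page>`. Below is a minimal Python sketch of that approach using the standard library's `xml.etree.ElementTree.iterparse`. It is not part of wtf_wikipedia. The function `iter_articles` and the sample pages are hypothetical, and the sketch assumes only that each `<page>` has `<title>`, an optional `<redirect>`, and a `<revision><text>` body.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def _local(tag: str) -> str:
    """Strip an '{namespace}' prefix, since real dumps declare an export namespace."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(source: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (title, wikitext) for every <page> without a <redirect> child."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, text, is_redirect = "", "", False
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "redirect":
                is_redirect = True
            elif name == "text":
                text = child.text or ""
        if not is_redirect:
            yield title, text
        elem.clear()  # drop the finished page's children to keep memory flat


# Hypothetical two-page sample: one redirect, one ordinary article.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Det norske Arbeiderparti</title>
    <redirect title="Arbeiderpartiet" />
    <revision><text>#OMDIRIGERING [[Arbeiderpartiet]]</text></revision>
  </page>
  <page>
    <title>Arbeiderpartiet</title>
    <revision><text>Arbeiderpartiet er et norsk politisk parti.</text></revision>
  </page>
</mediawiki>"""

print([title for title, _ in iter_articles(io.BytesIO(SAMPLE))])  # ['Arbeiderpartiet']
```

Streaming with `iterparse` avoids loading a multi-gigabyte dump into memory. The trade-off is that this check relies on the dump metadata rather than on the wikitext itself.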

The XML for a Norwegian redirect page looks like this:

  <page>
    <title>Det norske Arbeiderparti</title>
    <ns>0</ns>
    <id>1</id>
    <redirect title="Arbeiderpartiet" />
    <revision>
      <id>15529121</id>
      <parentid>9343288</parentid>
      <timestamp>2016-01-09T14:07:08Z</timestamp>
      <contributor>
        <username>StigBot</username>
        <id>27430</id>
      </contributor>
      <minor />
      <comment>Standardisering av omdirigeringer</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text bytes="83" xml:space="preserve">#OMDIRIGERING [[Arbeiderpartiet]]

[[Kategori:Omdirigeringer fra eldre skriveform]]</text>
      <sha1>ljql4j8oluonmqkzqq17l5d8ylf1uo0</sha1>
    </revision>
  </page>
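At the wikitext level, the redirect is the `#OMDIRIGERING [[Arbeiderpartiet]]` line at the start of the `<text>` body, where English wikis would use `#REDIRECT`. The sketch below is not wtf_wikipedia's implementation. It is a standalone Python illustration of keyword-based detection that assumes a hypothetical `REDIRECT_WORDS` list containing `REDIRECT` and the Norwegian `OMDIRIGERING` seen above. Any further Norwegian aliases would have to be confirmed against the wiki's own configuration.

```python
import re
from typing import Optional

# Hypothetical keyword list for this sketch: English "REDIRECT" plus the
# Norwegian "OMDIRIGERING" taken from the dump excerpt.
REDIRECT_WORDS = ("REDIRECT", "OMDIRIGERING")

_ALTERNATION = "|".join(map(re.escape, REDIRECT_WORDS))
# "#<word>", optional colon, then "[[Target"; the target stops at ']', '|' or '#'.
_REDIRECT_RE = re.compile(
    rf"^\s*#\s*(?:{_ALTERNATION})\s*:?\s*\[\[([^\]|#]+)",
    re.IGNORECASE,
)


def redirect_target(wikitext: str) -> Optional[str]:
    """Return the redirect target if the wikitext opens with a known redirect word, else None."""
    match = _REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None


page_text = "#OMDIRIGERING [[Arbeiderpartiet]]\n\n[[Kategori:Omdirigeringer fra eldre skriveform]]"
print(redirect_target(page_text))  # Arbeiderpartiet
print(redirect_target("Arbeiderpartiet er et norsk politisk parti."))  # None
```

Matching is case-insensitive and the target stops at `|` or `#`, so `#omdirigering [[Oslo#Historie|Oslo]]` resolves to `Oslo`.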
spencermountain commented 9 months ago

🏆