webplatform / mediawiki-conversion

Convert MediaWiki XML backup into structured raw text file tree
https://github.com/webplatform/docs

Reformat MediaWiki-generated HTML in a more reusable way #20

Closed: renoirb closed 8 years ago

renoirb commented 9 years ago

Documentation pages contain blocks that MediaWiki renders as inconsistent HTML. The objective is to extract the content of those blocks and store the data as metadata.

Candidate blocks

Overview table

Take a table in this format and simplify it:

<table class="wikitable overview_table">
<tr>
<th> <a href="/wiki/css/concepts/initial_value" title="css/concepts/initial value"> Initial value</a>
</th>
<td> <code>normal</code>
</td></tr>
<tr>
<th> Applies to
</th>
<td> All elements
</td></tr>
</table>

Into...

overview_table:
  - name: Initial value
    value: normal
    link: /css/concepts/initial_value
  - name: Applies to
    value: All elements
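
The conversion above could be sketched roughly as follows. This is only an illustration using Python's standard `html.parser`, not the code used by this repository; the class name `OverviewTableParser`, the helper `parse_overview_table`, and the rule of stripping a leading `/wiki` from links are assumptions inferred from the example.

```python
from html.parser import HTMLParser


class OverviewTableParser(HTMLParser):
    """Collect name/value/link rows from a `wikitable overview_table` (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, one dict per <tr>
        self._row = None    # row currently being built
        self._cell = None   # "name" while inside <th>, "value" while inside <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {"name": "", "value": ""}
        elif tag == "th" and self._row is not None:
            self._cell = "name"
        elif tag == "td" and self._row is not None:
            self._cell = "value"
        elif tag == "a" and self._cell == "name":
            href = dict(attrs).get("href") or ""
            # Assumption: drop the "/wiki" prefix, as the example output does.
            if href.startswith("/wiki/"):
                href = href[len("/wiki"):]
            self._row["link"] = href

    def handle_data(self, data):
        # Text inside nested tags such as <code> or <a> is still cell text.
        if self._cell is not None:
            self._row[self._cell] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Collapse the stray whitespace and newlines MediaWiki emits.
            for key in ("name", "value"):
                self._row[key] = " ".join(self._row[key].split())
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_overview_table(html):
    """Return the table as a dict shaped like the target metadata."""
    parser = OverviewTableParser()
    parser.feed(html)
    parser.close()
    return {"overview_table": parser.rows}
```

Calling `parse_overview_table()` on the table above yields a plain dict that any YAML serializer could then write out. The sketch assumes one overview table per input and no nested tables.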

CSS Property

Take any CSS property block and turn it into a dataset that makes sense.

<div class="css-property">
<dl><dt><b>transition-property</b></dt>
<dd><i>Value of the <a href="/wiki/css/properties/transition-property" title="css/properties/transition-property"><b>transition-property</b></a> property.</i></dd></dl>
</div>
<div class="css-property">
<dl><dt><b>transition-duration</b></dt>
<dd><i>Value of the <a href="/wiki/css/properties/animation-duration" title="css/properties/animation-duration"><b>transition-duration</b></a> property.</i></dd></dl>
</div>
<div class="css-property">
<dl><dt><b>transition-timing-function</b></dt>
<dd><i>Value of the <a href="/wiki/css/properties/animation-timing-function" title="css/properties/animation-timing-function"><b>transition-timing-function</b></a> property.</i></dd></dl>
</div>
<div class="css-property">
<dl><dt><b>transition-delay</b></dt>
<dd><i>Value of the <a href="/wiki/css/properties/animation-delay" title="css/properties/animation-delay"><b>transition-delay</b></a> property.</i></dd></dl>
</div>
<p><br />
</p>

into:

css_properties:
  - name: transition-property
    comment: 'Value of the [transition-property](/css/properties/transition-property) property.'
  - name: transition-duration
    comment: 'Value of the [transition-duration](/css/properties/animation-duration) property.'
  - name: transition-timing-function
    comment: 'Value of the [transition-timing-function](/css/properties/animation-timing-function) property.'
  - name: transition-delay
    comment: 'Value of the [transition-delay](/css/properties/animation-delay) property.'
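
One possible way to do this conversion, again as a hedged sketch with Python's standard `html.parser` rather than the project's actual implementation: `CssPropertyParser` and `parse_css_properties` are hypothetical names, links are rewritten to Markdown with the `/wiki` prefix removed (an assumption taken from the example), and `css-property` blocks are assumed not to contain nested `<div>` elements.

```python
from html.parser import HTMLParser


class CssPropertyParser(HTMLParser):
    """Turn <div class="css-property"> blocks into name/comment pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.items = []        # finished entries
        self._item = None      # entry currently being built
        self._field = None     # "name" inside <dt>, "comment" inside <dd>
        self._href = None      # target of the link being read, if any
        self._link_text = ""   # text collected inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and "css-property" in (attrs.get("class") or "").split():
            self._item = {"name": "", "comment": ""}
        elif self._item is None:
            return
        elif tag == "dt":
            self._field = "name"
        elif tag == "dd":
            self._field = "comment"
        elif tag == "a" and self._field == "comment":
            href = attrs.get("href") or ""
            # Assumption: drop the "/wiki" prefix, as the example output does.
            self._href = href[len("/wiki"):] if href.startswith("/wiki/") else href
            self._link_text = ""

    def handle_data(self, data):
        if self._field is None:
            return
        if self._href is not None:
            self._link_text += data
        else:
            self._item[self._field] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Emit the link in Markdown form: [text](target)
            self._item["comment"] += f"[{self._link_text.strip()}]({self._href})"
            self._href = None
        elif tag in ("dt", "dd"):
            self._field = None
        elif tag == "div" and self._item is not None:
            for key in ("name", "comment"):
                self._item[key] = " ".join(self._item[key].split())
            self.items.append(self._item)
            self._item = None


def parse_css_properties(html):
    """Return the blocks as a dict shaped like the target metadata."""
    parser = CssPropertyParser()
    parser.feed(html)
    parser.close()
    return {"css_properties": parser.items}
```

The sketch copies link targets as they appear in the HTML, so a block whose link points at `animation-duration` keeps that target in its comment. Markup outside the `css-property` blocks, such as the trailing `<p><br /></p>`, is ignored.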

renoirb commented 8 years ago

Done.