konrad / JATS-to-Mediawiki

A PubMed Central to MediaWiki converter

No support for ranges of footnote references #16

Closed jgmorse closed 8 years ago

jgmorse commented 11 years ago

In http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1097715/ we see the following: "In recent years, numerous heuristics to reconstruct phylogenies for large data sets have been proposed [1-11]."

This is encoded in NLM as follows: In recent years, numerous heuristics to reconstruct phylogenies for large data sets have been proposed [<xref ref-type="bibr" rid="B1">1</xref>-<xref ref-type="bibr" rid="B11">11</xref>].

This is intended to refer to the set of citations named B1 through B11, inclusive. However, this poses several problems in the MediaWiki environment:

  1. The fact that this is a range is implied by a combination of the hyphen and the textual context, but is not represented semantically by the schema. In strict schema terms, this markup contains two explicit links to two different citations; nothing in the markup indicates the text also refers to the citations that occur in between.
  2. While we can preserve the citation IDs 'B1', 'B11', etc. in the MediaWiki markup, the wiki uses them only for linking and never displays them; citations are renumbered according to the order in which they are (explicitly) linked to in the article body. What's more, the References section lists them in that same order, regardless of the order in which they appear in the wikitext {{Citation}} template.
  3. As a purely cosmetic matter: MediaWiki automatically wraps reference links in square brackets. If the source NLM also puts square brackets around the xref elements, then both sets of square brackets appear in the output. (We'll need to commit to XSLT 2.0 before we address that.)
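
Points 1 and 2 can be illustrated with a short Python sketch. It assumes the source fragment uses two `xref` elements with `rid` values B1 and B11 (the usual NLM pattern, assumed here rather than copied from the article XML). The numbering function is a simplified model of "number by order of first explicit link", not the wiki's actual implementation, and the names `linked_ids` and `display_numbers` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed NLM fragment: two explicit xref links joined by a hyphen.
FRAGMENT = (
    '<p>have been proposed ['
    '<xref ref-type="bibr" rid="B1">1</xref>-'
    '<xref ref-type="bibr" rid="B11">11</xref>].</p>'
)


def linked_ids(xml_text):
    """Return the rid of every bibliographic xref, in document order."""
    root = ET.fromstring(xml_text)
    return [
        xref.get("rid")
        for xref in root.iter("xref")
        if xref.get("ref-type") == "bibr"
    ]


def display_numbers(rids):
    """Number citations by order of first explicit link (simplified model)."""
    numbers = {}
    for rid in rids:
        if rid not in numbers:
            numbers[rid] = len(numbers) + 1
    return numbers


rids = linked_ids(FRAGMENT)
print(rids)                   # ['B1', 'B11']
print(display_numbers(rids))  # {'B1': 1, 'B11': 2}
```

Only B1 and B11 are present in the markup, so B2 through B10 never receive a link, and B11 is displayed as 2.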

As a result, the wiki renders the sentence as: In recent years, numerous heuristics to reconstruct phylogenies for large data sets have been proposed [[1]-[2]].

So what was intended to refer the reader to a sequence of 11 references now refers to only 2 of them. This can't be corrected algorithmically; the original markup would have to be improved.
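
Although the range cannot be expanded reliably, candidate ranges could at least be flagged so that someone can fix the source markup by hand. The following is a minimal Python sketch of that idea, assuming a range looks like two sibling `xref` elements separated only by a hyphen or en dash. The function name `find_implied_ranges` and the sample fragments are hypothetical; nothing like this exists in the converter.

```python
import xml.etree.ElementTree as ET

# Characters treated as range separators: hyphen and en dash.
RANGE_SEPARATORS = {"-", "\u2013"}


def find_implied_ranges(xml_text):
    """Return (start_rid, end_rid) for xref pairs that look like a range.

    This only flags candidates for manual review; it does not guess
    which citations lie between the two endpoints.
    """
    root = ET.fromstring(xml_text)
    flagged = []
    for parent in root.iter():
        children = list(parent)
        for first, second in zip(children, children[1:]):
            if first.tag != "xref" or second.tag != "xref":
                continue
            # The tail is the text between the first xref and its next sibling.
            if (first.tail or "").strip() in RANGE_SEPARATORS:
                flagged.append((first.get("rid"), second.get("rid")))
    return flagged


RANGE_SAMPLE = (
    '<p>proposed [<xref ref-type="bibr" rid="B1">1</xref>-'
    '<xref ref-type="bibr" rid="B11">11</xref>].</p>'
)
LIST_SAMPLE = (
    '<p>proposed [<xref ref-type="bibr" rid="B1">1</xref>, '
    '<xref ref-type="bibr" rid="B3">3</xref>].</p>'
)

print(find_implied_ranges(RANGE_SAMPLE))  # [('B1', 'B11')]
print(find_implied_ranges(LIST_SAMPLE))   # []
```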