newfs / gobotany-app

Deployable code for the Go Botany application

Dkey info on species page, variety stanzas displayed as hybrids #382

Closed sidkoul closed 11 years ago

sidkoul commented 11 years ago

The species page displays the relevant bits from Arthur's flora. However, it confuses variety and hybrid information. Compare the following screenshots.

Allium tricoccum info from the Flora (page 92)

[Screenshot: Allium-tricoccum-book]

Allium tricoccum species page

[Screenshot: Allium-tricoccum-web]

1a and 1b are headings to a stanza that leads to a variety, not a hybrid. This is displayed correctly in the book and incorrectly on the species page. There should be no hybridization symbols (i.e., ×) displayed.

Looking at the generated HTML, it looks like the dkey parser might have gotten confused when extracting the information from the XML file:

<p>
    <b>1a. </b>
    "×1–1.5 cm; leaves not or scarcely "
    <span class="gloss" id="gloss79">petiolate</span>
    ", white at the base, with blades (1.5–) 2–4 (–4.5)&nbsp;cm wide..."
    ...
</p>

Now that the dkey is live, it's important this get fixed quickly.

brandon-rhodes commented 11 years ago

I will go take a look at this right now!

brandon-rhodes commented 11 years ago

The “×” is appearing because it occurs in the book's original text, not because the import logic thinks that this is a hybrid. What appears to be happening is that the text “Bulbs 2–4 (–5)” is being omitted, leaving the “×” that means “by” (as in “two-by-four” = “2×4”) as the first character of the book's text that actually survives our extraction process.

Having ruled out a hybridization-logic problem, I am now looking into why the text is being omitted instead of being included during the import.
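
To illustrate the distinction described above, here is a minimal sketch in Python. It is not the actual gobotany importer code; `classify_times_sign` is a hypothetical helper, and the rule it uses (a "×" flanked by measurements means "by", while a "×" with nothing before it suggests truncated text) is only an assumption for illustration.

```python
def classify_times_sign(text):
    """Classify the first "×" in text (hypothetical helper, illustration only).

    Returns one of:
      "none"      - no "×" in the text
      "leading"   - nothing precedes the "×", so its left operand is missing
      "dimension" - the "×" sits between two measurements ("by")
      "other"     - anything else, e.g. a hybrid name
    """
    index = text.find("×")
    if index == -1:
        return "none"

    before = text[:index].rstrip()
    after = text[index + 1:].lstrip()

    if not before:
        # The text starts with "×": the left-hand measurement was probably lost.
        return "leading"

    left_is_measure = before[-1].isdigit() or before[-1] == ")"
    right_is_measure = bool(after) and (after[0].isdigit() or after[0] == "(")
    if left_is_measure and right_is_measure:
        return "dimension"
    return "other"


print(classify_times_sign("Bulbs 2–4 (–5) ×1–1.5 cm"))
print(classify_times_sign("×1–1.5 cm; leaves not or scarcely petiolate"))
```

Under these assumptions the full book text is classified as a dimension, while the truncated text that reached the species page is flagged as having a leading "×", which points at the extraction step rather than at any hybrid logic.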

brandon-rhodes commented 11 years ago

I might have caused all sorts of problems on other species pages, of course, because any tweak to the importer can affect any page whatsoever. But with each of these commits I ran a "diff" against the whole database, and I think that in general these are improvements. They also appear to have solved the particular problem that inspired this ticket.
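
For readers who want to run a similar before/after check, the following is a rough sketch using Python's standard `difflib`. It is not the procedure actually used for these commits; the `diff_pages` function and the idea of holding each page's extracted text in a dictionary keyed by page name are assumptions made for illustration.

```python
import difflib


def diff_pages(before, after):
    """Return unified diffs for every page whose extracted text changed.

    before, after: dicts mapping a page name to its extracted text
    (a hypothetical representation of two importer runs).
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, "").splitlines()
        new = after.get(name, "").splitlines()
        if old != new:
            changes[name] = list(
                difflib.unified_diff(
                    old,
                    new,
                    fromfile=name + " (before)",
                    tofile=name + " (after)",
                    lineterm="",
                )
            )
    return changes


before = {"allium-tricoccum": "×1–1.5 cm; leaves", "other-page": "unchanged"}
after = {"allium-tricoccum": "Bulbs 2–4 (–5) ×1–1.5 cm; leaves", "other-page": "unchanged"}

for name, lines in diff_pages(before, after).items():
    print("\n".join(lines))
```

Reviewing only the pages that appear in the result keeps the check manageable even when an importer tweak can touch any page.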