snoyberg / xml

Various XML utility packages for Haskell

html-conduit: capitalized `<BR>` is not auto-closed #167

Closed EarlGray closed 3 years ago

EarlGray commented 3 years ago

I am using html-conduit-1.3.2.1.

The observed behavior

I observe <br> tags being auto-closed properly, i.e. an EventEndElement is issued automatically right after the EventBeginElement, but an uppercase <BR> is not auto-closed.

A minimal reproducible example (an excerpt from https://lwn.net):

<body>
  Copyright © 2021, Eklektix, Inc.<BR>
  Linux  is a registered trademark of Linus Torvalds<br>
</body>

The produced sequence of events (nameNamespace = Nothing and namePrefix = Nothing are elided as .. for clarity):

EventBeginElement (Name {nameLocalName = "body", ..}) []
  EventContent (ContentText "\n  Copyright \169 2021, Eklektix, Inc.")
  EventBeginElement (Name {nameLocalName = "BR", ..}) []
    EventContent (ContentText "\n  Linux  is a registered trademark of Linus Torvalds")
    EventBeginElement (Name {nameLocalName = "br", ..}) []
    EventEndElement (Name {nameLocalName = "br", ..})
    EventContent (ContentText "\n")
  EventEndElement (Name {nameLocalName = "BR", ..})
EventEndElement (Name {nameLocalName = "body", ..})
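
For anyone who wants to reproduce the listing above, here is a minimal sketch. It assumes the events were obtained through Text.HTML.DOM.eventConduit; the report does not say which entry point was used. The names htmlEvents and sample are made up for this example, and the input is an ASCII-only simplification of the lwn.net excerpt. It needs the html-conduit, conduit, xml-types and bytestring packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.ByteString (ByteString)
import Data.Conduit (runConduitPure, yield, (.|))
import qualified Data.Conduit.List as CL
import Data.XML.Types (Event)
import Text.HTML.DOM (eventConduit)

-- Hypothetical helper: run a single chunk of HTML through html-conduit's
-- event stream and collect every event into a list.
htmlEvents :: ByteString -> [Event]
htmlEvents bytes = runConduitPure (yield bytes .| eventConduit .| CL.consume)

-- ASCII-only stand-in for the excerpt in this report: one uppercase <BR>
-- followed by one lowercase <br>.
sample :: ByteString
sample = "<body>\n  first line<BR>\n  second line<br>\n</body>"

main :: IO ()
main = mapM_ print (htmlEvents sample)
```

Running this against html-conduit-1.3.2.1 and then against a later release should show whether the EventEndElement for "BR" moves from the end of the body to directly after its EventBeginElement.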

The desired behavior

EventEndElement should be issued immediately for <BR> and other upper-case auto-closing tags:

EventBeginElement (Name {nameLocalName = "BR", ..}) []
EventEndElement (Name {nameLocalName = "BR", ..})
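
To illustrate the desired behavior, here is a minimal sketch of a case-insensitive void-tag check. It is only an illustration and makes no claim about how html-conduit decides which tags to auto-close. The names voidTags and isVoidTag are hypothetical, and the tag list is an illustrative set of HTML void elements that may differ from the library's own list.

```haskell
module Main (main) where

import Data.Char (toLower)

-- Illustrative list of HTML void elements, stored in lowercase.
-- Assumption: the list html-conduit uses internally may differ.
voidTags :: [String]
voidTags =
  [ "area", "base", "br", "col", "embed", "hr", "img", "input"
  , "link", "meta", "source", "track", "wbr"
  ]

-- Hypothetical helper: lowercase the tag name before the lookup, so that
-- "BR", "Br" and "br" are all treated as the same void element.
isVoidTag :: String -> Bool
isVoidTag name = map toLower name `elem` voidTags

main :: IO ()
main = mapM_ report ["br", "BR", "Br", "body"]
  where
    report tag = putStrLn (tag ++ ": " ++ show (isVoidTag tag))
```

With this check, br, BR and Br are all reported as void while body is not, so a parser built on it could emit the end event right after the begin event regardless of case.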

snoyberg commented 3 years ago

Good catch, thanks! This should be fixed in html-conduit-1.3.2.2.