smlnj / legacy

This project is the old version of Standard ML of New Jersey that continues to support older systems (e.g., 32-bit machines).

HTML4 parser - lexer reached a stuck state #317

Open Skyb0rg007 opened 4 months ago

Skyb0rg007 commented 4 months ago

Version

110.99.5 (Latest)

Operating System

OS Version

No response

Processor

System Component

SML/NJ Library

Severity

Minor

Description

The HTML4 lexer does not specify rules for all inputs. A lone "<" leaves it in a stuck state, and it raises Fail instead of reporting a parse failure.

Transcript

- HTML4Parser.fromString "<";
uncaught exception Fail [Fail: lexer reached a stuck state]
  raised at: smlnj-lib/HTML4/html4.l.sml:94.46-94.80
             ml-lpt/lib/err-handler.sml:261.63

Expected Behavior

- HTML4Parser.fromString "<";
val it = NONE : html option

Note that this is already what happens when you pass an incomplete tag such as "<x".

Steps to Reproduce

See the transcript above.

Additional Information

The issue is in html4.l: there needs to be a case that handles "<" and "</" when they are not followed by an alphabetic character or "!--".

Email address

skyler DOT soss AT gmail.com

JohnReppy commented 2 months ago

As currently implemented, this library does not have the infrastructure to produce error messages. I have modified the lexer so that it now raises a more informative exception (this change will be included in 110.99.6).

- HTML4Parser.fromString "<";

uncaught exception Fail [Fail: Unexpected character '<']
  raised at: html4.l.sml:230.16-230.67
             ml-lpt/lib/err-handler.sml:261.63

A complete fix (i.e., properly reporting an error message and returning NONE) will require a lot more work and possibly a change to the API, so I'm leaving the bug open for now.
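
Until a complete fix lands, a caller who wants the NONE result from the Expected Behavior section can catch the exception at the call site. The following is only a minimal sketch, not part of the library: `safeFromString` is a hypothetical name, and it assumes the HTML4 library is already loaded in the REPL, as in the transcripts above.

```sml
(* Hypothetical wrapper, not part of the SML/NJ Library.
 * Assumes HTML4Parser is already loaded, as in the transcripts above.
 * Both transcripts show the lexer raising Fail, so Fail is mapped to NONE. *)
fun safeFromString s =
      HTML4Parser.fromString s
        handle Fail _ => NONE
```

With this wrapper, `safeFromString "<"` should evaluate to NONE rather than raising. Note that the handler catches every Fail exception, not only the ones raised by the lexer, so it may hide unrelated failures.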