philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License

Parsing links with a space HTML entity is broken #337

Closed. mohammedgad closed this issue 3 years ago

mohammedgad commented 3 years ago

Description

When I call Floki.parse on a link that contains a space HTML entity, parsing fails with a MatchError:

Floki.parse("https://www.gadz.dev/?done=1&tab=1 ")
Floki.parse("https://www.gadz.dev#done1612907281342=&tab140508551023=0. ")
Floki.parse("&tab1=1 ")

Stacktrace

** (MatchError) no match of right hand side value: "&tab140508551023=0. "
    (floki 0.23.0) src/floki_mochi_html.erl:705: :floki_mochi_html.tokenize_charref_raw/3
    (floki 0.23.0) src/floki_mochi_html.erl:651: :floki_mochi_html.tokenize_charref/2
    (floki 0.23.0) src/floki_mochi_html.erl:306: :floki_mochi_html.tokens/3
    (floki 0.23.0) src/floki_mochi_html.erl:83: :floki_mochi_html.parse/1
    (floki 0.23.0) lib/floki/html_parser/mochiweb.ex:7: Floki.HTMLParser.Mochiweb.parse/1

To Reproduce

Steps to reproduce the behavior: run any of the Floki.parse calls from the Description with Floki 0.23.0.

Expected behavior

The parsed output should be the link as plain text.

mohammedgad commented 3 years ago

I upgraded to Floki v0.23.1, which solved the issue.
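
For anyone hitting the same error on 0.23.0, the upgrade could look roughly like the dependency entry below. This is a minimal sketch of a mix.exs fragment, not taken from the reporter's project: the surrounding `deps` function is the conventional Mix layout, and the version requirement is an assumption based only on the version named above.

```elixir
# mix.exs (fragment; only the floki entry matters here)
defp deps do
  [
    # "~> 0.23.1" allows 0.23.1 and later 0.23.x patch releases,
    # which excludes 0.23.0, the version shown in the stacktrace.
    {:floki, "~> 0.23.1"}
  ]
end
```

After changing the requirement, running `mix deps.update floki` should fetch the newer release.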