philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License
2.05k stars 155 forks

Error after upgrading to 0.35.0 #492

Closed aselder closed 11 months ago

aselder commented 11 months ago

Description

[error] GET /products/scalp-solutions-dry-scalp-treatment
** (MatchError) no match of right hand side value: [{"=", "="}, {"\"trade\"", "\"trade\""}]
    (floki 0.35.0) src/floki_mochi_html.erl:257: :floki_mochi_html.norm/2
    (floki 0.35.0) src/floki_mochi_html.erl:241: :floki_mochi_html.tree/3
    (floki 0.35.0) src/floki_mochi_html.erl:120: :floki_mochi_html.parse_tokens/2
    (floki 0.35.0) lib/floki/html_parser/mochiweb.ex:10: Floki.HTMLParser.Mochiweb.parse_document/2
    (tenant_web 0.1.0) lib/tenant_web/components/blocks/text.ex:130: TenantWeb.Components.Blocks.Text.maybe_render_with_existing_container/1
    (tenant_web 0.1.0) lib/tenant_web/components/blocks/text.ex:17: TenantWeb.Components.Blocks.Text.render/1
    (phoenix_live_view 0.19.5) lib/phoenix_live_view/tag_engine.ex:68: Phoenix.LiveView.TagEngine.component/3

philss commented 11 months ago

I'm into it. @aselder Is it possible to provide a snippet so I can reproduce the error?

aselder commented 11 months ago

I’m working on digging into exactly which snippet triggered the error. Hopefully I'll have it narrowed down by tomorrow.

Thanks,
Andrew

On Oct 16, 2023, at 1:13 PM, Philip Sampaio wrote:
> I'm into it. @aselder Is it possible to provide a snippet so I can reproduce the error?

philss commented 11 months ago

OK, thank you!

aselder commented 11 months ago

@philss

Here’s some info that I think clarifies the root cause. I just tried to parse the HTML that turned out to be malformed on that one page:

<spanclass=\"trade\">™ curl gelée<br><br><br></spanclass=\"trade\">

With Floki 0.35.0, I get the exact same error we saw in production:

iex(3)> Floki.parse_fragment(f)
** (MatchError) no match of right hand side value: [{"=", "="}, {"\"trade\"", "\"trade\""}]
    (floki 0.35.0) src/floki_mochi_html.erl:257: :floki_mochi_html.norm/2
    (floki 0.35.0) src/floki_mochi_html.erl:241: :floki_mochi_html.tree/3
    (floki 0.35.0) src/floki_mochi_html.erl:120: :floki_mochi_html.parse_tokens/2
    (floki 0.35.0) lib/floki/html_parser/mochiweb.ex:10: Floki.HTMLParser.Mochiweb.parse_document/2
    iex:3: (file)

With Floki 0.34.3, it works:

iex(3)> Floki.parse_fragment(f)
{:ok,
 [
   {"spanclass", [{"=", "="}, {"\"trade\"", "\"trade\""}],
    ["™ curl gelée", {"br", [], []}, {"br", [], []}, {"br", [], []}]}
 ]}

So this was invalid HTML that caused no problems before, but after upgrading Floki to 0.35.0 it raises an error.
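
For anyone rendering untrusted or user-edited HTML, one way to keep a parser crash like this from taking down the whole request is to wrap the call and turn any raised exception into an error tuple. The following is only a minimal sketch: the `SafeHTML` module and its `parse_fragment/2` function are hypothetical names, not part of Floki or of the app in the stack trace. The only Floki call it assumes is `Floki.parse_fragment/1`, which is the one used in the session above.

```elixir
# Hypothetical helper module; not part of Floki.
defmodule SafeHTML do
  @doc """
  Parses an HTML fragment and always returns a tagged tuple.

  `parser` defaults to `Floki.parse_fragment/1`. It is a parameter only so
  that the wrapper can be exercised without Floki installed.
  """
  def parse_fragment(html, parser \\ &Floki.parse_fragment/1)
      when is_binary(html) and is_function(parser, 1) do
    # On success this is whatever the parser returns, e.g. {:ok, tree}.
    parser.(html)
  rescue
    # Erlang errors such as the badmatch above are rescued as MatchError.
    exception -> {:error, Exception.message(exception)}
  end
end
```

A caller could then match on the result, for example `case SafeHTML.parse_fragment(html) do {:ok, tree} -> tree; {:error, _reason} -> [] end`, and fall back to rendering the raw text or nothing at all. This hides the symptom rather than fixing the markup, so logging the error reason is still worthwhile.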

philss commented 11 months ago

@aselder thank you so much! It was a silly mistake of mine. I'm going to release a patch version soon :)

philss commented 11 months ago

Done! Please try version 0.35.1 :)