philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License

(FunctionClauseError) no function clause matching in Floki.Finder.get_matches/3 #179

Closed jasonl99 closed 6 years ago

jasonl99 commented 6 years ago

I am getting this error when trying to find anything in a particular document. I've worked with Elixir a little in the past, but it's still quite new to me, so I wouldn't be the least bit surprised if this is user error 🤓.

Here's the minimum code to recreate the problem:

    url = "http://weeklyad.publix.com/Publix/Entry/LandingContent?storeid=2699941&sneakpeek=N&listingid=0"
    {:ok, response} = HTTPoison.get(url)
    html = response.body
    Floki.find(html, 'a')

The error output lists the arguments passed in, but the first one is huge (the HTML is about 250k), so I've truncated it to the first few items:

    The following arguments were given to Floki.Finder.get_matches/3:

        # 1
        %Floki.HTMLTree{
          node_ids: [2332, 2331, 2330, 2329, 2328, 2327, 2326, 2325, 2324, 2323, 2322,
           2321, 2320, 2319, 2318, 2317, 2316, 2315, 2314, 2313, 2312, 2311, 2310, 2309,
           2308, 2307, 2306, 2305, 2304, 2303, 2302, 2301, 2300, 2299, 2298, 2297, 2296,
           2295, 2294, 2293, 2292, 2291, 2290, 2289, 2288, 2287, 2286, 2285, 2284, ...],
          nodes: %{
            462 => %Floki.HTMLTree.Comment{
              content: " Savings: Column two ",
              node_id: 462,
              parent_node_id: 397
            }, ... 

        # 2
        %Floki.HTMLTree.HTMLNode{
          attributes: [{"xmlns", "http://www.w3.org/1999/xhtml"}, {"style", ""}],
          children_nodes_ids: [41, 2],
          node_id: 1,
          parent_node_id: nil,
          type: "html"
        }

        # 3
        97

I am suspicious of the third argument (97), because both function clauses look like they need the third parameter to be a %Floki.Selector{} struct. Some of the pertinent error output:

    Attempted function clauses (showing 2 out of 2):

        defp get_matches(tree, html_node, selector = %Floki.Selector{combinator: nil})
        defp get_matches(tree, html_node, selector = %Floki.Selector{combinator: combinator})

    (floki) lib/floki/finder.ex:69: Floki.Finder.get_matches/3
    (elixir) lib/enum.ex:2924: Enum.flat_map_list/2
    (elixir) lib/enum.ex:2924: Enum.flat_map_list/2
    (floki) lib/floki/finder.ex:53: Floki.Finder.find_selectors/2
    (floki) lib/floki.ex:127: Floki.find/2

And last but not least, in case it's helpful:

    iex(1)> i html
    Data type
      BitString
    Byte size
      228812
    Description
      This is a string: a UTF-8 encoded binary. It's printed with the `<<>>`
      syntax (as opposed to double quotes) because it contains non-printable
      UTF-8 encoded codepoints (the first non-printable codepoint being
      `<<3>>`).
    Reference modules
      String, :binary
    Implemented protocols
      List.Chars, String.Chars, Poison.Encoder, IEx.Info, Inspect, Poison.Decoder, Collectable

pdgonzalez872 commented 6 years ago

Could you try with double quotes ("a") instead of single quotes ('a')?

jasonl99 commented 6 years ago

@pdgonzalez872 LOL, that was a quick fix! It worked! I'm too used to Ruby, where both types of quotes produce a string. Thank you!

pdgonzalez872 commented 6 years ago

Great! Keep at it! :)

philss commented 6 years ago

Thank you for the report, @jasonl99! And thank you for solving this, @pdgonzalez872! ❤️
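
For anyone who lands here with the same error, here is a minimal sketch of why the quotes matter. In Elixir, single quotes build a charlist (a list of integer code points) and double quotes build a binary string, so 'a' is really the list [97]. That is consistent with the 97 shown as argument #3 in the error above, although how Floki handles a list selector internally is an assumption based on the stack trace rather than something confirmed in this thread. The variable names below are illustrative only, and newer Elixir versions prefer writing charlists as ~c"a".

```elixir
# Single quotes: a charlist, i.e. a list of integer code points.
charlist = 'a'
# Double quotes: a UTF-8 encoded binary, which is what a CSS selector should be.
string = "a"

IO.inspect(is_list(charlist))   # true
IO.inspect(charlist == [97])    # true, since ?a is 97
IO.inspect(hd(charlist))        # 97, the value seen as argument #3
IO.inspect(is_binary(string))   # true
IO.inspect(charlist == string)  # false, the two are different types

# If a charlist arrives from elsewhere, convert it before using it as a selector.
selector = List.to_string(charlist)
IO.inspect(selector == "a")     # true
```

With that in mind, the call from the report works once the selector is a string: Floki.find(html, "a").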