philss / floki

Floki is a simple HTML parser that enables search for nodes using CSS selectors.
https://hex.pm/packages/floki
MIT License
2.05k stars 155 forks

Floki.text is returning text of child nodes #180

Closed mphuie closed 6 years ago

mphuie commented 6 years ago
<h4 class="event-date">Wednesday, May. 2
<div class="time">9:00 a.m.<span class="spacer"> - </span>3:00 p.m.</div>
</h4>

When I run Floki.find("h4.event-date") |> Floki.text, I get "Wednesday, May. 2 9:00 a.m. - 3:00 p.m." instead of just "Wednesday, May. 2".

I'm rewriting an existing Python scraper that uses lxml, where the XPath expression "h4[contains(@class, 'event-date')]/text()" returns the correct result.

Unless I'm writing the selector incorrectly, there doesn't seem to be a way to get only the immediate text of an element.

mphuie commented 6 years ago

Never mind me, didn't read all the documentation 🙈.

Piping into Floki.text(deep: false) works great: it returns only the element's own text and skips the text of its child nodes.
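For anyone landing here later, here is a minimal sketch of the difference, using the HTML from the question. It assumes a standalone script that pulls Floki in via Mix.install (the version requirement is a guess, adjust it to your project) and uses Floki's default parser; the variable names are only for illustration.

```elixir
# Assumption: run as a standalone script (elixir example.exs).
# The version requirement below is illustrative, not taken from the thread.
Mix.install([{:floki, "~> 0.36"}])

html = """
<h4 class="event-date">Wednesday, May. 2
<div class="time">9:00 a.m.<span class="spacer"> - </span>3:00 p.m.</div>
</h4>
"""

{:ok, document} = Floki.parse_document(html)
nodes = Floki.find(document, "h4.event-date")

# Default (deep: true): text of the matched node and all of its descendants.
deep = Floki.text(nodes)

# deep: false: only the text nodes that are direct children of the match.
shallow = Floki.text(nodes, deep: false)

# Surrounding whitespace depends on the parser, so trim before comparing.
IO.inspect(String.trim(deep), label: "deep")
IO.inspect(String.trim(shallow), label: "shallow")
```

The trimmed shallow result should contain only the date line, which is roughly what the lxml /text() step selects in the original Python scraper.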