nathell / clj-tagsoup

An HTML parser for Clojure.

Incorrect nesting #17

Open neverfox opened 8 years ago

neverfox commented 8 years ago

In the wild, I have noticed that the results of parse-string don't always nest as expected. For example, if you fetch the HTML for https://www.google.com/search?q=dentist+pinellas+county+fl (I'm using clj-http to make the request), there is a div with id ires (the search results) that contains a single child ol, which in turn contains 10 or so child divs with class g (one per search result). The first of these children also has class _Arj. However, the tagsoup result shows the ol and the _Arj g div as siblings, and the remaining g-classed divs as direct children of the body tag. I'm not sure whether this is an issue with clj-tagsoup or with something upstream, but I thought I'd bring it to your attention.
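
A rough sketch of how the nesting could be inspected, in case it helps with reproducing this. The helper names `find-by-id` and `child-summary` are made up for illustration and are not part of clj-tagsoup; the sketch only assumes the library's documented `parse-string`, `tag`, `attributes`, and `children` functions and clj-http's `client/get`. The live page may no longer use the `ires` id, so the fetch is left inside a `comment` form.

```clojure
(ns nesting-check
  (:require [clj-http.client :as client]
            [pl.danieljanus.tagsoup :as tagsoup]))

(defn find-by-id
  "Hypothetical helper: depth-first search for the first element whose
  :id attribute equals id. Text nodes (strings) are skipped."
  [node id]
  (when (vector? node)
    (if (= id (:id (tagsoup/attributes node)))
      node
      (some #(find-by-id % id) (tagsoup/children node)))))

(defn child-summary
  "Hypothetical helper: returns [tag class] pairs for the element
  children of node, ignoring text nodes."
  [node]
  (for [child (tagsoup/children node)
        :when (vector? child)]
    [(tagsoup/tag child) (:class (tagsoup/attributes child))]))

(comment
  ;; Fetch the page from the report and list the direct children of #ires.
  ;; If nesting were preserved, this would be a single ol entry.
  (let [html (:body (client/get "https://www.google.com/search?q=dentist+pinellas+county+fl"))
        tree (tagsoup/parse-string html)]
    (child-summary (find-by-id tree "ires"))))
```

Running `child-summary` on the body element as well should show whether the remaining g-classed divs ended up there instead of under the ol.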