rflechner / ScrapyFSharp

This is a reborn of Scrapysharp written in FSharp
http://rflechner.github.io/ScrapyFSharp/
The Unlicense

Update HtmlCssSelectors.fs #2

Closed ghost closed 8 years ago

ghost commented 8 years ago

Use Descendants, which recursively gets all the HTML descendants of an HtmlDocument, because the HTML may not have a <body> and still be valid HTML.

rflechner commented 8 years ago

Pull request merged. Thanks.

rflechner commented 8 years ago

I rolled this back for the moment, because I found a difference in behavior. For example, go to https://github.com/rflechner/ScrapyFSharp/blob/master/docs/content/CssParserTutorial.fsx

and test

html.CssSelect "div.main"

"CssSelectorExtensions.Select [doc.Body()] selector" returns only one node, but "CssSelectorExtensions.Select (Seq.toList <| doc.Descendants()) selector" returns three nodes.

Perhaps we could check whether a body node exists and adapt the method accordingly.
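
A minimal sketch of that idea, not the code that was merged or rolled back: start the search from the body when there is one, and fall back to the top-level elements otherwise. The helper name searchRoots is hypothetical, and the sketch assumes an FSharp.Data version that exposes HtmlDocument.tryGetBody and HtmlDocument.elements.

```fsharp
open FSharp.Data

// Hypothetical helper (not part of ScrapyFSharp): choose the nodes
// that a selector search should start from.
let searchRoots (doc: HtmlDocument) : HtmlNode list =
    match HtmlDocument.tryGetBody doc with
    // A <body> exists: search from it, as the original code did.
    | Some body -> [ body ]
    // No <body>: start from the document's top-level elements instead.
    | None -> HtmlDocument.elements doc |> Seq.toList

// Assumed call shape, copied from the comparison above:
// CssSelectorExtensions.Select (searchRoots doc) selector
```

Passing only root nodes, rather than every descendant, keeps the input the same as before for documents that do have a body, so the tutorial example should still be checked against it.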

ghost commented 8 years ago

Hmm, I didn't think of that. Alternatively, we could just get the descendants themselves and use the other CssSelect on the resulting HtmlNode list.