sjdirect closed this issue 11 years ago
I agree, and this was supposed to have been fixed in 1.3.3 (if I remember correctly). Are you using the most recent release?
Using 1.3.3.5. I initially got the binary from NuGet, I believe.
Yep, update. That was a bad decision based on an overly literal interpretation of the spec when I was addressing problems with character set handling. It has since been undone (must have been in 1.3.4).
I take that back. That god forsaken exception is still in the source... grr..
OK
NuGet version 1.3.3 = build 1.3.3.5 = what you are using
NuGet version 1.3.4 = build 1.3.3.249
So you are one version behind - I tested the page you were failing on here, and it's working for me with 1.3.4, so I think this has already been resolved.
The exception is still in the source, but I think it's on an unreachable path, so it's just an artifact. I'm going to make sure it gets cleaned up before 1.3.5 anyway. It definitely shouldn't be there.
That did it thanks!
I'm crawling millions of sites and CsQuery is blowing up on a lot of them. I understand why, based on the exception message, but it happens so often that I was hoping you would consider just using the last encoding instead of blowing up.
The code.....
Example Error 1...
[2013-02-19 17:44:22,764] [3898] [ERROR] - Error occurred while loading CsQuery object for Url [http://1000carats.net/Changement couleur.htm] - [Abot.Poco.CrawledPage]
[2013-02-19 17:44:22,795] [3898] [ERROR] - System.InvalidOperationException: The character set encoding changed twice, something seems to be wrong.
   at CsQuery.HtmlParser.ElementFactory.Parse(Stream html, Encoding encoding)
   at CsQuery.CQ..ctor(String html, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
   at Abot.Poco.CrawledPage.InitializeCsQueryDocument() - [Abot.Poco.CrawledPage]

Example Error 2...
[2013-02-19 17:41:28,260] [491] [ERROR] - Error occurred while loading CsQuery object for Url [http://abalancedbodymassageinc.com/] - [Abot.Poco.CrawledPage]
[2013-02-19 17:41:28,260] [491] [ERROR] - System.InvalidOperationException: The character set encoding changed twice, something seems to be wrong.
   at CsQuery.HtmlParser.ElementFactory.Parse(Stream html, Encoding encoding)
   at CsQuery.CQ..ctor(String html, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
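
To make the request concrete, here is a rough sketch of the "use the last encoding instead of blowing up" policy. It is only an illustration and is not CsQuery's code: the class `EncodingFallback` and the method `ResolveLast` are hypothetical names, and the list of declared charsets stands in for whatever the parser encounters (HTTP header, meta tags, and so on). Only standard .NET calls are used.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, NOT part of CsQuery.
public static class EncodingFallback
{
    // Walks the charset names in the order they were declared and returns
    // the encoding for the last recognizable one. Falls back to
    // defaultEncoding when no declaration is usable. It never throws just
    // because the declared charset changed more than once.
    public static Encoding ResolveLast(IEnumerable<string> declaredCharsets, Encoding defaultEncoding)
    {
        Encoding current = defaultEncoding;
        foreach (string name in declaredCharsets)
        {
            if (string.IsNullOrWhiteSpace(name))
            {
                continue; // ignore empty declarations
            }
            try
            {
                // A later declaration simply replaces the earlier one.
                current = Encoding.GetEncoding(name.Trim());
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: keep the previous encoding.
            }
        }
        return current;
    }
}
```

With something like this, a page that declares `iso-8859-1`, then `utf-8`, then `iso-8859-1` again would be decoded with the last declaration rather than rejected, which is all a large crawl really needs.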