jamietre / CsQuery

CsQuery is a complete CSS selector engine, HTML parser, and jQuery port for C# and .NET 4.

CreateDocument throws a System.NullReferenceException when an HTML string is passed in #166

Status: Open. cartics opened this issue 10 years ago.

cartics commented 10 years ago

Update: added stack trace

  1. Browse to a web page and save it to disk as "aaa.txt".
  2. Invoking CQ.CreateDocumentFromFile with the file location succeeds, whereas invoking CQ.CreateDocument with the contents of the file throws a NullReferenceException.

Sample code with bug:

using System.IO;
using CsQuery;

var htmlFileName = @"courthouse.txt";
var htmlBody = File.ReadAllText(htmlFileName);
var cq = CQ.CreateDocument(htmlBody); // throws NullReferenceException

Code that works when given the file name:

var cq = CQ.CreateDocumentFromFile(@"courthouse.txt");

Stack Trace:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
   at CsQuery.HtmlParser.ElementFactory.Parse(Stream inputStream, Encoding encoding)
   at CsQuery.HtmlParser.ElementFactory.Create(Stream html, Encoding streamEncoding, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
   at CsQuery.CQ.CreateNew(CQ target, Stream html, Encoding encoding, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
   at CsQuery.CQ..ctor(String html, HtmlParsingMode parsingMode, HtmlParsingOptions parsingOptions, DocType docType)
   at CsQuery.CQ.CreateDocument(String html)

jamietre commented 10 years ago

I'm not able to reproduce this in a simple test, e.g.:

(Save the amazon.com homepage as "c:\amazon.html".)

var test = File.ReadAllText("c:\\amazon.html");
var cq = CQ.CreateDocument(test);
var cq2 = CQ.CreateDocumentFromFile("c:\\amazon.html");

...works fine. This must have something to do with the specific content, perhaps character set conversion when the page was saved. I would take a look at the contents of the file as saved from your browser.
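The character-set hypothesis can be checked without involving CsQuery at all. The sketch below is a hypothetical helper (`SavedPageCheck`, `DescribeBom` and `CountControlChars` are illustrative names, not part of CsQuery) that reports whether the raw bytes of the saved file start with a byte-order mark, and counts control characters such as NUL that File.ReadAllText would pass straight through into the string given to CQ.CreateDocument. It assumes such characters are a plausible trigger for the crash; nothing in this thread confirms that.

```csharp
using System.Linq;

// Hypothetical diagnostic helper, NOT part of CsQuery.
public static class SavedPageCheck
{
    // Returns the encoding implied by a leading byte-order mark, or "none".
    // The UTF-32 LE check must run before UTF-16 LE because both start with FF FE.
    public static string DescribeBom(byte[] bytes)
    {
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return "UTF-8";
        if (bytes.Length >= 4 && bytes[0] == 0xFF && bytes[1] == 0xFE && bytes[2] == 0x00 && bytes[3] == 0x00)
            return "UTF-32 LE";
        if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
            return "UTF-16 LE";
        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return "UTF-16 BE";
        return "none";
    }

    // Counts characters below U+0020 other than tab, CR and LF (NUL included).
    public static int CountControlChars(string text) =>
        text.Count(c => c < ' ' && c != '\t' && c != '\r' && c != '\n');
}
```

For example, `SavedPageCheck.DescribeBom(File.ReadAllBytes("courthouse.txt"))` shows whether the browser wrote a BOM, and a non-zero `SavedPageCheck.CountControlChars(File.ReadAllText("courthouse.txt"))` would point at content worth inspecting before retrying CQ.CreateDocument.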

cartics commented 10 years ago

I can reproduce this consistently with the following site: http://courthouseproperty.com/
Since I'm not able to attach the text file here, I can send it separately.