topfunky / hpple

An XML/HTML parser for Objective-C, inspired by Hpricot.
http://topfunky.com
MIT License

Crashes when selecting certain elements on certain pages #26

Closed: gdaolewe closed this issue 11 years ago

gdaolewe commented 11 years ago

The following code crashes:

NSData* data = [NSData dataWithContentsOfURL:[NSURL URLWithString:@"http://tvtropes.org/pmwiki/pmwiki.php/Comicbook/TheAvengers"]];
TFHpple* hpple = [TFHpple hppleWithHTMLData:data];
NSArray* array = [hpple searchWithXPathQuery:@"//div"];

with output:

'NSInvalidArgumentException', reason: '-[__NSCFDictionary setObject:forKey:]: attempt to insert nil value (key: nodeContent)'

Queries like @"//table", @"//td", and @"//tr" also crash, while @"//span" and @"//a" do not.

I've put this in a simple, separate class to isolate the issue. The search I'm actually trying to run is @"//div[@class='indent']", which crashes on the above URL and many other TVTropes pages but does work on, for example, http://tvtropes.org/pmwiki/pmwiki.php/Film/TheAvengers. Plain @"//div" doesn't work there either, though.

gdaolewe commented 11 years ago

Figured out that the pages I was running this code on were encoded as ISO Latin-1, while XPathQuery expects UTF-8.
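
For anyone hitting the same crash, one possible workaround is to re-encode the downloaded bytes as UTF-8 before handing them to hpple. The thread does not say what fix was actually used, so the following is only a minimal sketch. `UTF8DataFromLatin1Data` is a hypothetical helper name, not part of hpple, and it assumes the page really is ISO Latin-1.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of hpple): re-encodes ISO Latin-1 bytes as UTF-8.
// Assumes the input really is ISO Latin-1; other encodings would be mangled.
static NSData *UTF8DataFromLatin1Data(NSData *latin1Data) {
    if (latin1Data == nil) {
        return nil;
    }
    // Every byte value is a valid Latin-1 character, so decoding always succeeds.
    NSString *text = [[NSString alloc] initWithData:latin1Data
                                           encoding:NSISOLatin1StringEncoding];
    // Re-encode the decoded string as UTF-8 bytes.
    return [text dataUsingEncoding:NSUTF8StringEncoding];
}
```

The converted data would then be passed to `hppleWithHTMLData:` in place of the raw download. If the page also declares its charset in a meta tag, that declaration may need adjusting too; this has not been verified against hpple.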