topfunky / hpple

An XML/HTML parser for Objective-C, inspired by Hpricot.
http://topfunky.com
MIT License
2.77k stars, 473 forks

Missing content when parsing html #10

Open lixing123 opened 13 years ago

lixing123 commented 13 years ago

Hi, first of all, thank you for hpple. It helps a lot.

Recently, when I used hpple to parse an HTML file, an element that should have content came back as null. Here is part of the HTML file:

<td id="NET_20062" colspan="2">
    <pre style="line-height:1.3; font-size:14px; background-color: #FFFFFF">
    发信人:
        <a href="bbsqry?userid=znslm">znslm</a>
        (小白), 信区: Pictures 标 题: Re: 你十六岁喜欢的那个人怎么样了?发信站: 南京大学小百合站 (Mon Nov 7 21:49:51 2011)    小孩上小学了
        <img width="1" src="/images/blank.gif">
        <img alt="[:D]" src="/images/face/13.gif">
        <img width="1" src="/images/blank.gif">
        喜欢我的那位呢/ 我猜的 --
    <font class="c31">
    </pre>
</td>

The XPath string is "//tr/td/pre/a". The result is supposed to be "znslm", but it is null. How can I fetch the string?
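
For reference, a minimal sketch of how such a query is typically run with hpple. It assumes a recent hpple version in which `TFHppleElement` offers the `text` convenience accessor (older versions may expose the text only through `content`). The helper name `FirstLinkText` is hypothetical. The query selects the cell by the `id` shown in the fragment above instead of going through `tr`, since the fragment does not show the enclosing row. This illustrates the API and is not a confirmed fix for the problem.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"   // from the hpple project; needs libxml2 to build

// Hypothetical helper: returns the text of the first <a> inside the <pre>
// of the cell with id NET_20062, or nil when the query matches nothing.
static NSString *FirstLinkText(NSData *htmlData) {
    // Parse the raw bytes as HTML (hpple delegates to libxml2's HTML parser).
    TFHpple *doc = [[TFHpple alloc] initWithHTMLData:htmlData];

    // Assumption: select by the id attribute from the fragment, because the
    // surrounding <tr> is not visible in the part of the file that was posted.
    NSArray *links = [doc searchWithXPathQuery:@"//td[@id='NET_20062']/pre/a"];
    if (links.count == 0) {
        return nil;   // no match: check the XPath against the full document
    }

    TFHppleElement *link = links[0];

    // Assumption: `text` exists in the hpple version in use; fall back to
    // `content`, which some versions populate on the element itself.
    NSString *text = [link text];
    return text ?: [link content];
}
```

Logging `links.count` before reading the text helps separate two cases: the XPath matching nothing, and the element being found but its text being empty.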

lixing123 commented 13 years ago

The problem seems to be that hpple cannot parse the "pre" tag. It misses everything inside the tag.

samniu commented 12 years ago

fatal error: 'libxml/tree.h' file not found [2]
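
This error usually means the compiler cannot find the libxml2 headers that hpple depends on. Below is a sketch of the two build settings commonly used to fix it, written as an `.xcconfig` fragment. It assumes an Xcode project that builds against the SDK's bundled libxml2. The same values can be entered by hand under Header Search Paths and Other Linker Flags in the target's Build Settings.

```xcconfig
// Assumption: libxml2 headers live under the active SDK, as on iOS and macOS SDKs.
HEADER_SEARCH_PATHS = $(inherited) $(SDKROOT)/usr/include/libxml2

// Link against the system libxml2 library.
OTHER_LDFLAGS = $(inherited) -lxml2
```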