jamietre / CsQuery

CsQuery is a complete CSS selector engine, HTML parser, and jQuery port for C# and .NET 4.

Text() / .InnerText do not include text of nested elements #186

Closed: fschwiet closed this issue 9 years ago

fschwiet commented 9 years ago

When using jQuery or the DOM, .text() and .innerText include the text of nested DOM elements. In the example below, note that the contents of the inner <span> are included:

foo = $( "<td class=\"vtable-word\">est<span class=\"conj-irregular\">á</span></td>")
[<td class="vtable-word">…</td>]
foo[0]
<td class="vtable-word">…</td>
foo[0].innerText
"está"
foo.text()
"está"
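To make the expected semantics concrete, here is a minimal, self-contained sketch. It uses a hypothetical node model (`Node`, `TextNode`, `ElementNode`, `TextSketch` are made up for illustration and are NOT CsQuery types) to contrast DOM/jQuery-style text collection, which concatenates every descendant text node in document order, with a direct-children-only collection that would yield the "est" result described below. The shallow variant is only one plausible way to get that result; it is not CsQuery's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical node model for illustration only; these are not CsQuery types.
public abstract class Node { }

public sealed class TextNode : Node
{
    public string Value { get; }
    public TextNode(string value) { Value = value; }
}

public sealed class ElementNode : Node
{
    public string Tag { get; }
    public List<Node> Children { get; } = new List<Node>();

    public ElementNode(string tag, params Node[] children)
    {
        Tag = tag;
        Children.AddRange(children);
    }
}

public static class TextSketch
{
    // DOM textContent / jQuery .text() style: all descendant text, in document order.
    public static string DeepText(Node node)
    {
        var sb = new StringBuilder();
        Collect(node, sb);
        return sb.ToString();
    }

    private static void Collect(Node node, StringBuilder sb)
    {
        switch (node)
        {
            case TextNode t:
                sb.Append(t.Value);
                break;
            case ElementNode e:
                // Recurse into child elements so nested <span> text is included.
                foreach (var child in e.Children) Collect(child, sb);
                break;
        }
    }

    // Direct text children only: nested elements are skipped entirely.
    public static string ShallowText(ElementNode element) =>
        string.Concat(element.Children.OfType<TextNode>().Select(t => t.Value));

    public static void Main()
    {
        // Models: <td class="vtable-word">est<span class="conj-irregular">á</span></td>
        var td = new ElementNode("td",
            new TextNode("est"),
            new ElementNode("span", new TextNode("á")));

        string deep = DeepText(td);       // "está"
        string shallow = ShallowText(td); // "est"
        Console.WriteLine(deep);
        Console.WriteLine(shallow);
    }
}
```

The recursive walk is what makes the nested text appear; dropping the recursion is enough to lose the "á".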

In CsQuery, the nested elements are not included in .Text() or .InnerText. Below is a failing test: the .Text() result equals "est", without the <span>'s contents that I'd expect.

[Test]
public void Misc ()
{
    CsQuery.CQ document = "<td class=\"vtable-word\">est<span class=\"conj-irregular\">á</span></td>";

    var td = document ["td"].First ();

    Assert.AreEqual ("está", td.Text());
}
fschwiet commented 9 years ago

The issue was observed on version 1.3.4.

fschwiet commented 9 years ago

OK, although I was quite certain the given test was failing before (using .Text()), it is passing for me now. I have no idea why. .InnerText still seems to behave badly, though, and that is what I was using in the code I am trying to fix. With the tests below, at version 1.3.4, TextIncludesSubElements passes while InnerTextIncludesSubElements fails. I swear TextIncludesSubElements (formerly called Misc) was failing before, but it's hard to explain why that would change, and easier to conclude I made a mistake in my observations.

    [Test]
    public void TextIncludesSubElements ()
    {
        CsQuery.CQ document = "<td class=\"vtable-word\">est<span class=\"conj-irregular\">á</span></td>";

        var td = document ["td"].First ();

        //  https://github.com/jamietre/CsQuery/issues/186
        Assert.AreEqual ("está", td.Text());
    }

    [Test]
    public void InnerTextIncludesSubElements ()
    {
        CsQuery.CQ document = "<td class=\"vtable-word\">est<span class=\"conj-irregular\">á</span></td>";

        foreach (var element in document["td"]) {
            Assert.AreEqual ("está", element.InnerText);
        }
    }
fschwiet commented 9 years ago

I can use element.Cq().Text() instead of .InnerText to get the result I want now. I don't know why I saw .Text() misbehaving, but I will assume I was wrong on that and close this issue out.

(I still expected .InnerText to behave differently, but CsQuery is your design, so I'm not sure whether that makes sense for you.)
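For reference, the element.Cq().Text() workaround applied to the failing InnerTextIncludesSubElements test might look like the sketch below. It assumes NUnit and CsQuery 1.3.4 as in this thread and uses only calls that appear above (the implicit string-to-CQ conversion, document["td"], element.Cq(), .Text()). The fixture and test names are hypothetical, and the behavior has not been checked against other CsQuery versions.

```csharp
using CsQuery;
using NUnit.Framework;

[TestFixture]
public class InnerTextWorkaroundTests // hypothetical fixture name
{
    [Test]
    public void CqTextIncludesSubElements()
    {
        CQ document = "<td class=\"vtable-word\">est<span class=\"conj-irregular\">á</span></td>";

        foreach (var element in document["td"])
        {
            // Wrap the element in a CQ object and call .Text(); per this thread (1.3.4),
            // that includes the nested <span>'s text, whereas element.InnerText did not.
            Assert.AreEqual("está", element.Cq().Text());
        }
    }
}
```

The only change from the failing test is reading the text through a CQ wrapper instead of the element's InnerText property.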