jamietre / CsQuery

CsQuery is a complete CSS selector engine, HTML parser, and jQuery port for C# and .NET 4.

Maintain whitespace or insert whitespace with .Text() #115

Open nokturnal opened 11 years ago

nokturnal commented 11 years ago

My original question was too vague, so let me try to refine it. I am trying to get the text from a set of elements obtained using .NextUntil(). This text will go into a Lucene index for searching. The issue is that the Text() call correctly strips the HTML from the set, but I need whitespace inserted to keep words from different elements separate:

<p>this is a para</p><p>this is another para</p>

... becomes

this is a parathis is another para

... instead of the desired output of

this is a para this is another para

What is the best way to accomplish this?

Cheers :)

jamietre commented 11 years ago

Try the InnerText property -- ported from the nonstandard IE DOM element property, but I kept it around because it does something much like that, e.g.

CQ selection = myDom["some selector"];
string text = selection.First()[0].InnerText;

Note that InnerText is a DomObject property, not a CQ one, so you need to access the element directly with the [0] in this example.
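For a selection that holds more than one element, such as the two paragraphs in the question, one possible approach is to read InnerText from each element and join the pieces with a space. This is only a minimal sketch, not a built-in CsQuery feature: it assumes the CsQuery NuGet package is referenced, the Program class and variable names are hypothetical, and the single-space separator is a choice made for the indexing use case described above.

```csharp
using System;
using System.Collections.Generic;
using CsQuery;

class Program
{
    static void Main()
    {
        // Markup taken from the question above.
        CQ dom = CQ.Create("<p>this is a para</p><p>this is another para</p>");
        CQ selection = dom["p"];

        // A CQ object enumerates its matched elements as IDomObject,
        // so InnerText can be read one element at a time.
        var parts = new List<string>();
        foreach (IDomObject element in selection)
        {
            parts.Add(element.InnerText);
        }

        // Assumption: one space between elements is enough separation
        // for the search index; any other separator works the same way.
        string text = string.Join(" ", parts);
        Console.WriteLine(text);
    }
}
```

Because the loop walks every element in the selection, the same pattern should apply to a set returned by .NextUntil(); whether InnerText adds any whitespace of its own inside a single element is worth checking against your markup.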

RudeySH commented 6 years ago

Is there a general way to get all the inner text from any given CQ? I'm looking for a solution that can handle stuff like a CQ with multiple root elements, for example.