lwensauer / TheKey

Programming exercise
Apache License 2.0

Strip HTML text and count words #2

Closed lwensauer closed 2 years ago

lwensauer commented 2 years ago

using System.Text.RegularExpressions; // Regex
using System.Web;                     // HttpUtility

string StripHtml(string html)
{
    // insert a space after every tag so that words in adjacent elements do not run together
    html = html.Replace(">", "> ");

    // parse the html
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);

    // extract the text content and decode HTML entities (e.g. &amp; becomes &)
    string text = HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);

    // collapse each run of whitespace into a single space and trim both ends
    return Regex.Replace(text, @"\s+", " ").Trim();
}
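The issue title also mentions counting words, which the snippet above does not cover. Below is a minimal sketch of how the stripped text could be counted. `WordCounter` and `CountWords` are hypothetical names that do not appear in the original issue. The sketch assumes that a "word" is any run of Unicode letters or digits and that casing should be ignored; the actual task may define words differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original issue.
public static class WordCounter
{
    // Counts how often each word occurs in plain text (e.g. the output of StripHtml).
    // Assumption: a word is a run of Unicode letters or digits; casing is ignored.
    public static Dictionary<string, int> CountWords(string text)
    {
        // Case-insensitive keys: the first-seen spelling is stored as the key.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (Match match in Regex.Matches(text, @"[\p{L}\p{N}]+"))
        {
            // TryGetValue leaves current at 0 when the word has not been seen yet.
            counts.TryGetValue(match.Value, out int current);
            counts[match.Value] = current + 1;
        }

        return counts;
    }
}
```

It could be called as `WordCounter.CountWords(StripHtml(html))`. Matching letter and digit runs rather than splitting on spaces keeps punctuation such as commas and full stops out of the counted words.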

...

lwensauer commented 2 years ago

https://stackoverflow.com/questions/1349023/how-can-i-strip-html-from-text-in-net