Open pete-ppc opened 10 years ago
This issue is still present. Should this be fixed?
My inclination would be to fix it, since the purpose of this library seems to be to replicate jQuery's functionality, and this method behaves differently in jQuery.
Comments are still being read by Text(). Sometimes an element contains IE conditional comments, which are then incorrectly included in the returned text, for example: <!--[if gte mso 9]>
I've made two extension methods to strip comments:
// Strips comment nodes from every element in the selection.
public static CQ StripComments(this CQ cq)
{
    if (cq == null) return cq;
    foreach (var element in cq)
    {
        element.StripComments();
    }
    return cq;
}

// Recursively strips comment nodes from a node and its descendants.
public static IDomObject StripComments(this IDomObject node)
{
    if (node == null || node.ChildNodes == null) return node;
    // Collect comments first so the child list is not modified while iterating.
    List<IDomObject> commentNodes = new List<IDomObject>();
    foreach (var childNode in node.ChildNodes)
    {
        if (childNode.NodeType == NodeType.COMMENT_NODE)
        {
            commentNodes.Add(childNode);
        }
        if (childNode.ChildNodes != null && childNode.ChildNodes.Count > 0)
        {
            childNode.StripComments();
        }
    }
    foreach (var commentNode in commentNodes)
    {
        node.ChildNodes.Remove(commentNode);
    }
    return node;
}
CsQuery Version: 1.3.4
.NET Framework: 4.5
Test case (VB):
returns:
jQuery, by way of comparison, returns the text content without the comment content:
My workaround was to instantiate the CsQuery.CQ object using the CsQuery.HtmlParsingOptions.IgnoreComments parsing option.
Thank you for this much-needed library.
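For anyone who wants to try the parsing-option workaround, here is a minimal sketch. It assumes the `CQ.Create(string, HtmlParsingMode, HtmlParsingOptions)` overload as I understand it from CsQuery 1.3.x; the sample HTML string is made up for illustration and is not the test case from this issue.

```csharp
using System;
using CsQuery;

class Program
{
    static void Main()
    {
        // Hypothetical sample input: one element containing an HTML comment.
        string html = "<div>Hello<!-- hidden note --> world</div>";

        // Assumption: IgnoreComments makes the parser drop comment nodes,
        // so Text() has no comment content left to pick up.
        CQ dom = CQ.Create(html, HtmlParsingMode.Auto, HtmlParsingOptions.IgnoreComments);

        Console.WriteLine(dom["div"].Text());
    }
}
```

Unlike the StripComments extension methods, this discards comments at parse time, so it only suits cases where the comments are not needed anywhere else in the document.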