Open wiz0u opened 5 years ago
Hey, just came here to ask this, basically..
For others on the search, there is a way to work around this, maybe not the most elegant solution, but it works and is the intended way to do this I assume:
$result is what I get from the XPath expression used with HtmlAgilityPack.HtmlNode.SelectSingleNode (I'm using HAP from PowerShell).
This works as expected so far:
PS D:\Test> $result.GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    HtmlNode                                 System.Object
Or, the full result
PS D:\Test> $result
Attributes : {type, name, value, checked}
ChildNodes : {}
Closed : True
ClosingAttributes : {}
EndNode : HtmlAgilityPack.HtmlNode
FirstChild :
HasAttributes : True
HasChildNodes : False
HasClosingAttributes : False
Id :
InnerHtml :
InnerText :
<--- Snipped the rest --->
So, I have four attributes, and running $result.Attributes returns them correctly.
And now, if I want the value from the attribute called "value", I can do this:
$result.Attributes[2].Value
and I have the correct value.
And by the way:
PS D:\Test> $result.Attributes[2].GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    HtmlAttribute                            System.Object
PS D:\Test> $result.Attributes.GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    HtmlAttributeCollection                  System.Object
PS D:\Test>
So there already are HtmlAttribute and HtmlAttributeCollection. I think these are the right types, and therefore the classes already exist, @wiz0u?
Not sure. But it would be nice to access an attribute value directly by name, without resorting to positional index access like this (which invites the infamous "off-by-one" error).
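For readers who want to avoid the positional index: as far as I know, HtmlAttributeCollection also exposes a string indexer, so an attribute can be looked up by name. A minimal sketch, assuming the HtmlAgilityPack assembly is already loaded in the session; the sample markup is made up for illustration:

```powershell
# Assumption: HtmlAgilityPack is already loaded (e.g. via Add-Type or a module that ships it).
# The markup below is a hypothetical stand-in for the page being parsed.
$doc = [HtmlAgilityPack.HtmlDocument]::new()
$doc.LoadHtml('<div><input type="checkbox" name="opt" value="42" checked></div>')

$result = $doc.DocumentNode.SelectSingleNode('//input')

# Look the attribute up by name instead of by position.
# The indexer yields $null when the node has no such attribute, so guard before reading Value.
$attr = $result.Attributes['value']
if ($null -ne $attr) {
    $attr.Value
}
```

This keeps working if the order of the attributes in the markup changes, which the [2] index does not.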
So there already is HtmlAttribute and HtmlAttributeCollection, I think these are the right types, and therefore the classes already exist, @wiz0u ?
But HtmlAttribute does not derive from HtmlNode (yet?), so it can't be returned by SelectSingleNode.
@wiz0u It has been a couple of years, but do you know if there is a solution to this problem?
I'm creating a generic HTML parser, and I don't know the attributes' names at compile time. It seems that using Navigator has some pros/cons.
Also, unfortunately, XPathExpression doesn't decompose the XPath to indicate whether it ends in an attribute or not.
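One way the Navigator route can look, as a sketch rather than a recommendation: HtmlDocument.CreateNavigator() returns a standard XPathNavigator, and as far as I can tell its SelectSingleNode accepts an XPath ending in /@name and lands on the attribute itself, so the attribute name only has to be known at run time. This assumes HtmlAgilityPack is already loaded; the markup and variable names are made up.

```powershell
# Assumption: HtmlAgilityPack is already loaded in the session.
$doc = [HtmlAgilityPack.HtmlDocument]::new()
$doc.LoadHtml('<div><input type="checkbox" name="opt" value="42" checked></div>')

# The expression could come from configuration or user input.
$xpath = '//input/@value'

$nav = $doc.CreateNavigator()
$hit = $nav.SelectSingleNode($xpath)

if ($null -ne $hit) {
    # For an attribute hit, Name is the attribute name and Value is its value.
    '{0} = {1}' -f $hit.Name, $hit.Value
}
```

Note that the result is typed as XPathNavigator rather than HtmlNode or HtmlAttribute, which may be one of the trade-offs of this approach.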
For some future readers, I was able to select an attribute value with the following approaches.
html.SelectSingleNode("//xpath/to/node").Attributes.AttributesWithName("class")
to extract the attribute class from a single node.
If you are selecting multiple nodes, you can do
html.SelectNodes("//xpath/to/node").GetAttributeValue("class", "class")
This will get the value of the attribute class.
I don't understand what the second argument is doing. Tbh, I could enter any value for it, like "xyz", and it still ran, as long as it wasn't null. There is no overload for a single argument, though.
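On the second argument: to my knowledge it is the default value that GetAttributeValue returns when the node has no attribute with that name, which would explain why any string seems to work as long as the attribute exists. A small sketch, assuming HtmlAgilityPack is already loaded; the markup is made up:

```powershell
# Assumption: HtmlAgilityPack is already loaded in the session.
$doc = [HtmlAgilityPack.HtmlDocument]::new()
$doc.LoadHtml('<div><a class="link" href="/home">Home</a></div>')
$node = $doc.DocumentNode.SelectSingleNode('//a')

# The attribute exists: its value is returned and the second argument is ignored.
$present = $node.GetAttributeValue('class', 'xyz')

# The attribute is missing: the second argument comes back as the fallback.
$absent = $node.GetAttributeValue('title', 'xyz')

$present
$absent
```

Passing an empty string as the fallback is a common choice when a missing attribute should simply read as empty.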
As for the person who mentioned PowerShell, if you're in PowerShell you can easily select any attribute by doing $html.SelectNodes("//xpath/to/node").Attributes | Where-Object name -eq 'class' | Select-Object -ExpandProperty Value. This selects the value for the attribute class like the above code.
Note you can reduce the verbosity in PowerShell with aliases and abbreviated parameter names, i.e., you can shorten Where-Object to where or ?, Select-Object to select, and -ExpandProperty to -exp. PS has tools that easily traverse any object you can import into the language. The PSParseHTML module provides the AgilityPack type for PowerShell to wield.
@blaisemGH You mean this one? https://www.powershellgallery.com/packages/PSParseHTML/
With an XPath expression ending in /@attributeName, System.Xml.XmlNode.SelectSingleNode correctly returns an attribute node with Name & Value/InnerText matching the attribute.
HtmlAgilityPack.HtmlNode.SelectSingleNode returns the parent HtmlNode (with its attributes) instead of the attribute itself.
The reason is probably that there is no HtmlAttributeNode class yet. I don't know if it's for memory optimization or what, but it might be useful to have these, possibly created on the fly when these nodes get selected.
(I ended up creating this class myself with an extension method SelectSingleNodeOrAttr to work around this limitation of HtmlAgilityPack)
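For readers without such a class, a similar effect can be approximated in PowerShell by splitting a trailing /@name step off the expression. This is a rough sketch and not the extension method mentioned above: the function name Select-NodeOrAttribute is hypothetical, only a simple trailing attribute step is handled (not forms like //@name), and HtmlAgilityPack is assumed to be loaded already.

```powershell
# Assumption: HtmlAgilityPack is already loaded in the session.
# Select-NodeOrAttribute is a hypothetical helper, not part of HtmlAgilityPack.
function Select-NodeOrAttribute {
    param(
        [Parameter(Mandatory)] [HtmlAgilityPack.HtmlNode] $Node,
        [Parameter(Mandatory)] [string] $XPath
    )

    # Does the expression end in a simple attribute step such as ".../@value"?
    if ($XPath -match '^(?<path>.+)/@(?<name>[\w:.-]+)$') {
        $name = $Matches['name']
        # Select the first element on the path that actually carries the attribute.
        $elementPath = '{0}[@{1}]' -f $Matches['path'], $name
        $element = $Node.SelectSingleNode($elementPath)
        if ($null -eq $element) { return $null }
        return $element.Attributes[$name]   # an HtmlAttribute
    }

    # Otherwise fall back to the regular node selection.
    return $Node.SelectSingleNode($XPath)   # an HtmlNode
}

# Hypothetical sample markup for a quick check.
$doc = [HtmlAgilityPack.HtmlDocument]::new()
$doc.LoadHtml('<div><input type="checkbox" name="opt" value="42" checked></div>')

$attr = Select-NodeOrAttribute -Node $doc.DocumentNode -XPath '//input/@value'
$elem = Select-NodeOrAttribute -Node $doc.DocumentNode -XPath '//input'
```

Callers get back either an HtmlAttribute or an HtmlNode depending on the expression, so they may need to check the returned type before using it.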