zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License

">" inside "a" element "href" attribute causes value truncation #570

Closed mikehenry1979-bah closed 1 month ago

mikehenry1979-bah commented 1 month ago

I am using this very cool package v1.11.66 to add attributes to "a" elements in PHP source code files. I am only running across one snag that does not cause an exception. If I parse an element such as: <a target=\"_blank\" href=\"$CFG->wwwroot/course/view.php?id=$course->id\">$shortname ->

Instead of getting $CFG->wwwroot/course/view.php?id=$course->id for the "href" attribute value, I only get $CFG-

Note that those are literal \ and not escape characters. I can work around this by changing the HTML to <a target=\"_blank\" href=\"$CFG->wwwroot/course/view.php?id=$course->id\">$shortname -> before parsing it, but it would be nice if this could be handled since it's contained inside the " pair. I'm guessing it's also a possibility that the \" containment vs " is causing the problem.

mikehenry1979-bah commented 1 month ago

Ah, markdown interfered with my post. Please see attachment. HAP sample.txt

mikehenry1979-bah commented 1 month ago

Also, the attached line does parse properly. HAP sample 2.txt

JonathanMagnan commented 1 month ago

Hello @mikehenry1979-bah,

Thank you for reporting.

We will look at it very soon.

Best Regards,

Jon

JonathanMagnan commented 1 month ago

Hello @mikehenry1979-bah,

Do you think you could create a Fiddle or a runnable project with the issue? It doesn’t need to be your project, just a new solution with the minimum code to reproduce the issue.

I just made a simple test and everything is working: https://dotnetfiddle.net/yUfgJW

var document = new HtmlDocument();
document.LoadHtml("<a target=\"_blank\" href=\"$CFG->wwwroot/course/view.php?id=$course->id\">$shortname</a>");

var href = document.DocumentNode.SelectSingleNode("//a").GetAttributeValue("href", "");

Console.WriteLine(href);

(I also tried when reading a file)

So I'm probably missing something from my simple test. Knowing how to get the same issue as you will surely help me.

Best Regards,

Jon
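
One possible difference from the original report: in the test above, `\"` is a C# escape sequence, so the HTML handed to HAP contains no backslashes, whereas the report states that the backslashes are literal characters in the PHP source. The following is a minimal sketch that feeds both variants to HAP, assuming the HtmlAgilityPack NuGet package is referenced. The class name `BackslashQuoteRepro` and the helper `ReadHref` are illustrative only, and no parser output is claimed here because the original attachments are not available.

```csharp
using System;
using HtmlAgilityPack;

public static class BackslashQuoteRepro
{
    // Parses the HTML and returns the href of the first <a>, or "" if none is found.
    private static string ReadHref(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        var anchor = document.DocumentNode.SelectSingleNode("//a");
        return anchor?.GetAttributeValue("href", "") ?? "";
    }

    public static void Main()
    {
        // Regular literal: \" is a C# escape, so the HTML contains plain quotes only.
        string escaped = "<a target=\"_blank\" href=\"$CFG->wwwroot/course/view.php?id=$course->id\">$shortname</a>";

        // Verbatim literal: \ is a literal backslash and "" is a single quote,
        // so the HTML contains \" sequences, as described in the report.
        string literal = @"<a target=\""_blank\"" href=\""$CFG->wwwroot/course/view.php?id=$course->id\"">$shortname</a>";

        // Confirm what the parser actually receives in each case.
        Console.WriteLine("escaped has backslash: " + (escaped.IndexOf('\\') >= 0));
        Console.WriteLine("literal has backslash: " + (literal.IndexOf('\\') >= 0));

        // Compare the href values HAP extracts from each input.
        Console.WriteLine("escaped href: " + ReadHref(escaped));
        Console.WriteLine("literal href: " + ReadHref(literal));
    }
}
```

If only the second href comes back cut off at the first `>`, that would suggest the literal backslashes (which keep the value from starting with a quote) are the trigger rather than the `>` character itself. This is a hypothesis to check, not a confirmed diagnosis.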

JonathanMagnan commented 1 month ago

Hello @mikehenry1979-bah,

Since our last conversation, we haven't heard from you.

As previously mentioned, we would need a runnable project to assist you.

Let me know if you have questions.

Best regards,

Jon

mikehenry1979-bah commented 1 month ago

Apologies, I’ve been pulled from that work for the time being. If you were able to get it to work, this issue can be closed.


JonathanMagnan commented 1 month ago

Hello @mikehenry1979-bah

Perfect, we will close it.

We will re-open it when you are able to provide an example.

Best Regards,

Jon