Closed bytecode77 closed 1 month ago
Hello @bytecode77 ,
Looking at Firefox and Chrome, I believe the current actual result from HAP is the right one:
FireFox:
The "comment" section end at the first occurrence of the >
found. This means the end tag </a>
is considered part of the HTML, but an end tag alone cannot really exist, so it gets removed.
I'm not exactly sure of all the rules that is currently applied, but the current actual result is indeed what I'm expecting.
Let me know if that explains the reason correctly.
Best Regards,
Jon
Thanks for your quick response Jonathan!
For a comment, I think the end tag is not useful to the DOM. However, the code does not represent a comment:
<code><![CDATA[<a href=\"foo\">\r\nbar\r\n</a>]]></code>
Specifically, this code was retrieved by the Confluence API when reading Confluence pages. I'm not sure what the purpose of <![CDATA[
here is, though.
However, the code does not represent a comment:
Indeed, you are right. I just assumed this, looking at the result, but that's not the case.
I never had to really use the <![CDATA[
as far as I remember, but looking at what I see on internet, it looks more related to be used within a script
tag but not exclusively to this.
At this moment, it still looks like the current behavior looks more like the normal behavior unless I'm proving wrong. Again, I'm not familiar with this tag, so I could definitely be wrong.
Best Regards,
Jon
This is the original HTML that is a Confluence page export that I'm parsing.
It does contain a <![CDATA[
within the text-body of a <ac:structured-macro ac:name="code"
and it represents a Confluence code box.
Yes, it was used within <script>
tags way before javascript was common to not offend non-supporting browsers. But it remains valid HTML that is, indeed, used:
<ac:structured-macro ac:name="code" ac:schema-version="1" ac:macro-id="70aacf91-111a-4b25-8c3c-543aa6fd0af9">
<ac:plain-text-body>
<![CDATA[<a href="foo" target="_blank">
bar
</a>]]>
</ac:plain-text-body>
Hello @bytecode77 ,
Thank you for the additional info. I have looked at the HAP code, and the <!CDATA[
tag is not supported. The current behavior is more a combination of "Comment" and "Text" nodes that show the same result as Firefox.
I would not like to change the current default behavior, but I'm open to looking more at it to support it the way you want through an option that you will need to enable.
I should be able to look more at it later this week
Best Regards,
Jon
Thanks for looking into it, Jon!
I'll stay tuned for your updates :)
Hello @bytecode77 ,
A new version has been released today: https://github.com/zzzprojects/html-agility-pack/releases/tag/v1.11.68
Could you try the new option and let us know if everything is working as expected.
Best Regards,
Jon
Hi Jonathan!
thank's for providing a new version! However, unfortunately the result is the same as before, see the dotnetfiddle that I posted initially.
To provide a more concrete example: When importing HTML from Confluence, this is what the HTML string looks like:
...
<ac:plain-text-body>
<![CDATA[<iframe src="xxxxxx" width="600" height="1000" frameborder="0">
<a href="xxxxx" target="_blank">
xxxxx
</a>
</iframe>]]>
</ac:plain-text-body>
...
It's a code box showing some HTML.
However, the InnerHtml
property is missing the closing tag:
<![CDATA[<iframe src="https://www.terminland.de/hno/?mode=frame" width="600" height="1000" frameborder="0">
<a href="https://www.terminland.de/hno" target="_blank">
Online-Terminbuchung
</a>
]]>
Hello @bytecode77 ,
Fiddle currently still uses the previous version of HAP and not the latest one (don't ask me why!).
You need to turn on the following options to make it works:
HtmlDocument document = new HtmlDocument();
document.OptionThreatCDataBlockAsComment = true;
I tested both your examples (the first one and the one you just posted), and both seem to work very fine with this option.
Best Regards,
Jon
Ah, I see. However, unfortunately the option OptionThreatCDataBlockAsComment
is not included in your latest build. I confirmed this by installing 1.11.68 in a completely blank project on another computer to make sure that this isn't a nuget cache issue of some sort.
Oh god my bad, It looks like I did a bad release!
The v1.11.69 is now available. I double-checked to ensure I had not made the same mistake twice 😆
Best Regards,
Jon
Let me know if this version was working
Works like a charm! Thanks for your support, Jonathan, and keep up the nice work. I really love HtmlAgilityPack and I've been using it for many years.
Bye!
You're right, wo80, didn't notice :D
@JonathanMagnan I think if you change the typo and break upward compatibility, there is only one person (me) who would be affected, but that's no problem. When I update some time in the future I will just change the wording on the calling site, too.
Do'h! You are 100% right @wo80
~It will be changed the next time we release HAP, which will probably happen next month.~
Hello,
The v1.11.70 is now available with the option renamed as proposed by @wo80 ;)
Best Regards,
Jon
1. Description
Closing tag is missing within
<![CDATA
object.Background: This issue surfaced when parsing HTML
<code>
blocks. The HTML content is given by Confluence, where the HTML is parsed from.3. Fiddle or Project
https://dotnetfiddle.net/XmcOB6
Expected Result
Actual Result
4. Any further technical details