jstedfast / HtmlKit

A cross-platform .NET framework for parsing HTML

<xmp> tags are parsed with HtmlTagId.Unknown #38

Open firat-plutoflume opened 4 days ago

firat-plutoflume commented 4 days ago

Describe the bug

Hello,

I was trying to parse some files containing <xmp> tags and noticed that HtmlTokenizer emits an HtmlTagToken with HtmlTagId.Unknown for them. The HtmlTagId enum does have a value for xmp tags. Is this intentional? If so, could you tell me whether any other tags behave the same way?

Thanks in advance.


To Reproduce

You can use this simple example:

<xmp>test</xmp>

Here is what I see in the debugger: [image]
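For readers who want to try the example above, here is a minimal sketch of one way to feed it through the tokenizer and print the ID assigned to each tag. It is not the reporter's actual code: the class name `XmpRepro` and the output format are made up for illustration, and it assumes the HtmlKit package is referenced.

```csharp
using System;
using System.IO;
using HtmlKit;

// Hypothetical repro harness; the class name is illustrative only.
class XmpRepro
{
    static void Main ()
    {
        const string html = "<xmp>test</xmp>";

        using (var reader = new StringReader (html)) {
            var tokenizer = new HtmlTokenizer (reader);
            HtmlToken token;

            // Read tokens until the input is exhausted.
            while (tokenizer.ReadNextToken (out token)) {
                // Skip data/comment tokens; only tags carry an HtmlTagId.
                if (token.Kind != HtmlTokenKind.Tag)
                    continue;

                var tag = (HtmlTagToken) token;

                // Print the tag name (with a leading slash for end tags) and its ID.
                Console.WriteLine ("{0}{1}: {2}", tag.IsEndTag ? "/" : "", tag.Name, tag.Id);
            }
        }
    }
}
```

According to the report, the affected release shows HtmlTagId.Unknown for the xmp tags here.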


jstedfast commented 4 days ago

Looks like a bug.

jstedfast commented 4 days ago

I can't reproduce the bug 🤷

jstedfast commented 4 days ago

Looks like I have some changes that were never pushed in a release. Can you try the latest HtmlKit source code and see if it fixes things for you? If so, I'll try to make a new release.

firat-plutoflume commented 4 days ago

Yes, it works on the latest code, thanks!

I've noticed that MimeKit.Text.HtmlTokenizer also works correctly. We are currently using both libraries, so could we use MimeKit.Text instead of keeping HtmlKit as a separate dependency? I believe it would be more up to date than HtmlKit. Is that right, or are there any differences between the two?

jstedfast commented 3 days ago

Yes, MimeKit imports all of HtmlKit into the MimeKit.Text namespace (only difference is the root namespace).

I originally implemented it for MimeKit, then later thought it might be useful outside of MimeKit, so I made HtmlKit.

firat-plutoflume commented 3 days ago

Yes, it definitely has uses as a separate library, but in our case I think it's easier and cleaner to use MimeKit alone. Thanks for the quick response and help 👍