zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License
2.65k stars 375 forks source link

Unable to set "standalone" boolean attributes #516

Open jekru opened 1 year ago

jekru commented 1 year ago

1. Description

I am unable to set "standalone" boolean attributes. Getting the value of these attributes results in an empty string. Also setting such attributes is only possible by using a string. While this is not false because only the presence of the attribute matters, I find it a bit odd to use empty strings to set such attributes. For example checked will be true for <input type="checkbox" checked /> and <input type="checkbox" checked="" /> and even for <input type="checkbox" checked="false" /> because only the presence of the attribute matters.

See also: Boolean attribute (HTML).

2. Exception

There is no exception.

3. Fiddle or Project

https://dotnetfiddle.net/r7jvX7

4. Any further technical details

Creating Attributes.Add(string name, bool value) and SetAttributeValue(string name, bool value) could help differentiate between standalone boolean attributes and attributes with a boolean string to allow creating standalone boolean attributes.

elgonzo commented 1 year ago

It is possible to set a value-less attribute, however it is a bit awkward and unintuitive as HtmlAgilityPack is not providing a simple convenient API for that.

Basically, currently you gotta do something like this:

static HtmlAttribute SetNoValueAttribute(HtmlNode element, string attributeName)
{
    var attr = element.Attributes[attributeName]
        ?? element.Attributes.Append(attributeName);

    attr.Value = null;
    attr.QuoteType = AttributeValueQuote.WithoutValue;
    return attr;
}

SetNoValueAttribute(iframeNode, "allowfullscreen");
jekru commented 1 year ago

It is possible to set a value-less attribute, however it is a bit awkward and unintuitive as HtmlAgilityPack is not providing a simple convenient API for that.

Thanks for your response and code example.

@JonathanMagnan Even if it is possible to create such value-less attributes, can we discuss about my suggestion to allow attributes to have a boolean value?

JonathanMagnan commented 1 year ago

Hello @jekru ,

What exactly are you expecting from your proposed method Attributes.Add(string name, bool value) to do?

  1. <div checked></div>
  2. <div checked="true"></div>
  3. <div checked="false"></div>

One solution, meanwhile, is surely creating an extension method on your side using @elgonzo .

From what I understand, you would like it to do this answer <div checked></div> but having a method with a bool parameter that do nothing will add more confusion than anything else as someone not knowing this issue would expect that it will generate the solution 2 or 3 depending of the value.

Best Regards,

Jon

jekru commented 1 year ago

I would expect that Attributes.Add(string, name, bool value) respectively SetAttributeValue(string name, bool value) would result in <div checked></div> (when using "checked" as name and true as value). But you are right, this could let to confusion because someone could interpret a boolean value to result in <div checked="true"><div>.

The only problem I see is that there is no way to identify value-less attributes. Here is an example where I would expect null as Value and AttributeValueQuote.WithoutValue as QuoteType:

using System;
using System.Linq;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        var input = HtmlNode.CreateNode("<input checked></input>");
        var checkedAttribute = input.Attributes.First();

        // Result is: Value: '' (empty string)
        Console.WriteLine($"Value: '{checkedAttribute.Value}'");

        // Result is: QuoteType: DoubleQuote
        Console.WriteLine($"QuoteType: {checkedAttribute.QuoteType}");
    }
}
JonathanMagnan commented 1 year ago

Hello @jekru ,

Something for sure is that by default, we prefer to keep the current empty with the double quote behavior for backward compatibility. Browsers have different behavior on value-less attributes (firefox in developer mode will do the "empty string" like us while Chrome will keep it without value). Which browser is right? I'm not sure of the answer but keeping being backward compatible in this case is probably the best.

jekru commented 1 year ago

@JonathanMagnan

You made a good point with backward compatibility and I also was not aware that the behavior differs from browser to browser. What about creating a function IsBooleanAttribute() or IsValueLessBooleanAttribute() that only returns true if the attribute is a standard html attribute and defined as boolean attribute f.e. "checked" or it is a non standard html atttribute and is without a value (also not an empty string)?