Enable Keeping Attribute Name Casing #1128

EisenbergEffect commented 9 months ago

Some libraries/frameworks would like to use Parse5 in a way that preserves the casing of attribute names. A common use case for this is a templating engine that enables binding to custom events or JS properties (which natively allow for both lower and upper case).

As far as I can tell, this is controlled by the Tokenizer's _stateAttributeName method, which can be found here:

https://github.com/inikulin/parse5/blob/master/packages/parse5/lib/tokenizer/index.ts#L1673

Would it be possible to add an option that tells this method not to lowercase every letter?

Alternatively, is there a way to create a custom tokenizer that simply overrides this protected method with the altered behavior? If there is, I just need a code sample for that. Currently, I'm monkey patching the Tokenizer prototype directly, which I'd prefer not to do.

Thanks for the help!

tenadolanter commented 8 months ago

Have encountered the same problem.

I use a mapping to store tags and attributes with uppercase letters, and then replace them after conversion, but I don't think this is a good way.

EisenbergEffect commented 5 months ago

Can anyone from the parse5 team comment? I'm happy to put together a PR to enable this setting if it would be accepted and someone could guide me as to the preferred way to add it.

wooorm commented 5 months ago

This is an HTML parser (following the WHATWG spec). Not a custom language parser. Most of the closed issues are people asking similar questions: https://github.com/inikulin/parse5/issues?q=is%3Aissue+sort%3Aupdated-desc+is%3Aclosed. So I don’t think this will be accepted.

In particular, this seems to be a duplicate of https://github.com/inikulin/parse5/pull/221.

To me it seems likely that this is an XY question. Perhaps I can help you better if I know more about your root problem.

EisenbergEffect commented 5 months ago

I am implementing a compiler. It takes HTML as input and produces a reusable template that can be bound to data. Think of just about any front-end rendering library as an example.

The HTML needs to include bindings to attributes, properties, and events. While attributes are always case normalized in HTML, the properties and events fired by the underlying Node instances can have any combination of casing. So, the templating language needs to be able to support that. For example:

<my-element :someProperty="{{this.someOtherProperty}}">

When parsing the HTML, we don't want the parser to return :someproperty because that is not the JS property name. We need it to preserve the casing, so we can get :someProperty.
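To make the mismatch concrete, here is a minimal sketch of the ASCII lowercasing that the HTML tokenizer applies to attribute names, and why the result no longer matches a case-sensitive JS property. The function name `asciiLowerAttrName` and the `element` object are hypothetical and only for illustration; this is not parse5's actual code.

```javascript
// Hypothetical illustration, not parse5's implementation:
// lowercase only ASCII A-Z, leaving every other character untouched.
function asciiLowerAttrName(name) {
  return name.replace(/[A-Z]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) + 0x20)
  );
}

// A stand-in for a DOM node with a camelCased JS property.
const element = { someProperty: 'value' };

console.log(asciiLowerAttrName(':someProperty')); // :someproperty
console.log('someproperty' in element);           // false
console.log('someProperty' in element);           // true
```

Because JS property lookup is case-sensitive, the normalized name cannot be mapped back to the property without extra information.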

We do not know the properties and events ahead of time, so we cannot correct the casing automatically. We need the casing preserved. Other than this, everything is normal HTML.

This is why I requested an option to have the parser not normalize the casing of attributes. I have monkey patched this for now, but would prefer not to have to take that approach. I would also prefer not to have to fork or write a parser from scratch just to essentially remove one invocation of toAsciiLower().

The community apparently brings this up often, so it seems like a legitimate broad need. It doesn't seem like it would be a lot of work to implement and could remain completely backwards compatible. Only those who want this would opt-in.

wooorm commented 5 months ago

> The community apparently brings this up often, so it seems like a legitimate broad need

Maybe. But it’s also free software. It’s legitimate for people to not want to maintain the things other folks want, and to not do everything every user ever wants. We also get folks wanting to pass Vue files through. Or folks who want <div/> to be self-closing. That’s all also out of scope. You can use patch-package or fork if you want.

Personally, having maintained a lot of parsers for years, particularly around the markdown space, I’m very strongly in favor of sticking with the specs and not allowing deviations. Especially for mature languages/projects.

I’d recommend one of: a) fork / patch-package / build a new parser, b) use XML, c) use JSX, or d) use actual HTML.

I’d bet on the popular JSX or HTML.

43081j commented 5 months ago

@EisenbergEffect you can at least use location info to get hold of it:

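import { parseFragment } from 'parse5';

const source = '<div casedAttribute="abc"></div>';
// Ask parse5 to record source offsets for every node and attribute
const frag = parseFragment(source, {
  sourceCodeLocationInfo: true
});
const div = frag.childNodes[0];
// Location entries are keyed by the lowercased attribute name
const {startOffset, endOffset} = div.sourceCodeLocation.attrs.casedattribute;
source.slice(startOffset, endOffset); // casedAttribute="abc"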
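The slice above contains the whole attribute, value included. A small helper could cut it down to just the name as typed. This is a minimal sketch: `originalAttrName` is a hypothetical function, not part of parse5, and it assumes the location object has the `{startOffset, endOffset}` shape used above. The offsets in the example are computed from the string itself rather than taken from a real parse.

```javascript
// Hypothetical helper (not part of parse5): given the original source and an
// attribute location of the shape { startOffset, endOffset }, return the
// attribute name with its original casing.
function originalAttrName(source, loc) {
  // Raw attribute text, e.g. casedAttribute="abc"
  const raw = source.slice(loc.startOffset, loc.endOffset);
  // The name runs up to the first whitespace or "=" character.
  const match = /^[^\s=]+/.exec(raw);
  return match ? match[0] : raw;
}

const html = '<div casedAttribute="abc"></div>';
// Stand-in for div.sourceCodeLocation.attrs.casedattribute
const attrLoc = {
  startOffset: html.indexOf('casedAttribute'),
  endOffset: html.indexOf('>')
};

console.log(originalAttrName(html, attrLoc)); // casedAttribute
```

In a compiler, this would be called once per attribute, using the lowercased name from the parsed node as the key into the location map.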

EisenbergEffect commented 5 months ago

@wooorm I've been doing open source for 20 years, so I totally get it. You need to do what's best for your project.

For my part, I don't want JSX or XML. There's not a great way to use actual HTML for this purpose without introducing a fairly verbose syntax. I've already patched things, and have everything working. I just wanted to explore whether there was a better way.

@43081j That may do the trick. I'll give it a try. Thanks!

wooorm commented 5 months ago

> actual HTML for this purpose without introducing a fairly verbose syntax

One idea: the dataset api in html has a similar "problem". It is solved there by dash vs camelcase. So the data attributes are data-foo-bar, which corresponds to the property dataset.fooBar.
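The dash-to-camelCase mapping used by dataset can be sketched as a pair of plain string conversions. This is a minimal sketch that handles ASCII letters only; the names `dashToCamel` and `camelToDash` are hypothetical and not part of parse5 or any DOM API.

```javascript
// Hypothetical helpers illustrating the dataset-style name mapping.

// "foo-bar" -> "fooBar": drop each dash that precedes a lowercase ASCII
// letter and uppercase that letter.
function dashToCamel(name) {
  return name.replace(/-([a-z])/g, (_, ch) => ch.toUpperCase());
}

// "fooBar" -> "foo-bar": put a dash before each uppercase ASCII letter and
// lowercase it.
function camelToDash(name) {
  return name.replace(/[A-Z]/g, (ch) => '-' + ch.toLowerCase());
}

console.log(dashToCamel('foo-bar'));  // fooBar
console.log(camelToDash('fooBar'));   // foo-bar
```

With a convention like this, the template would carry only lowercase attribute names, so the parser's case normalization would no longer lose information.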

There is also the question of whether it would be good to support properties when writing attributes. Preact/Vue never did, always going with attributes. Until v19, released just now, React had a huge problem adding support for custom elements and more because they went the property route. So perhaps sidestepping the problem may be better.

EisenbergEffect commented 5 months ago

The fact of the matter is that DOM nodes, both built-ins and custom, have properties that need to be manipulated. Supporting attributes only would be a major problem. Using data- isn't great either because that creates a real mismatch/confusion with respect to the actual properties that are targeted. They aren't data properties at all. We could say that any attr with the : prefix should follow the data- casing conversion pattern. That's not terrible, but it adds more cognitive load on the template author, which isn't great.

I appreciate the thoughts.
