highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License

Discuss: API for adding extra keywords to a language? #3493

Open mmattel opened 2 years ago

mmattel commented 2 years ago

Note: this issue was created from a comment in https://github.com/highlightjs/highlight.js/issues/3489

Inspired by: https://github.com/highlightjs/highlight.js/issues/1271#issuecomment-243354193

I added some code that appends custom keywords to the bash language's keywords.built_in list. It runs after the last registerLanguage call, which in my environment is: hljs.registerLanguage('yaml', require('highlight.js/lib/languages/yaml'))

// Extra commands to highlight as bash built-ins
const kwds = ['apt', 'apt-get', 'pecl',
  'phpenmod', 'phpdismod', 'a2enmod', 'a2dismod', 'a2ensite', 'a2dissite',
  'systemctl', 'service', 'sudo', 'wget', 'tar', 'mysql', 'php', 'grep'];

// Merge them into the existing built_in list; the Set drops duplicates
const built_in = hljs.getLanguage('bash').keywords.built_in;
hljs.getLanguage('bash').keywords.built_in = [...new Set([...built_in, ...kwds])];

Note that the code above deduplicates keys. This means you can add keywords without worrying: highlight.js may add new keywords over time, and array deduplication avoids any possible conflicts.
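As a minimal sketch, the deduplicating merge can be wrapped in a small helper. It also accepts a space-separated string, on the assumption that some grammars (particularly older ones) store keyword lists that way, while bash in highlight.js v11 uses arrays. `mergeKeywords` is a hypothetical name, not a highlight.js API, and a plain array stands in for `hljs.getLanguage('bash').keywords.built_in` so the example runs on its own. (As noted further down the thread, highlight.js also deduplicates internally, so this mainly keeps the list tidy.)

```javascript
// Hypothetical helper: merge extra keywords into an existing list without duplicates.
// Assumption: `existing` is either an array or a space-separated string.
function mergeKeywords(existing, extra) {
  const base = Array.isArray(existing)
    ? existing
    : String(existing || '').split(/\s+/).filter(Boolean);
  // A Set keeps first-seen order and silently drops repeats.
  return [...new Set([...base, ...extra])];
}

// Stand-in for hljs.getLanguage('bash').keywords.built_in
const builtIn = ['echo', 'cd', 'grep'];
const merged = mergeKeywords(builtIn, ['sudo', 'grep', 'systemctl', 'sudo']);
console.log(merged); // [ 'echo', 'cd', 'grep', 'sudo', 'systemctl' ]
```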

I would be very happy if there were documentation on how to add extra keywords. Finding out how this is done drove me nuts... You could use my code straight away if you like. Its benefit is that it deduplicates keys in case someone accidentally adds a key that already exists. The example is for the bash language. If you want this for another language, you just need to look at the original grammar to identify the correct (sub)keyword array. Maybe add this to the wiki?


I'd recommend our Discord for such questions in the future though; it might save you some time. A few thoughts:

I'd suggest this might be part of a larger discussion about creating some API for this in a broader sense... but (correct me if I'm mistaken) this seems a very small edge case and thus not something that would provide much benefit for users. It really only seems to apply to shell, bash, etc.: 2 or 3 grammars out of 200.

Originally posted by @joshgoebel in https://github.com/highlightjs/highlight.js/issues/3489#issuecomment-1050958905

mmattel commented 2 years ago

@joshgoebel You are totally right that this affects only a few languages, but I would like to record the motivation for documenting this possibility or, as you said, implementing it via a new API.

Yes, someone who wants to add their own keywords to a language MUST take care to reference the correct keyword or sub-keyword list, but imho that is part of the documentation.

mmattel commented 2 years ago

Added a Wiki entry as suggested 😃

Let me know if you have comments.

If you are fine, you can close this issue.

joshgoebel commented 2 years ago

Note the way the array is built: it deduplicates newly added keys.

This isn't really necessary, as this happens internally anyway.

I tweaked your wording slightly in several places and cleared up the naming... the list keys are scopes. We have sub-modes (nested modes), and they all have their own scopes, but not really "sub scopes"... or perhaps in, say, meta.prompt, prompt is a sub scope, but that's not what we're talking about here. Sorry if I confused you earlier.

I used push(...) for modifying the array (in place) as that should be more compatible with more grammars.
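A minimal sketch of the in-place variant. One plausible reason `push(...)` is more compatible, assumed here rather than confirmed in the thread, is that a grammar may reuse the same array in nested modes: mutating it updates every reference, whereas reassigning `keywords.built_in` only changes the top-level one. `addKeywordsInPlace` is a hypothetical helper, and the object literal stands in for `hljs.getLanguage('bash')`.

```javascript
// Hypothetical helper: append keywords to one keyword scope of a grammar, in place.
// `language` is assumed to be shaped like the object hljs.getLanguage(name) returns;
// `scope` is a key such as 'built_in'. Intended to run at startup, before the
// grammar is first used, since highlight.js compiles grammars before use.
function addKeywordsInPlace(language, scope, extra) {
  const list = language && language.keywords && language.keywords[scope];
  if (!Array.isArray(list)) {
    throw new TypeError(`grammar has no array keyword scope "${scope}"`);
  }
  // push() mutates the existing array, so every other reference to it
  // (for example, a nested mode reusing the same list) sees the new entries.
  list.push(...extra);
  return list.length;
}

// Stand-in for hljs.getLanguage('bash')
const bash = { keywords: { built_in: ['echo', 'cd'] } };
const sharedRef = bash.keywords.built_in;
addKeywordsInPlace(bash, 'built_in', ['sudo', 'systemctl']);
console.log(sharedRef); // [ 'echo', 'cd', 'sudo', 'systemctl' ]
```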

joshgoebel commented 2 years ago

When looking into bash... he wanted to add additional keywords to mathematica, which I think is another fair reason.

This is still a very small number of grammars, but I suppose you could argue "any language that allows defining custom functions should allow adding those functions to a list to be highlighted"... I think for many languages (with a dispatch syntax we can recognize) this would be better solved by #2500: highlighting ALL custom functions as function.title... sadly that isn't going to work with bash though.
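A rough sketch of the dispatch-syntax idea: a mode that scopes any identifier directly followed by `(` as a function call. This is an illustration only, not what #2500 actually implements; `scope` and `match` are highlight.js v11 mode keys, but the scope name and the regex are assumptions.

```javascript
// Hypothetical mode: treat any identifier followed by "(" as a function call.
const FUNCTION_CALL = {
  scope: 'title.function',
  match: /\b[A-Za-z_]\w*(?=\s*\()/,
};

// Check the pattern on its own, outside highlight.js, with a global copy of the regex.
const calls = 'x = foo(1) + bar (2); if (x) log(x)'.match(
  new RegExp(FUNCTION_CALL.match.source, 'g')
);
console.log(calls); // [ 'foo', 'bar', 'if', 'log' ]
```

The output also shows the catch: `if` matches too, so a real grammar would have to let keywords win. And because bash invokes commands as `cmd arg` without parentheses, the pattern finds nothing useful there.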

So while I don't dispute the usefulness in some cases, this still seems a niche problem.

...add additional keywords to mathematica...

Worth noting: the approach in that thread would no longer work, since Mathematica was entirely rewritten and no longer uses keywords; it uses custom modes. Hence back to one of my caveats: grammars are free to highlight code in many different ways, not all of them compatible with easily adding custom keywords.
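Because grammars may use custom modes instead of keyword lists, it helps to check what a grammar actually exposes before trying to extend it. A small sketch, assuming `language` is an object like the one `hljs.getLanguage(name)` returns; `listKeywordScopes` is hypothetical and only inspects the top-level `keywords` (nested modes can define their own). A plain string or array for `keywords` is treated as the `keyword` scope, and `$pattern` is skipped because it sets the keyword-matching regex rather than naming a scope.

```javascript
// Hypothetical helper: list the top-level keyword scopes a grammar defines.
function listKeywordScopes(language) {
  const kw = language && language.keywords;
  if (!kw) return []; // e.g. a grammar built entirely from custom modes
  if (typeof kw === 'string' || Array.isArray(kw)) return ['keyword'];
  return Object.keys(kw).filter((key) => key !== '$pattern');
}

console.log(listKeywordScopes({ keywords: { $pattern: /\w+/, keyword: [], built_in: [] } }));
// [ 'keyword', 'built_in' ]
console.log(listKeywordScopes({ contains: [] })); // []
```

An empty result, as for a grammar built entirely from custom modes like the rewritten Mathematica one, means there is no top-level list to append to.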

If we were to consider an API I'd expect we'd want it at the library level:

hljs.addCustomKeywords("bash", {
  keywords: // ...
  built_ins: // ...
});

That implies it would need to work with ALL languages (or at the very least all ~200 1st party languages we ship with the library)... this would force us to redesign many grammars just to allow this... and I wouldn't want to support a feature at the API level that just randomly worked or didn't, depending on the grammar design...

This means that for any grammar without top-level keywords we'd need some type of per-grammar runtime hooks (since an API that can only be called at startup would be a strange precedent and lead to support issues) to allow individual grammars to tweak their own keyword engines... and that starts to add complexity to the compiler as well. Since grammars are pretty much "pure data", the Highlight.js engine "compiles" them before use, so adding keywords after startup (I modified your article to say "startup", not "runtime") requires modifying the original data and then requesting that the grammar be recompiled.
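Given that grammars are compiled before use, one startup-time workaround that needs no new engine support is to wrap a language definition function before passing it to `hljs.registerLanguage`, so extra keywords are merged in before compilation. A sketch under stated assumptions: `withExtraKeywords` is a hypothetical helper, not part of highlight.js; it only handles grammars with a top-level keyword object (the limitation discussed above); arrays are extended in place and string lists are converted to arrays.

```javascript
// Hypothetical helper: wrap a language definition, a function (hljs) => grammar,
// so extra keywords are merged in before highlight.js compiles the grammar.
function withExtraKeywords(languageDefinition, extras) {
  return function (hljs) {
    const grammar = languageDefinition(hljs);
    const kw = grammar.keywords;
    if (!kw || typeof kw !== 'object' || Array.isArray(kw)) {
      throw new Error('grammar has no keyword object to extend');
    }
    for (const [scope, words] of Object.entries(extras)) {
      const current = kw[scope];
      if (Array.isArray(current)) {
        // Extend in place, skipping words that are already present.
        for (const w of words) if (!current.includes(w)) current.push(w);
      } else {
        // Missing scope or space-separated string: build a deduplicated array.
        const base = String(current || '').split(/\s+/).filter(Boolean);
        kw[scope] = [...new Set([...base, ...words])];
      }
    }
    return grammar;
  };
}

// With real highlight.js this would be used roughly as:
//   const hljs = require('highlight.js/lib/core');
//   const bash = require('highlight.js/lib/languages/bash');
//   hljs.registerLanguage('bash', withExtraKeywords(bash, { built_in: ['sudo'] }));
// A stand-in definition keeps this example self-contained.
const fakeBash = () => ({ keywords: { built_in: ['echo'] } });
const grammar = withExtraKeywords(fakeBash, { built_in: ['sudo', 'echo'] })({});
console.log(grammar.keywords.built_in); // [ 'echo', 'sudo' ]
```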

This sounds like a lot of effort/complexity for a very minimal win. If someone wanted to work on this and show me it was super simple/easy to solve all these problems without adding a lot of code/complexity that requires future maintenance then I'd be curious, but I'm just not seeing it right now.

joshgoebel commented 2 years ago

Bullet list of issues we'd need to solve:

Those are the ones I'm thinking of at least. :)

stephencmorton commented 1 year ago

It's pretty common in C projects to have custom typedefs for basic C types. It would be nice to be able to have these project-standard types added to highlighting.

This is just one example I've run across:

typedef signed char             tInt8;
typedef unsigned char           tUint8;
typedef short                   tInt16;
typedef unsigned short          tUint16;
typedef int                     tInt32;
typedef unsigned int            tUint32;
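If a project's custom types already live in a header, the list does not have to be typed by hand. A deliberately simple sketch that pulls the names out of one-line `typedef` declarations like the ones above; `typedefNames` is hypothetical, and its regex skips struct bodies and function-pointer typedefs rather than parsing them, so anything more complex would need a real C parser.

```javascript
// Hypothetical helper: collect names declared by simple typedefs such as
// "typedef unsigned char tUint8;". Typedefs containing { } or ( ) are skipped.
function typedefNames(source) {
  const re = /^\s*typedef\b[^;{}()]*?\b([A-Za-z_]\w*)\s*;/gm;
  return [...source.matchAll(re)].map((m) => m[1]);
}

const header = `
typedef signed char             tInt8;
typedef unsigned char           tUint8;
typedef short                   tInt16;
typedef unsigned short          tUint16;
typedef int                     tInt32;
typedef unsigned int            tUint32;
`;
console.log(typedefNames(header));
// [ 'tInt8', 'tUint8', 'tInt16', 'tUint16', 'tInt32', 'tUint32' ]
```

The resulting names could then be appended to the C grammar's type keywords at startup, assuming the grammar in your highlight.js version keeps built-in types under a `type` scope (check the grammar source before relying on this).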

mbomb007 commented 6 months ago

I'd like to see sudo added to the default Bash keywords. It's odd that it's not highlighted.