Add a way to get a language given an alias

highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.

https://highlightjs.org/

BSD 3-Clause "New" or "Revised" License

Add a way to get a language given an alias #3938

Closed: nmangold closed this issue 3 months ago

nmangold commented 6 months ago

Is your request related to a specific problem you're having?

I am using CKEditor 5, which allows code blocks in languages such as HTML. An HTML code block produces the class language-html. However, when importing the grammar for the language to be highlighted, there is no html.min.js file. To get an HTML code block highlighted, the user would have to select an XML code block instead, unless the application maintains its own mapping of aliases to languages.

The solution you'd prefer / feature you'd like to see added...

A method that, given an alias, returns the corresponding language.

joshgoebel commented 6 months ago

highlight(code, {language: "html"}) (or the dynamic equivalent) should work fine if XML is already registered - you only need to know the alias... or are you saying you're trying to start with nothing and load grammars on the fly?
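To illustrate the point above: once a grammar is registered, its aliases resolve through the normal lookup, so no separate mapping is needed. This is only a minimal sketch. `highlightByAlias` is a hypothetical helper name, and the `hljs` instance is passed in as a parameter so the sketch makes no assumption about how the library was loaded; `getLanguage` and `highlight` are the library's existing methods.

```javascript
// Hypothetical helper: highlight `code` using whatever name or alias the
// editor produced (e.g. "html"). `hljs` is the highlight.js instance.
function highlightByAlias(hljs, code, alias) {
  // getLanguage() resolves aliases as well as canonical names, but only for
  // grammars that have already been registered; otherwise it is undefined.
  if (!hljs.getLanguage(alias)) {
    return null; // grammar not loaded yet: the caller has to fetch it first
  }
  return hljs.highlight(code, { language: alias }).value;
}
```

With the XML grammar registered, calling this with the alias "html" should return highlighted markup. With no grammars registered it returns `null`, which is the "start with nothing" situation the rest of this thread is about.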

nmangold commented 6 months ago

Start with nothing, and load grammars on the fly.

joshgoebel commented 6 months ago

I don't think this makes sense in the core no-grammars highlight.js library file - since that core includes no grammars - and when you build with bundled grammars this information is already available to query. It could make sense to add as some sort of separate grammars.json index file though (that could ship with our node and CDN packaging).

This isn't 100% trivial though since the aliases aren't available statically [at build-time] - you'd need to build then execute the built grammars to learn their aliases. That might not be so bad though with dynamic imports, I'm not sure.

Are you interested in working on a PR that might build such an index file?
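A rough sketch of what building such an index could involve, assuming each grammar is a function that receives the `hljs` object and returns a definition with an optional `aliases` array. `buildAliasIndex` and `sampleGrammars` are hypothetical names, and the stand-in grammars below are placeholders, not the real definitions (real grammars would need the actual `hljs` object rather than an empty one).

```javascript
// Build an alias -> canonical-name index by executing each grammar.
// `grammars` maps a canonical name (the file name without ".min.js") to a
// grammar definition function; `hljs` is whatever those functions expect.
function buildAliasIndex(grammars, hljs) {
  const index = {};
  for (const [name, define] of Object.entries(grammars)) {
    const language = define(hljs); // the grammar is code, so it must be run
    index[name] = name;            // the canonical name resolves to itself
    for (const alias of language.aliases || []) {
      index[alias.toLowerCase()] = name;
    }
  }
  return index;
}

// Stand-in grammars for illustration only; real ones are far larger and
// their alias lists may differ.
const sampleGrammars = {
  xml: () => ({ name: "HTML, XML", aliases: ["html", "xhtml", "svg"] }),
  javascript: () => ({ name: "JavaScript", aliases: ["js", "jsx"] }),
  json: () => ({ name: "JSON" }),
};

const aliasIndex = buildAliasIndex(sampleGrammars, {});
// This JSON is what a hypothetical grammars.json could contain.
console.log(JSON.stringify(aliasIndex, null, 2));
```

The sketch keeps the index flat (alias to file name) because that is the only lookup the original request needs; a richer format is equally possible.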

nmangold commented 6 months ago

I am not very familiar with this project yet, and I do not know the first place to start. I was hoping there was already a way to do such a lookup, and I just couldn't find the documentation.

The approach I am using is to load the no-grammars library file, and then load only the grammars that are needed via the CDN. I assume the solution you're suggesting, including the mapping file in the CDN packaging, is the only way I will be able to do that. Is there a better approach I could be taking?

Otherwise, I could make an attempt at a PR. It sounds like I would need to build each grammar, extract the aliases, and build the file? Sorry if this doesn't make sense. I am still very confused myself.
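Until such a file exists, the application-side mapping mentioned in the original request might look something like the sketch below. Everything here is an assumption for illustration: `ALIAS_INDEX` is a hand-maintained excerpt, `grammarUrlForClass` is a hypothetical helper, and `baseUrl` stands for whichever CDN directory holds the per-language files. Only the `<name>.min.js` file naming comes from the discussion above.

```javascript
// Hand-maintained excerpt of an alias -> grammar-file-name mapping.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const ALIAS_INDEX = new Map([
  ["xml", "xml"],
  ["html", "xml"],
  ["svg", "xml"],
  ["javascript", "javascript"],
  ["js", "javascript"],
]);

// Turn a class such as "language-html" into the URL of the grammar to load,
// or null when the class names no language or the alias is unknown.
function grammarUrlForClass(className, baseUrl, index = ALIAS_INDEX) {
  const match = /\blanguage-([\w+#-]+)/.exec(className);
  if (!match) return null;
  const name = index.get(match[1].toLowerCase());
  return name ? `${baseUrl}/${name}.min.js` : null;
}
```

The returned URL can then be loaded with a script tag or a dynamic import before highlighting. Returning `null` for unknown aliases leaves it to the caller to decide whether to skip highlighting or fall back to plain text.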

joshgoebel commented 6 months ago

> ... sounds like I would need to build each grammar

The build system does this already.

> extract the aliases, and build the file?

Yep. But extraction requires executing each grammar since they are code, not raw data...

joshgoebel commented 3 months ago

It seems either we need to run the grammars at build time (which seems icky), or make the aliases static metadata in the comment header... but that would break the API (removing aliases from the runtime grammars) or require us to duplicate the alias data both dynamically and statically - also icky.

I don't think this is as trivial as it seems, and it also seems like an edge case. Are you interested in doing the work on this yourself - and accepting it might end up as a 3rd party plugin if I feel the complexity is too high? Otherwise I'll probably close this as a #wontfix for now.
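For comparison, the static-metadata option mentioned above could be read without executing any grammar. This sketch assumes a hypothetical `Aliases:` line in each grammar file's leading comment. No such field is confirmed anywhere in this thread, and `readStaticAliases` and `sampleSource` are made-up names for illustration.

```javascript
// Read a hypothetical "Aliases:" field from a grammar file's leading comment
// without executing the file. The field name is an assumption for this sketch.
function readStaticAliases(source) {
  const header = /^\s*\/\*([\s\S]*?)\*\//.exec(source);
  if (!header) return [];
  const field = /^\s*Aliases:\s*(.+)$/m.exec(header[1]);
  if (!field) return [];
  return field[1]
    .split(",")
    .map((alias) => alias.trim())
    .filter(Boolean);
}

// Made-up file contents; only the comment header matters here.
const sampleSource = [
  "/*",
  "Language: HTML, XML",
  "Aliases: html, xhtml, svg",
  "*/",
  "export default function (hljs) { return { name: 'HTML, XML' }; }",
].join("\n");

console.log(readStaticAliases(sampleSource));
```

The catch is the one already raised: the header list and the runtime `aliases` array would have to be kept in sync by hand, or the runtime array would have to be removed, which changes the API.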