highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License
23.32k stars 3.52k forks source link

JavaScript/TypeScript Upper_Snake_Case issues #4072

Open. jkavalec opened this issue 4 days ago

jkavalec commented 4 days ago

Describe the issue: When you use a non-standard code style (Upper_Snake_Case in this instance), most kinds of identifiers, except a variable that is assigned an arrow function, are highlighted in two colors, and only the first part of the identifier is rendered correctly.

Which language seems to have the issue? TypeScript/JavaScript.

Are you using highlight or highlightAuto? I noticed this issue on Stack Overflow and ChatGPT, and would like to see the naming style I use highlighted properly.

Sample Code to Reproduce

type Only_First_Is_Highlighted = any

const idCamel = "no highlight for basic id"
const snake_camel = "also as expected"
const PascalCase = "here is different Highlight"
const Upper_Snake = "this again breaks"
const SCREAMING_combined = "this is not highlighted as basic id"
const SCREAMING_SNAKE = "this is as it should be"

const dict = {
    camelKey: "works",
    PascalKey: "different color, but still not broken",
    Upper_Snake_Case: "broken",
}

// everything in this case is consistent: the id is identified as a
// const function and colored the same way
const camelCaseFnWorks = () => {}
const snake_works = () => {}
const Upper_Snake_Works = () => {}
const SCREAMING_SNAKE = () => {}

// all below are broken
type Upper_Snake_Case = (min: number, max: number) => number

const Upper_Snake_Case: Upper_Snake_Case = function(min, max) {}

const SCREAMING_SNAKE: Upper_Snake_Case = function(min, max) {}

const Upper_Case = function(min, max) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
}

// lower snake case here is consistent with camel case

const lower_snake_case = function(min, max) {}
const camelCase = function(min, max) {}

const lower_snake_case: Upper_Snake_Case = function(min, max) {}
const camelCase: Upper_Snake_Case = function(min, max) {}

Here, for instance, GitHub's code highlighting does not break Upper_Snake_Case identifiers.

image

Expected behavior: A single identifier of any kind should not be colored in two separate colors.

joshgoebel commented 4 days ago

I think we'd be open to a simple patch to NOT highlight identifiers like Upper_Snake_Works. This isn't any kind of idiomatic JS I've ever seen. If you have a pointer to specific style guides, I'd be willing to check them out. I took a quick look at the GitHub guide and didn't see any mention of this type of mix of camel case and snake case.

It looks like GitHub is tagging this as smi and not highlighting it the same way as the other variables. Does anyone know what smi is?

jkavalec commented 4 days ago

I updated the examples a little. I was unable to find what the acronym or word smi means in this context; could you please explain?

If I understand correctly, you would want, for instance, the Upper_Snake_Case names that are currently colored correctly as const functions to no longer be colored after the patch? Having no broken highlighting is better than having broken colors, but that would basically mean having no highlighting at all. I would expect the highlighter to simply finish what it started, in the color of the first word.

Some context for this non-idiomatic JS: I am currently developing a language, and the JS platform is what I ended up with for the first implementation. I want to be able to transform between my language and TypeScript, and the casing I chose is snake case all the way. I plan to first release TypeScript-based libraries that are all snake case. Since Highlight.js is the most popular highlighter, I run into problems when I paste non-standard JS/TS into either Stack Overflow or ChatGPT, because it highlights the code incorrectly.

Now, if I were to look into this, I would need to understand what may cause these inconsistencies.

For me, the expected behavior would be that once the highlighter starts on a token, it finishes the whole identifier in the color it started with. The first word is highlighted consistently whether the name is camelCase or snake_case, upper or lower, so I wonder what the reason for this is.

I would be OK looking into the code and solving it myself if we could agree on a reasonable way to fix this and if I could understand what causes the problem.

joshgoebel commented 2 days ago

"...for this non idiomatic JS."

Exactly. We don't care much about highlighting non-idiomatic JS "correctly", or about the fact that we highlight it poorly. We're solving for the most common use cases here. However, as I mentioned, if a simple enough patch were proposed to prevent poor highlighting of these mixed Camel+Snake case names, we'd likely accept it. It's probably a matter of matching on a word boundary in addition to a negative lookahead for the underscore.
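A minimal TypeScript sketch of that idea. The two patterns below are a simplified, hypothetical PascalCase rule, not copied from the highlight.js JavaScript grammar; they only illustrate why a rule with no trailing boundary colors just the Upper part of Upper_Snake, and how ending the rule with a word boundary plus a negative lookahead for the underscore makes it skip such names entirely. The coloredParts helper is also hypothetical.

```typescript
// Hypothetical, simplified rule for class-like PascalCase names.
// It has no trailing boundary, so it can stop in the middle of a longer identifier.
const pascalLoose = /\b[A-Z][a-z]+(?:[A-Z][a-z]*|\d)*/g;

// Same rule, but the match must end at a word boundary and must not be
// followed by an underscore, so Upper_Snake is rejected as a whole.
const pascalStrict = /\b[A-Z][a-z]+(?:[A-Z][a-z]*|\d)*\b(?!_)/g;

// Return every substring of `line` that the given rule would color.
function coloredParts(rule: RegExp, line: string): string[] {
  return line.match(rule) || [];
}

const sample = 'const Upper_Snake = "this again breaks"';
console.log(coloredParts(pascalLoose, sample));  // [ 'Upper' ]
console.log(coloredParts(pascalStrict, sample)); // []
console.log(coloredParts(pascalStrict, "const PascalCase = 1")); // [ 'PascalCase' ]
```

In JavaScript regexes the underscore already counts as a word character, so the trailing \b alone fails right before an underscore; the (?!_) lookahead mostly makes the intent explicit.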

We highlight only valid syntax and very popular, common idiomatic patterns.

The highlighting is all based on regex patterns. You could indeed highlight these weirdly named variables/constants if you wrote a custom grammar to do so and wrote the appropriate regex. If you want to modify grammars or build your own, I'd suggest reading our docs and then becoming familiar with regex patterns for describing matches.
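Following the custom-grammar route, a rule could match the whole name in one go instead of stopping at the first underscore. The sketch below assumes a hypothetical rule upperSnakeName and a hypothetical helper tagged; neither is part of highlight.js, and how such a regex is wired into an actual grammar is left to the highlight.js language-definition docs. All-caps names such as SCREAMING_SNAKE are deliberately not matched, since the issue reports those are already highlighted correctly.

```typescript
// Hypothetical rule for a custom grammar: a capitalized word followed by one or
// more "_Word" parts (Upper_Snake_Case), matched as a single token so the whole
// identifier would get one color.
const upperSnakeName = /\b[A-Z][a-z0-9]*(?:_[A-Z][a-z0-9]*)+\b/g;

// Hypothetical helper: list the identifiers on a line that the rule would tag.
function tagged(line: string): string[] {
  return line.match(upperSnakeName) || [];
}

console.log(tagged("type Upper_Snake_Case = (min: number, max: number) => number"));
// [ 'Upper_Snake_Case' ]
console.log(tagged("const SCREAMING_SNAKE = 1"));  // []
console.log(tagged("const lower_snake_case = 2")); // []
```

Because the pattern must end on a word boundary, a name like Upper_snake (lowercase after the underscore) is not partially matched either; it is simply left untagged.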

jkavalec commented 22 hours ago

OK, I will let you know what I find and whether there is a simple solution. I started going through some of the code and will look more into how this all works internally. Thank you for the comment.