highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License

CSS, SASS(SCSS), LESS, STYLUS highlighting differences. #2269

Open w3suli opened 4 years ago

w3suli commented 4 years ago


A list of things to keep in sync:

See the css_consistency branch.

Sass bug list:

Less bug list:

Stylus bug list:

There are a lot of differences. If I have enough time, I'll make examples one by one. I welcome any help. Thanks in advance for all your help!

joshgoebel commented 4 years ago

I'll make examples one by one

Of each bug with each grammar? I'm not sure that's a great use of time. What really should happen here is to look at how to consolidate all of these, reuse some matchers, merge grammars where possible, etc. That might start to look like more of a ground-level rewrite - and then this exact list of bugs wouldn't be useful at all. It'd be more useful to rebuild things on a solid foundation and then see where we stand.

It's possible Less/CSS/SCSS could become a single grammar... (or very close)... and SASS and Stylus are also very similar.

Optimally someone (with a lot of time) needs to dig in here and think about this at a very high level. If we can do something like build 5 on 2 core grammars... or 5 on 1... then keeping the features in sync with future improvements becomes much simpler.

Otherwise we could spend a bunch of effort today just to have them get out of sync again in the future.

joshgoebel commented 4 years ago

Your side-by-side is also potentially not accurate, as you used the same source code for all of them... SASS and Stylus (at least) are entirely different grammars AFAIK. They don't expect/require brackets or semicolons, etc... so it's possible some of the issues you're seeing are because they aren't intended to highlight RAW CSS code, which is what it appears you're using.

To actually test them properly you'd need the same CSS, but converted to each actual language, though some of them might be interchangeable.
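One way to set up such a comparison is to keep the same rule written in each language's own syntax. This is only an illustrative sketch: the `SAMPLES` name and the snippets are assumptions, not fixtures from the repository.

```javascript
// Hypothetical fixtures: one rule, written the way each language expects it.
const SAMPLES = {
  css: 'body {\n  background: green;\n}',
  scss: 'body {\n  background: green;\n}',
  less: 'body {\n  background: green;\n}',
  // Indented Sass drops braces and semicolons but keeps the colon.
  sass: 'body\n  background: green',
  // Stylus can additionally drop the colon.
  stylus: 'body\n  background green',
};

// Languages whose sample is byte-for-byte the plain CSS sample.
const plainCssCompatible = Object.keys(SAMPLES).filter(
  (lang) => SAMPLES[lang] === SAMPLES.css
);

console.log(plainCssCompatible); // [ 'css', 'scss', 'less' ]
```

Feeding each grammar its own entry, rather than the CSS text, separates real highlighting bugs from input the grammar was never meant to handle.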

joshgoebel commented 4 years ago

Stylus sounds fun:

Stylus uses the .styl file extension and it allows us to write code in many different ways. We can use the standard CSS syntax, but we can also omit brackets, colons, and/or semicolons or leave out all punctuation altogether.

LOL.

joshgoebel commented 4 years ago

I'd really love to avoid having to include ALL the TAGS and ATTRIBUTES if possible.

joshgoebel commented 4 years ago

You might start by looking at which rules are "generic" in the sense that we could have them in a single place (like CSS) and then pull into the other grammars from there. And then CSS would become a requirement of the other grammars and common things (like list of tags, or attributes, if necessary) would live in CSS grammar.

CSS's FUNCTION_LIKE might be one example.

joshgoebel commented 4 years ago

attribute prefix - I like it, but different from css and less

Not so much a feature as side-effect because it's only highlighting attributes it's heard of... but this shouldn't probably be "fixed" to highlight the whole attribute like elsewhere since (as discussed elsewhere) we don't really have a class for highlighting prefixes differently.

w3suli commented 4 years ago

Almost everything that came to my mind over the last day has already been said. I looked at the code and saw that it is possible to set up dependencies. I do not know all CSS preprocessors well enough. I know Sass and SCSS well. I am less familiar with Less, Stylus and PostCSS. As far as I know, all of them take standard CSS and add more options on top. Therefore, they all depend on CSS.

The CSS preprocessors need thorough study to get this right. The difficulty is how to solve it so that everything is highlighted correctly, without having to specify each language element separately.

It would be good to solve this, but it takes a lot of work. However, if we did, we would get more compact and efficient code.

In addition, CSS highlighting would look consistent across these languages, which improves clarity and looks nicer.

I would be happy if this could be solved.

w3suli commented 4 years ago

@yyyc514 Is it possible to reuse one language inside another and modify its elements?

I started to create scss by rewriting the original css. It is going well so far, but in many places you have to go into the original code.

What do you think would be a good solution? The original recognition of css should be retained, since you can use css in scss as well. However, scss adds new elements to css, so the original css code needs to be modified in many places. However, it would be a bad idea to re-include the original css highlighter description everywhere.

It might be possible to highlight multiple languages from a single language file if the add-on rules could be switched on only for a given alias (scss, stylus, etc.). Alternatively, if the CSS language constants were available from another language, you could inherit these constants and modify only the required parts.

What is possible in hljs?

joshgoebel commented 4 years ago

Take a look at arduino and CPP. One way to do this is to make them all depend on CSS and put all the "core" rules there (as much as makes sense)... then you build all the syntaxes based on those rules... For example if we REALLY need a list of all CSS attributes you'd put it inside CSS then do:

var cssLang = hljs.getLanguage("css").exports
cssLang.attributeNames
// etc

Or some such...
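To make the idea concrete, here is a rough sketch of how shared definitions might be pulled into a second grammar. It uses plain functions instead of the real language registry so that it runs on its own; `cssCore`, `attributeNames`, `tagNames`, and `scssLike` are made-up names, the `FUNCTION_LIKE` rule is a stand-in, and the lists are deliberately tiny.

```javascript
// Hypothetical "core" definitions that would live in the CSS grammar.
function cssCore() {
  return {
    attributeNames: ['background', 'color', 'margin', 'padding'],
    tagNames: ['body', 'header', 'nav', 'ul', 'li'],
    // Stand-in for a rule generic enough to be reused as-is elsewhere.
    FUNCTION_LIKE: { begin: /[\w-]+\(/, returnBegin: true },
  };
}

// A dependent grammar reuses the shared pieces and only adds its own rules.
function scssLike(core) {
  return {
    name: 'SCSS-like (sketch)',
    keywords: { attribute: core.attributeNames.join(' ') },
    contains: [
      core.FUNCTION_LIKE,
      { className: 'variable', begin: /\$[\w-]+/ }, // SCSS-only addition
    ],
  };
}

const core = cssCore();
const scss = scssLike(core);
console.log(scss.keywords.attribute); // background color margin padding
```

With this shape, a newly added property only has to be listed once in the core for every dependent grammar to pick it up.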

This might be a lot easier to solve after we switch to modules and the new build system, which is why I wasn't in a rush to solve it. :-)

Honestly I wondered if CSS/SCSS could be a single language... but I'm not sure how feasible that is.

joshgoebel commented 4 years ago

There is also sublanguage support, but I don't think that's what we want to do.

joshgoebel commented 4 years ago

I'd start with VERY small snippets and then make your new grammars work with them, then expand.. and keep iterating... like start with:

body {
  background: green;
}

That is SCSS and CSS. Get it highlighting... then add another layer:

body {
  header {
    background: blue;
  } 
  background: green;
}

Make sure that works for both... then maybe move on to pseudo selectors, or @ rules, etc. I'm assuming you're trying to start from scratch.

joshgoebel commented 4 years ago

And if you also did Stylus and Less side by side then you'd see how they were all working at once... and if you did all this in a PR we could provide thoughts and advice as you went.

Then after you covered the basics you'd move on to extension specific things like variables, etc... though I guess variables are also part of CSS now too, lol... but you get the point I hope.

w3suli commented 4 years ago

Thanks for the help!

I'm exploring the possibilities. I hope I can put together a single compact solution and avoid duplication.

w3suli commented 4 years ago

@yyyc514 I'm still thinking about what to do with stylus. If you need to add css properties, which solution is good?

Currently used:

border-width|border-top-width|border-top-style|border-top-right-radius|border-top-left-radius|border-top-color|border-top|border-style|border-spacing|border-right-width|border-right-style|border-right-color|border-right|border-radius|border-left-width|border-left-style|border-left-color|border-left|border-image-width|border-image-source|border-image-slice|border-image-repeat|border-image-outset|border-image|border-color|border-collapse|border-bottom-width|border-bottom-style|border-bottom-right-radius|border-bottom-left-radius|border-bottom-color|border-bottom|border

Shorter version:

border-(bottom|top)-(left|right)-radius|border-image-(outset|repeat|slice|source|width)|border-(bottom|left|right|top)-(color|style|width)|border-(bottom|collapse|color|image|left|radius|right|spacing|style|top|width)|border

Most compact version:

border-((bottom|top)-(left|right)-radius|image-(outset|repeat|slice|source|width)|(bottom|left|right|top)-(color|style|width)|(bottom|color|collapse|image|left|radius|right|spacing|style|top|width))|border

The file size can thus be reduced, because there are many such CSS properties.

I don't know how much it affects speed, but it hurts readability when a new CSS property is introduced: in some cases, adding one becomes slightly more labor intensive.
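A quick way to check that a compact pattern still covers the explicit list is to test every listed name against it. The sketch below does that for the border family; the `^(?:...)$` anchoring and the variable names are additions for the check, not part of any grammar.

```javascript
// The explicit list from the comment above.
const BORDER_LIST = [
  'border-width', 'border-top-width', 'border-top-style',
  'border-top-right-radius', 'border-top-left-radius', 'border-top-color',
  'border-top', 'border-style', 'border-spacing', 'border-right-width',
  'border-right-style', 'border-right-color', 'border-right', 'border-radius',
  'border-left-width', 'border-left-style', 'border-left-color', 'border-left',
  'border-image-width', 'border-image-source', 'border-image-slice',
  'border-image-repeat', 'border-image-outset', 'border-image', 'border-color',
  'border-collapse', 'border-bottom-width', 'border-bottom-style',
  'border-bottom-right-radius', 'border-bottom-left-radius',
  'border-bottom-color', 'border-bottom', 'border',
];

// The "most compact version", anchored so it must match the whole name.
const COMPACT = new RegExp(
  '^(?:border-((bottom|top)-(left|right)-radius' +
  '|image-(outset|repeat|slice|source|width)' +
  '|(bottom|left|right|top)-(color|style|width)' +
  '|(bottom|color|collapse|image|left|radius|right|spacing|style|top|width))' +
  '|border)$'
);

const missed = BORDER_LIST.filter((name) => !COMPACT.test(name));
console.log(BORDER_LIST.length, missed.length); // 33 0
```

A check like this can run in a test whenever the compact form is edited, which takes some of the maintenance risk out of it.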

joshgoebel commented 4 years ago

List them all out; the other way just makes future maintenance hard. This is what gzip is for (saving bytes)... But I'd go even further and list them in array form, not a string... I have a feeling we'll want to use them outside of keywords...

joshgoebel commented 4 years ago

I don't mind being smart for certain individual attribute though, such as for border:

border-(left|right|top|bottom)-(style|radius|width) etc...

That might actually make it easier to see what is going on and doesn't make maintenance harder since it's really part of a single attribute "border", so it's well grouped.

w3suli commented 4 years ago

I also saw an array-based solution, but in the end it just generates one big regular expression out of the array, with extra steps. In your opinion, which solution should be used?

joshgoebel commented 4 years ago

If I was doing it I'd probably do a mix in an array:

PROPERTIES = [
 [string], 
 [string],
 [regex],
 ... 
]

Then you could use regex to describe some of the attributes that lent themselves well to that... or else just an array of literal strings.
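A mixed array like this needs a small conversion step before it can become one pattern. The following is a minimal sketch under assumed names (`PROPERTIES`, `escapeLiteral`, `toSource`, `PROPERTY_RE`); the entries are examples only.

```javascript
// Hypothetical mixed list: literal strings plus a regex for one well-grouped family.
const PROPERTIES = [
  'color',
  'margin',
  /border-(?:left|right|top|bottom)-(?:style|width|color)/,
];

// Escape regex metacharacters in literal entries.
const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Array.prototype.join would stringify a RegExp with its slashes ("/.../"),
// so take `.source` from regex entries instead.
const toSource = (entry) =>
  entry instanceof RegExp ? entry.source : escapeLiteral(entry);

const PROPERTY_RE = new RegExp(`^(?:${PROPERTIES.map(toSource).join('|')})$`);

console.log(PROPERTY_RE.test('border-left-width')); // true
console.log(PROPERTY_RE.test('border-left'));       // false
```

Keeping the array as the source of truth means the same entries can be reused outside of a keyword pattern.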

joshgoebel commented 4 years ago

Though I still debate if you need an actual list. It seems you could just match something: as a property, no? CSS does just fine now without having any lists at all.

w3suli commented 4 years ago

So let's use it like this: PROPERTIES.join('|')

It's not good this way:

border-(left|right|top|bottom)-(style|radius|width)

In this case, non-existent properties are also formed.
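The effect is easy to see by testing the pattern against a few names. This is a small illustration only; `LOOSE` is a made-up name and the anchors are added for the check.

```javascript
// The grouped pattern from above, anchored to the whole property name.
const LOOSE = /^border-(left|right|top|bottom)-(style|radius|width)$/;

console.log(LOOSE.test('border-left-width'));      // true: a real property
console.log(LOOSE.test('border-left-radius'));     // true: but CSS has no such property
console.log(LOOSE.test('border-top-left-radius')); // false: a real property it does not cover
```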

joshgoebel commented 4 years ago

In this case, non-existent properties are also formed.

Properties that will never appear in CSS files anyways. And how much easier to maintain is that than writing out 100 different possibilities? Many of our grammars cheat in the same way.

But again, why not avoid the list completely?

w3suli commented 4 years ago

Yes, CSS is fine. The problem is the stylus. The list is important there. The syntax is so stripped down that it only works if you know the list. With Stylus, you will surely need a list.

joshgoebel commented 4 years ago

The problem is the stylus.

How so? All the examples I see still have properties ending in :

Oh I see you can also write:

body
  color white

Ugh :-) Still isn't the first word always a property?

w3suli commented 4 years ago

Stylus lets you leave everything out, so elements can be highlighted incorrectly if there is no list. Unfortunately for Stylus, the list of html members also seems to be necessary :(.

joshgoebel commented 4 years ago

Unfortunately for Stylus, the list of html members also seems to be necessary

I'm not sure this is true, but it MIGHT be. It's definitely easier if you have a list.

Isn't [single word] (, [single word])* always a list of tags?

[spacing][single word] [another word] is a property assignment.

joshgoebel commented 4 years ago

Seems 100% possible:

https://github.com/PrismJS/prism/blob/master/components/prism-stylus.js

If we can avoid a list that also means the syntax always stays up-to-date... and we don't need to keep adding new CSS/HTML tags to it over the years. That's a big win.

w3suli commented 4 years ago

If possible, I'd like to avoid the list. But I don't want incorrect highlighting either. The most interesting cases are the nested elements, where you have to decide whether the word is a new selector or a property.

nav
    margin 10px

    ul
        list-style-type none

        > li
            display inline-block

            &.current
                background-color lightblue

or

xy = red
nav
    margin 10px
    ul
        list-style-type none
        > li
            display inline-block
            background xy
            color green
            span
              padding 10px
            &.current
                background-color lightblue

Thanks for the link. I'm investigating the solution.

joshgoebel commented 4 years ago

In that sample every "more than one word" is a property/value pair. The only "edge case" is > li and that's pretty easy to detect as a selector... (since it includes >)

By word I mean "contiguous sequence of characters".
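That rule of thumb can be sketched as a per-line classifier that looks only at the shape of the line. This is an assumption-laden illustration, not grammar code: `classifyLine` is a made-up helper, and the variable check was added to cover the `xy = red` line from the sample.

```javascript
// Classify one Stylus line using only its shape, with no tag or property lists.
function classifyLine(line) {
  const words = line.trim().split(/\s+/).filter(Boolean);
  if (words.length === 0) return 'blank';
  // A leading combinator (>, +, ~) or "&" can only start a selector.
  if (/^[>+~&]/.test(words[0])) return 'selector';
  // "name = value" is a variable assignment.
  if (words[1] === '=') return 'variable';
  // More than one word: treat it as "property value".
  return words.length > 1 ? 'property' : 'selector';
}

console.log(classifyLine('nav'));                    // selector
console.log(classifyLine('    margin 10px'));        // property
console.log(classifyLine('        > li'));           // selector
console.log(classifyLine('            &.current'));  // selector
console.log(classifyLine('xy = red'));               // variable
```

The known weak spot is a descendant selector such as `ul li`, which this shape-only rule would label as a property.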

w3suli commented 4 years ago

Prism: as I see it, it compares consecutive lines. If the indentation of the upper line is less than that of the lower line, it marks the upper line as a selector, while the line below it becomes a property.

But you can't do anything with variables. In the declaration, a variable could still be highlighted thanks to the equals sign. However, the property values and the variables used there are indistinguishable.

This could be solved if hljs was able to retrieve the variable names before the equality sign and store it. Then the variable names could be searched from the properties. Is this possible?

w3suli commented 4 years ago

In Prism, blank lines cause problems. The compiler ignores them, but Prism's highlighting breaks if there is an empty line between a selector and a property.
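The indentation comparison described above, with blank lines skipped, could look roughly like this. It is a standalone sketch of the idea and not Prism's implementation; `indentOf` and `classifyByIndent` are made-up names.

```javascript
// Width of the leading whitespace (each space or tab counts as one).
const indentOf = (line) => line.match(/^[ \t]*/)[0].length;

function classifyByIndent(source) {
  const lines = source.split(/\r?\n/);
  return lines.map((line, i) => {
    if (line.trim() === '') return 'blank';
    // Find the next non-blank line, so empty lines do not break the comparison.
    let j = i + 1;
    while (j < lines.length && lines[j].trim() === '') j++;
    const nextIsDeeper = j < lines.length && indentOf(lines[j]) > indentOf(line);
    return nextIsDeeper ? 'selector' : 'property';
  });
}

const sample = 'nav\n    margin 10px\n\n    ul\n        list-style-type none';
console.log(classifyByIndent(sample));
// [ 'selector', 'property', 'blank', 'selector', 'property' ]
```

A regex-only grammar cannot skip ahead like the `while` loop does, which is one reason the blank-line case is awkward to express there.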

w3suli commented 4 years ago

If none of the lines are indented, nothing is highlighted. If we put a space character on the line below the fifth element, then all five elements become a selector.

element1
element2
element3
element4
element5

If we write a space before the third element but nothing after it, then the first and second become selectors. The third and the rest remain unhighlighted. If you put a space after the third element, followed by any characters except a comma, the third element becomes a property.

element1
element2
 element3
element4
element5

However, when at least two spaces are placed in front of the fourth element, the third element goes back to being a selector.

element1
element2
 element3 x
  element4
element5

If there is a space before the third and fourth elements, and at least two spaces are placed in front of the fifth element, then the third and fourth elements will become selectors.

element1
element2
 element3
 element4
  element5

If we add a space after the third element and then write anything other than a comma, the third element becomes a property. After the fourth element we can write anything, including more spaces, except parentheses: everything is highlighted as one element, producing one long selector. Using parentheses in the fourth element removes the highlighting. Using the parentheses as a function call highlights the word in front of them.

element1
element2
 element3 x
 element4
  element5

The selectors above each other are not highlighted by the prism as a common element.

<span class="token selector">element1
element2</span>

https://prismjs.com/test.html#language=stylus

joshgoebel commented 4 years ago

However, the property values and the variables used there are indistinguishable.

I'm not sure that's so terrible.

This could be solved if hljs was able to retrieve the variable names before the equality sign and store it. Then the variable names could be searched from the properties. Is this possible?

Not yet.

w3suli commented 4 years ago

I would appreciate it if the variables were distinguishable. I don't know yet whether the new feature would be useful elsewhere, but it might be worthwhile to use a regular expression to read a list into an array that could then be used in other regular expressions.
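Outside of the highlighter, the two-step idea can be sketched as a pre-scan followed by a generated pattern. As noted above, Highlight.js had no such facility at the time, so this is a standalone illustration; `collectVariables` and `variableRegex` are hypothetical helpers.

```javascript
// First pass: collect names that appear on the left of "=" at the start of a line.
function collectVariables(source) {
  const names = new Set();
  for (const match of source.matchAll(/^[ \t]*([\w$-]+)[ \t]*=/gm)) {
    names.add(match[1]);
  }
  return [...names];
}

// Second pass helper: one regex that finds any collected name as a whole word.
function variableRegex(names) {
  if (names.length === 0) return null;
  const escaped = names.map((n) => n.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  return new RegExp(`(?<![\\w$-])(?:${escaped.join('|')})(?![\\w$-])`, 'g');
}

const source = 'xy = red\nnav\n    background xy\n    color green';
const names = collectVariables(source);
const re = variableRegex(names);
console.log(names);                   // [ 'xy' ]
console.log(source.match(re).length); // 2 (the declaration and the use)
```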

It would also be good to eliminate the problem of blank lines.

Unfortunately, I do not know Stylus in full depth. I haven't used it for serious work so far. I hope there are no more problems. I highlight variables in sass, scss and less. If possible, I will highlight variables in Stylus later.

joshgoebel commented 4 years ago

Oh I forgot the big use for keywords, auto-detect... so n/m... I guess I don't mind if we have lists of tags and properties since that'll make it easier to auto-detect.

Though good luck telling them apart. ;-)
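As a toy model of why keyword lists help auto-detection, and why they do not separate the CSS-likes from each other, consider counting keyword hits per language. This is not how Highlight.js actually scores relevance; `keywordScore` and both keyword sets are invented for the illustration.

```javascript
// Toy model only: count how many known keywords occur in a snippet.
function keywordScore(code, keywords) {
  const words = code.toLowerCase().match(/[a-z][\w-]*/g) || [];
  return words.filter((w) => keywords.has(w)).length;
}

const CSS_LIKE = new Set(['body', 'nav', 'ul', 'li', 'margin', 'color', 'background']);
const SQL_LIKE = new Set(['select', 'from', 'where', 'insert', 'update']);

const snippet = 'nav\n    margin 10px\n    ul\n        color green';
console.log(keywordScore(snippet, CSS_LIKE)); // 4
console.log(keywordScore(snippet, SQL_LIKE)); // 0
```

If CSS, SCSS, Less, and Stylus all share the same list, they all get the same score from it, so something other than the shared keywords has to break the tie.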

w3suli commented 4 years ago

I seem to be missing the features that Prism has. Analyzing Stylus the way Prism does may not be feasible here :(.

I'm not giving up yet. Perhaps there will be a way to make it work, but it may need a mixed solution.

joshgoebel commented 4 years ago

I already said let's do the keywords, so we can auto-detect CSS-likes super well. :-)

The Prism grammars are often super-simple (if they don't require/link to other Prism grammars), but I didn't look at it that closely. Was there one particular part you were struggling with that I could take a look at? Does it matter if we can use a keyword list?

w3suli commented 4 years ago

The biggest problem is that you can't search in begin. You can search only between begin and end. But using keywords will be no problem. This is probably why keywords were used in the previous solution. The logic of prismjs is different from that of hljs. True, the prism stylus is not perfect in its current form.

joshgoebel commented 4 years ago

The biggest problem is that you can't search in begin.

You can if you use a look-ahead regex or returnBegin.

The logic of prismjs is different from that of hljs.

Yes, but much of what it does (at a glance) is transferable, we just don't have ways to express things as simply as they do. If you have a specific example of a rule they have that you don't think is doable in Highlight.js, please share it and I'll take a look.

But again, I'm fine with keywords now that I consider the value they provide for auto-detect, and the fact that we already have them anyways... we just need to make sure we have one list that the syntaxes can share.

w3suli commented 4 years ago

{
  // property lines
  begin: /((?:^|\{)([ \t]*))(?:[\w-]|\{[^}\r\n]+\})+(?:\s*:\s*|[ \t]+)[^{\r\n]*(?:;|[^{\r\n,](?=$)(?!(?:\r?\n|\r)(?:\{|\2[ \t]+)))/m,
  contains: [
    // properties, values, etc.
  ],
},
{
  // selector lines
  begin: /(^[ \t]*)(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\))?|\{[^}\r\n]+\})+)(?:(?:\r?\n|\r)(?:\1(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\))?|\{[^}\r\n]+\})+)))*(?:,$|\{|(?=(?:\r?\n|\r)(?:\{|\1[ \t]+)))/m,
  contains: [
    // selector tag, selector id, selector class, etc.
  ],
},

If there is no contains, everything is highlighted well. I tried assigning a class to each of them, and the regular expressions looked good: one selected the selector lines and the other selected the lines containing properties. However, as soon as I added rules to contains to keep matching inside those lines, it no longer worked.

Something went wrong with the regular expression searches.

joshgoebel commented 4 years ago

You'd use look ahead to do this... so (?=actual regex here). So then your begin would match, but NOT capture anything (and not advance the cursor)... then in your contains you'd have the rules to match the subexpressions.
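The zero-width behaviour can be checked with plain JavaScript regular expressions, independent of the highlighter. In this small sketch the probe pattern is an invented example of "word, whitespace, something".

```javascript
const line = 'margin 2px';

// The lookahead confirms the shape of the line but consumes no characters.
const probe = /(?=[\w-]+[ \t]+\S)/.exec(line);
console.log(probe.index, JSON.stringify(probe[0])); // 0 ""

// Because nothing was consumed, inner rules still see the whole line.
const rest = line.slice(probe.index + probe[0].length);
const property = /^[\w-]+/.exec(rest);
console.log(property[0]); // margin
```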

w3suli commented 4 years ago

I like the regular expressions mentioned above because, without knowing the content, they can be used to determine whether a given line is a selector or an expression (property and value). The only problem is that I couldn't find any further subunits in the contains.

header
    margin 0
    padding 0
    background black
    nav
        color green
        background gray
        border-bottom 2px solid orange
        &.cls
            ul
                li
                    span
                        color white
                        background blue
        ul
            margin 2px
            padding 2px
            list-style none
            li
                color white
                &.selected
                    color red
                &.inactive
                    color brown
                span
                    color blue
                    background orange
        #id
            border 1px dotted gray
            &:hover
                color pink
            strong
                span
                    border-style dashed

If:

{
  className: 'attribute',
  begin: /((?:^|\{)([ \t]*))(?:[\w-]|\{[^}\r\n]+\})+(?:\s*:\s*|[ \t]+)[^{\r\n]*(?:;|[^{\r\n,](?=$)(?!(?:\r?\n|\r)(?:\{|\2[ \t]+)))/m,

},
{
  className: 'selector-tag',
  begin: /(^[ \t]*)(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\))?|\{[^}\r\n]+\})+)(?:(?:\r?\n|\r)(?:\1(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\))?|\{[^}\r\n]+\})+)))*(?:,$|\{|(?=(?:\r?\n|\r)(?:\{|\1[ \t]+)))/m,

},

The regular expressions can only distinguish between lines. After separating the lines, the contents of each line should be analyzed, but so far I have not managed to do that properly :(.

<span class="hljs-selector-tag">header</span>
<span class="hljs-attribute">    margin 0</span>
<span class="hljs-attribute">    padding 0</span>
<span class="hljs-attribute">    background black</span>
<span class="hljs-selector-tag">    nav</span>
<span class="hljs-attribute">        color green</span>
<span class="hljs-attribute">        background gray</span>
<span class="hljs-attribute">        border-bottom 2px solid orange</span>
<span class="hljs-selector-tag">        &amp;.cls</span>
<span class="hljs-selector-tag">            ul</span>
<span class="hljs-selector-tag">                li</span>
<span class="hljs-selector-tag">                    span</span>
<span class="hljs-attribute">                        color white</span>
<span class="hljs-attribute">                        background blue</span>
<span class="hljs-selector-tag">        ul</span>
<span class="hljs-attribute">            margin 2px</span>
<span class="hljs-attribute">            padding 2px</span>
<span class="hljs-attribute">            list-style none</span>
<span class="hljs-selector-tag">            li</span>
<span class="hljs-attribute">                color white</span>
<span class="hljs-selector-tag">                &amp;.selected</span>
<span class="hljs-attribute">                    color red</span>
<span class="hljs-selector-tag">                &amp;.inactive</span>
<span class="hljs-attribute">                    color brown</span>
<span class="hljs-selector-tag">                span</span>
<span class="hljs-attribute">                    color blue</span>
<span class="hljs-attribute">                    background orange</span>
<span class="hljs-selector-tag">        #id</span>
<span class="hljs-attribute">            border 1px dotted gray</span>
<span class="hljs-selector-tag">            &amp;:hover</span>
<span class="hljs-attribute">                color pink</span>
<span class="hljs-selector-tag">            strong</span>
<span class="hljs-selector-tag">                span</span>
<span class="hljs-attribute">                    border-style dashed</span>

I assigned the two classes to regular expressions just for the sake of demonstration.

w3suli commented 4 years ago

Interestingly, when I use a sub-language, the inner matching works flawlessly: the sub-language's rules are applied correctly within the lines matched by that regular expression.

joshgoebel commented 4 years ago

As I said you need to use lookahead expressions.

https://www.regular-expressions.info/lookaround.html

So for an attribute you match:

(?=[expression for attribute])

That will result in a 0 length match... your contains block will still be starting at the beginning of the match. Trivial example:

// input line: margin 2px
begin: /(?=margin \d(px))/,
contains: [
  { begin: "margin", className: "attribute" },
  { begin: /\d(px)/, className: "number" }
]

w3suli commented 4 years ago

These are just line matchers. They let you separate lines with different kinds of content. Otherwise, the two line types could not be told apart.

begin: /(?=margin \d(px))/,
contains: [
  { begin: "margin", className: "attribute" },
  { begin: /\d(px)/, className: "number" }
]

Result: image

If I understand correctly, are you suggesting the old solution? Without separating the lines, it is also necessary to include the list of HTML elements. At least with Stylus.

joshgoebel commented 4 years ago

I do not understand what you are asking. I was merely trying to answer:

The only problem is that I couldn't find any further subunits in the contains.

Oh, perhaps now I get it. Any mode will end immediately when it fails to find additional matches. So you'd need "dummy" rules here to eat up the extra whitespace between matches... if you were trying to match groups...

That's why I don't think you honestly want to try to keep track of "parents" and "children"... just process each line one at a time, figure out what type of line it is, and highlight it appropriately.
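The "ends when nothing matches" behaviour and the role of a dummy whitespace rule can be illustrated with a toy scanner. This models only the behaviour as described in this comment, not the Highlight.js engine; `scan` and the three rule objects are invented for the example.

```javascript
// Toy model: keep applying inner rules at the cursor and stop at the
// first position where none of them matches.
function scan(text, rules) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    const hit = rules
      .map((rule) => ({ rule, m: rule.re.exec(text.slice(pos)) }))
      .find(({ m }) => m && m.index === 0 && m[0].length > 0);
    if (!hit) break; // the "mode" ends here
    if (hit.rule.className) tokens.push([hit.rule.className, hit.m[0]]);
    pos += hit.m[0].length;
  }
  return tokens;
}

const ATTRIBUTE = { className: 'attribute', re: /^margin/ };
const NUMBER = { className: 'number', re: /^\d+px/ };
const WHITESPACE = { re: /^[ \t]+/ }; // "dummy" rule: consumes, emits nothing

console.log(scan('margin 2px', [ATTRIBUTE, NUMBER]));
// [ [ 'attribute', 'margin' ] ]
console.log(scan('margin 2px', [ATTRIBUTE, NUMBER, WHITESPACE]));
// [ [ 'attribute', 'margin' ], [ 'number', '2px' ] ]
```

Without the whitespace rule the scan stops at the space after `margin`, so the number is never reached.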

joshgoebel commented 4 years ago

So only attribute would have a contains...

joshgoebel commented 4 years ago

And actually to do a "sequence" of things you might need to chain rules with starts rather than using contains at all... we don't have a syntactically easy way to do chains currently.
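A chained version might look something like the fragment below, which uses the Highlight.js mode keys `begin`, `end`, `contains`, `className`, and `starts`. It is an untested sketch: the patterns are simplified assumptions, and only the regular expressions themselves are exercised here, not the library.

```javascript
// Value part: takes over right after the property and runs to the end of the line.
const VALUE = {
  end: /$/,
  contains: [{ className: 'number', begin: /\b\d+(?:\.\d+)?(?:px|em|%)?/ }],
};

// Property part: a word followed by whitespace and more text.
// `starts` names the mode that begins once this one has finished.
const PROPERTY = {
  className: 'attribute',
  begin: /[\w-]+(?=[ \t]+\S)/,
  starts: VALUE,
};

const line = 'border-bottom 2px solid orange';
const prop = PROPERTY.begin.exec(line)[0];
const num = VALUE.contains[0].begin.exec(line.slice(prop.length))[0];
console.log(prop, num); // border-bottom 2px
```

Because the property rule hands over to the value rule instead of containing it, the first word on the line cannot be mistaken for part of the value. A descendant selector such as `ul li` would still match the property pattern, so a real grammar would need an extra check for that.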

joshgoebel commented 4 years ago

w3suli commented 4 years ago

Very simple example: https://jsfiddle.net/w3suli/ygj3t5ed/10/

What should I do here (// attributes, values, and variables ...) to analyze the content? What should I do here (// selectors and variables ...) to analyze my content? So far, none of my ideas have worked properly.

joshgoebel commented 4 years ago

Just off the top of my head:

https://jsfiddle.net/kp7ts8Lc/