w3suli opened this issue 4 years ago
I'll make examples one by one
Of each bug with each grammar? I'm not sure that's a great use of time. What really should happen here is to look at how to consolidate all of these, reuse some matchers, merge grammars where possible, etc. That might start to look more like a ground-up rewrite, and then this exact list of bugs wouldn't be useful at all. It'd be more useful to rebuild things on a solid foundation and then see where we stood.
Less/CSS/SCSS could possibly become a single grammar... (or very close)... and Sass and Stylus are also very similar.
Ideally someone (with a lot of time) needs to dig in here and think about this at a very high level. If we can build all 5 grammars on 2 core grammars... or 5 on 1... then keeping the features in sync with future improvements becomes much simpler.
Otherwise we could spend a bunch of effort today just to have them get out of sync again in the future.
Your side-by-side comparison is also potentially not accurate, since you used the same source code for all of them... Sass and Stylus (at least) are entirely different grammars AFAIK. They don't expect/require brackets or semicolons, etc... so it's possible some of the issues you're seeing are because they aren't intended to highlight RAW CSS code, which is what it appears you're using.
To actually test them properly you'd need the same CSS, but converted to each actual language, though some of them might be interchangeable.
Stylus sounds fun:
Stylus uses the .styl file extension and it allows us to write code in many different ways. We can use the standard CSS syntax, but we can also omit brackets, colons, and/or semicolons or leave out all punctuation altogether.
LOL.
I'd really love to avoid having to include ALL the TAGS and ATTRIBUTES if possible.
You might start by looking at which rules are "generic" in the sense that we could have them in a single place (like CSS) and then pull them into the other grammars from there. Then CSS would become a requirement of the other grammars, and common things (like the list of tags or attributes, if necessary) would live in the CSS grammar.
CSS's FUNCTION_LIKE might be one example.
attribute prefix: I like it, but it's different from CSS and Less
Not so much a feature as a side effect, because it's only highlighting attributes it's heard of... but this probably shouldn't be "fixed" to highlight the whole attribute like elsewhere since (as discussed elsewhere) we don't really have a class for highlighting prefixes differently.
Almost everything that came to my mind over the last day has already been said. I looked at the code and saw that it's possible to set up dependencies. I don't know all CSS preprocessors well enough. I know Sass and SCSS well; I'm less familiar with Less, Stylus and PostCSS. As far as I know, all of them use standard CSS syntax and add more features. Therefore, they all depend on CSS.
CSS preprocessors need thorough study to get right. The difficulty is solving this so that everything is highlighted correctly without having to list every language element separately.
It would be good to solve this, but it takes a lot of work. However, if we did, we would get more compact and efficient code.
The CSS-like languages would also be highlighted consistently, which improves clarity and looks nicer.
I would be happy if this could be solved.
@yyyc514 Is it possible to use one language inside another while modifying some of its elements?
I started creating SCSS by rewriting the original CSS grammar. It's going well so far, but in many places I have to change the original code.
What do you think would be a good solution? The original CSS recognition should be kept, since you can use plain CSS in SCSS as well. But SCSS adds new elements to CSS, so the original CSS code would need to be modified in many places. On the other hand, it would be a bad idea to copy the original CSS grammar definition everywhere.
It might be possible to highlight multiple languages from a single language file if the add-on rules could be switched on only for a given alias (scss, stylus, etc.). Alternatively, if CSS's language constants were available from another language, you could inherit these constants and modify only the parts you need.
What is possible in hljs?
Take a look at arduino and CPP. One way to do this is to make them all depend on CSS and put all the "core" rules there (as much as makes sense)... then you build all the syntaxes based on those rules... For example, if we REALLY need a list of all CSS attributes you'd put it inside CSS, then do:
var cssLang = hljs.getLanguage("css").exports
cssLang.attributeNames
// etc
Or some such...
This might be a lot easier to solve after we switch to modules and the new build system, which is why I wasn't in a rush to solve it. :-)
Honestly I wondered if CSS/SCSS could be a single language... but I'm not sure how feasible that is.
There is also sublanguage support, but I don't think that's what we want to do.
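To make the "one list, many grammars" idea concrete without depending on Highlight.js itself, here is a minimal plain-JS sketch. Everything in it is hypothetical: cssCore, wordListRe, makeScss and makeStylus are illustrative names (not the hljs API), and the property/tag lists are a tiny made-up subset. The point is only that SCSS and Stylus derive their rules from the same shared lists instead of keeping private copies.

```javascript
// Hypothetical shared "core": the lists would live once, in the CSS grammar.
// The entries here are a tiny illustrative subset, not a real property list.
function cssCore() {
  return Object.freeze({
    PROPERTIES: Object.freeze(["background", "border", "color", "margin", "padding"]),
    TAGS: Object.freeze(["body", "header", "li", "nav", "span", "ul"]),
  });
}

// Build an anchored alternation from plain identifiers (assumes no regex metacharacters).
function wordListRe(words) {
  return new RegExp(`^(?:${words.join("|")})$`);
}

// Hypothetical grammar factories: each derives its rules from the shared core
// instead of keeping its own copy of the lists.
function makeScss(core) {
  return { name: "SCSS (sketch)", property: wordListRe(core.PROPERTIES) };
}

function makeStylus(core) {
  return {
    name: "Stylus (sketch)",
    property: wordListRe(core.PROPERTIES),
    tag: wordListRe(core.TAGS),
  };
}

const CORE = cssCore();
const scss = makeScss(CORE);
const stylus = makeStylus(CORE);
console.log(scss.property.test("margin"), stylus.tag.test("nav")); // true true
```

In a real grammar the factories would take hljs and return mode objects; adding a property to the core list would then update every dependent grammar at once.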
I'd start with VERY small snippets and then make your new grammars work with them, then expand... and keep iterating. For example, start with:
body {
background: green;
}
That is both SCSS and CSS. Get it highlighting... then add another layer:
body {
header {
background: blue;
}
background: green;
}
Make sure that works for both... then maybe move on to pseudo-selectors, @rules, etc. I'm assuming you're starting from scratch.
And if you also did Stylus and Less side by side then you'd see how they were all working at once... and if you did all this in a PR we could provide thoughts and advice as you went.
Then after you covered the basics you'd move on to extension specific things like variables, etc... though I guess variables are also part of CSS now too, lol... but you get the point I hope.
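For the brace-style snippets above (CSS and SCSS), one way to check the first "layer" is to classify each line by how it ends rather than by what it contains: a line ending in { opens a (possibly nested) rule, a line ending in ; is a declaration. A colon can't be the signal, since a:hover { contains one too. This is a simplified sketch assuming one statement per line; classifyBraceLine and outline are hypothetical helpers, not part of any grammar.

```javascript
// Classify each line of brace-style CSS/SCSS by how it ends (assumes one statement per line).
function classifyBraceLine(line) {
  const text = line.trim();
  if (text === "") return "blank";
  if (text === "}") return "close";
  if (text.endsWith("{")) return "selector";    // opens a (possibly nested) rule block
  if (text.endsWith(";")) return "declaration"; // property: value;
  return "unknown";
}

// Track nesting depth so a nested selector like `header {` inside `body {` is visible.
function outline(source) {
  let depth = 0;
  return source.split("\n").map((line) => {
    const kind = classifyBraceLine(line);
    if (kind === "close") depth--;
    const entry = `${depth}:${kind}`;
    if (kind === "selector") depth++;
    return entry;
  });
}

const snippet = ["body {", "  header {", "    background: blue;", "  }", "  background: green;", "}"].join("\n");
console.log(outline(snippet).join(" "));
// 0:selector 1:selector 2:declaration 1:close 1:declaration 0:close
```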
Thanks for the help!
I'm exploring the possibilities. I hope I can put together a single compact solution and avoid duplication.
@yyyc514 I'm still thinking about what to do with Stylus. If we need to list CSS properties, which approach is better?
Currently used:
border-width|border-top-width|border-top-style|border-top-right-radius|border-top-left-radius|border-top-color|border-top|border-style|border-spacing|border-right-width|border-right-style|border-right-color|border-right|border-radius|border-left-width|border-left-style|border-left-color|border-left|border-image-width|border-image-source|border-image-slice|border-image-repeat|border-image-outset|border-image|border-color|border-collapse|border-bottom-width|border-bottom-style|border-bottom-right-radius|border-bottom-left-radius|border-bottom-color|border-bottom|border
Shorter version:
border-(bottom|top)-(left|right)-radius|border-image-(outset|repeat|slice|source|width)|border-(bottom|left|right|top)-(color|style|width)|border-(bottom|collapse|color|image|left|radius|right|spacing|style|top|width)|border
Most compact version:
border-((bottom|top)-(left|right)-radius|image-(outset|repeat|slice|source|width)|(bottom|left|right|top)-(color|style|width)|(bottom|color|collapse|image|left|radius|right|spacing|style|top|width))|border
This reduces the file size, since there are many such CSS properties.
I don't know how much it affects speed, but it hurts readability, and when a new CSS property is introduced it will in some cases be slightly more work to add it.
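Before choosing between the full list and the compact form, it's worth confirming they describe the same set. This sketch copies both strings from above, anchors the compact alternation so only whole names count, and checks that it accepts all 33 listed border properties while rejecting a name like border-left-radius. It only checks the listed names plus a couple of near misses; it does not prove the two patterns are equivalent in general.

```javascript
// The full list from above, split into an array (one entry per property).
const FULL = ("border-width|border-top-width|border-top-style|border-top-right-radius|" +
  "border-top-left-radius|border-top-color|border-top|border-style|border-spacing|" +
  "border-right-width|border-right-style|border-right-color|border-right|border-radius|" +
  "border-left-width|border-left-style|border-left-color|border-left|border-image-width|" +
  "border-image-source|border-image-slice|border-image-repeat|border-image-outset|" +
  "border-image|border-color|border-collapse|border-bottom-width|border-bottom-style|" +
  "border-bottom-right-radius|border-bottom-left-radius|border-bottom-color|" +
  "border-bottom|border").split("|");

// The "most compact version" from above, copied verbatim.
const COMPACT =
  "border-((bottom|top)-(left|right)-radius|image-(outset|repeat|slice|source|width)|" +
  "(bottom|left|right|top)-(color|style|width)|" +
  "(bottom|color|collapse|image|left|radius|right|spacing|style|top|width))|border";

// Anchor the whole alternation so only complete property names count as matches.
const COMPACT_RE = new RegExp("^(?:" + COMPACT + ")$");

// Returns the entries of `names` that the compact pattern fails to accept.
function missing(names) {
  return names.filter((name) => !COMPACT_RE.test(name));
}

console.log(FULL.length);                           // 33
console.log(missing(FULL));                         // []
console.log(COMPACT_RE.test("border-left-radius")); // false
```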
List them all out; the other approach just makes future maintenance hard. This is what gzip is for (saving bytes)... But I'd go even further and list them in array form, not as a string... I have a feeling we'll want to use them outside of keywords.
...
I don't mind being smart about certain individual attributes though, such as border:
border-(left|right|top|bottom)-(style|radius|width)
etc...
That might actually make it easier to see what is going on and doesn't make maintenance harder since it's really part of a single attribute "border", so it's well grouped.
I also saw a solution that keeps the list in an array, but in the end it builds one huge regular expression out of the array with extra steps. In your opinion, which solution should be used?
If I was doing it I'd probably do a mix in an array:
PROPERTIES = [
[string],
[string],
[regex],
...
]
Then you could use a regex to describe the attributes that lend themselves well to that... or else just use an array of literal strings.
Though I still debate if you need an actual list. It seems you could just match something:
as a property, no? CSS does just fine now without having any lists at all.
So let's use it like this: PROPERTIES.join('|')
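One catch with a mixed array of strings and regexes: a plain PROPERTIES.join('|') stringifies each RegExp with its surrounding slashes (String(/a/) is "/a/"), which corrupts the combined pattern. A minimal sketch, assuming a made-up PROPERTIES list: escape the literal strings, take .source from the regexes, then join. The helper names escapeRegex and toSource are illustrative.

```javascript
// Hypothetical mixed property list: literal names plus regexes for families of names.
const PROPERTIES = [
  "color",
  "margin",
  /border-(?:top|bottom)-(?:left|right)-radius/,
  /border(?:-(?:top|right|bottom|left))?-(?:color|style|width)/,
  "border",
];

// Escape regex metacharacters so a literal string matches only itself.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Strings are escaped; RegExp entries contribute their `.source` (no slashes or flags).
function toSource(entry) {
  return typeof entry === "string" ? escapeRegex(entry) : entry.source;
}

// Anchored with ^...$ so alternative order doesn't matter for whole-name tests.
const PROPERTY_RE = new RegExp("^(?:" + PROPERTIES.map(toSource).join("|") + ")$");

console.log(PROPERTIES.join("|").includes("/"));          // true (the naive join keeps slashes)
console.log(PROPERTY_RE.test("border-top-left-radius"));  // true
console.log(PROPERTY_RE.test("border-left-radius"));      // false
```

If the combined regex is used unanchored inside a larger pattern, longer alternatives should come before their prefixes (border-top before border), since JS alternation takes the first alternative that matches.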
It's not good this way:
border-(left|right|top|bottom)-(style|radius|width)
In this case, non-existent properties are also formed.
Properties that will never appear in CSS files anyway. And how much easier to maintain is that than writing out 100 different possibilities? Many of our grammars cheat in the same way.
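To see how much the "cheating" pattern border-(left|right|top|bottom)-(style|radius|width) over-generates, this sketch expands it into every name it can match and compares against the side-specific border properties from the full list above (hand-copied into a set). Only the four border-SIDE-radius names are bogus, and those never occur in real stylesheets, which is the argument being made.

```javascript
// Side-specific border properties that really exist (from the full list above).
const REAL = new Set(
  ["top", "right", "bottom", "left"].flatMap((side) =>
    ["color", "style", "width"].map((part) => `border-${side}-${part}`)
  )
);

// Expand a pattern of the form prefix-(a|b|...)-(x|y|...) into every name it can match.
function expand(prefix, sides, suffixes) {
  const names = [];
  for (const side of sides) {
    for (const suffix of suffixes) names.push(`${prefix}-${side}-${suffix}`);
  }
  return names;
}

const generated = expand("border", ["left", "right", "top", "bottom"], ["style", "radius", "width"]);
const bogus = generated.filter((name) => !REAL.has(name));

console.log(generated.length); // 12
console.log(bogus.join(", "));
// border-left-radius, border-right-radius, border-top-radius, border-bottom-radius
```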
But again, why not avoid the list completely?
Yes, CSS is fine. The problem is Stylus; the list matters there. The syntax is so minimal that you can only simplify it if you know the list. For Stylus we will surely need a list.
The problem is the stylus.
How so? All the examples I see still have properties ending in :
Oh I see you can also write:
body
color white
Ugh :-) Still isn't the first word always a property?
Stylus lets you leave out all punctuation, so without a list some elements will be highlighted incorrectly. Unfortunately, for Stylus a list of HTML elements also seems to be necessary :(
Unfortunately for Stylus, the list of html members also seems to be necessary
I'm not sure this is true, but it MIGHT be. It's definitely easier if you have a list.
Isn't [single word] (, [single word])*
always a list of tags?
[spacing][single word] [another word]
is a property assignment.
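The word-shape rule above can be written as a per-line classifier. This is a heuristic sketch, not a full Stylus parser: it treats lines starting with a combinator or & as selectors (the "> li" edge case), name = value as a variable, a comma-separated list of single words as selectors, and word-space-word (with an optional colon) as a property. classifyLine is a hypothetical helper name.

```javascript
// Classify a single Stylus line by the shape of its words (a heuristic sketch).
function classifyLine(line) {
  const text = line.trim();
  if (text === "") return "blank";
  // Edge case: combinators and parent references (> li, &.current) start selectors.
  if (/^[>+~&]/.test(text)) return "selector";
  // name = value  -> variable assignment (Stylus variables may start with $).
  if (/^[\w$-]+\s*=/.test(text)) return "variable";
  // [single word] (, [single word])*  -> a list of selectors.
  if (/^[^\s,]+(?:\s*,\s*[^\s,]+)*,?$/.test(text)) return "selector";
  // [single word] [another word] ...  -> a property assignment (optional colon).
  if (/^[\w-]+:?\s+\S/.test(text)) return "property";
  return "unknown";
}

const sample = ["nav", "  margin 10px", "  ul", "    > li", "      display inline-block", "xy = red", "h1, h2"];
for (const line of sample) console.log(classifyLine(line).padEnd(9), line.trim());
```

A selector written with a space inside it (for example "ul li") would be misread as a property by this rule, which is one of the cases where a tag list or indentation would still be needed.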
Seems 100% possible:
https://github.com/PrismJS/prism/blob/master/components/prism-stylus.js
If we can avoid a list, that also means the syntax always stays up to date... and we don't need to keep adding new CSS/HTML tags to it over the years. That's a big win.
If possible, I'd like to avoid the list. But I don't want wrong highlighting either. The most interesting cases are the nested elements, where you have to decide whether a word is a new selector or a property.
nav
margin 10px
ul
list-style-type none
> li
display inline-block
&.current
background-color lightblue
or
xy = red
nav
margin 10px
ul
list-style-type none
> li
display inline-block
background xy
color green
span
padding 10px
&.current
background-color lightblue
Thanks for the link. I'm investigating the solution.
In that sample every "more than one word" line is a property/value pair. The only "edge case" is > li, and that's pretty easy to detect as a selector (since it includes >).
By word I mean "contiguous sequence of characters".
Prism: as I see it, it compares consecutive lines. If a line is indented less than the line below it, it marks the upper line as a selector and the line below it as a property.
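The Prism-style indentation rule can be sketched as follows, with one change: when looking at "the line below", blank lines are skipped, so an empty line between a selector and its properties doesn't break the classification. Lines ending in a comma are treated as part of a selector list. This is an assumption-laden sketch (a tab counts as one column; a selector with no children is misread as a property), and classifyByIndent is a hypothetical name.

```javascript
// Leading spaces/tabs; a tab counts as one column here (a simplifying assumption).
function indentOf(line) {
  return line.match(/^[ \t]*/)[0].length;
}

// Prism-style rule: a line is a selector when the next NON-BLANK line is indented
// more deeply; otherwise it is treated as a property. Lines ending in "," continue a
// selector list. A selector with no children is misread as a property (a limitation).
function classifyByIndent(lines) {
  return lines.map((line, i) => {
    if (line.trim() === "") return "blank";
    if (line.trim().endsWith(",")) return "selector";
    let j = i + 1;
    while (j < lines.length && lines[j].trim() === "") j++; // skip blank lines
    return j < lines.length && indentOf(lines[j]) > indentOf(line) ? "selector" : "property";
  });
}

const lines = ["nav", "", "  margin 10px", "  ul", "    list-style-type none"];
console.log(classifyByIndent(lines).join(" "));
// selector blank property selector property
```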
But it can't do anything with variables. A variable declaration could still be recognized by its equals sign, but in property values, variables are indistinguishable from ordinary values.
This could be solved if hljs were able to capture the variable names before the equals sign and store them. Then those names could be searched for in the property values. Is this possible?
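Outside of hljs, the idea is a two-pass scan: first collect every name assigned with name = value, then build one regex from those names and look for them as whole words in property values. This is a plain-JS sketch of that idea only; as far as this thread goes, hljs has no built-in way to store captured names between matches. collectVariables and findVariableUses are hypothetical helpers.

```javascript
const ESCAPE = /[.*+?^${}()|[\]\\]/g;

// Pass 1: collect names assigned as `name = value` at the start of a line.
function collectVariables(source) {
  const names = new Set();
  for (const m of source.matchAll(/^[ \t]*([\w$-]+)[ \t]*=/gm)) names.add(m[1]);
  return names;
}

// Pass 2: build one regex from the collected names and look for them, as whole
// words, in the value part of property lines (`name value...`).
function findVariableUses(source, names) {
  const uses = [];
  if (names.size === 0) return uses;
  const alternatives = [...names]
    .map((n) => n.replace(ESCAPE, "\\$&"))
    .sort((a, b) => b.length - a.length); // longest first, so `xy2` wins over `xy`
  const wordRe = new RegExp(`(?<![\\w$-])(?:${alternatives.join("|")})(?![\\w$-])`, "g");
  source.split(/\r?\n/).forEach((line, index) => {
    if (/^[ \t]*[\w$-]+[ \t]*=/.test(line)) return;   // skip the declarations themselves
    const m = line.match(/^[ \t]*[\w-]+[ \t]+(.*)$/); // property name, then its value
    if (!m) return;
    for (const hit of m[1].matchAll(wordRe)) uses.push({ line: index + 1, name: hit[0] });
  });
  return uses;
}

const stylus = ["xy = red", "nav", "  margin 10px", "  > li", "    background xy", "    color green"].join("\n");
const vars = collectVariables(stylus);
console.log([...vars]); // [ 'xy' ]
console.log(JSON.stringify(findVariableUses(stylus, vars))); // [{"line":5,"name":"xy"}]
```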
In Prism, blank lines cause problems. The Stylus compiler ignores them, but Prism's highlighting breaks if there is an empty line between a selector and a property.
If none of the items are indented, nothing is highlighted. If we put a space before the fifth element, all five elements become selectors.
element1
element2
element3
element4
element5
If we put a space before the third element but nothing after it, the first and second elements become selectors, and the third and the rest remain unhighlighted. If you put a space after the third element followed by any characters except a comma, the third element becomes a property.
element1
element2
element3
element4
element5
However, when at least two spaces are placed in front of the fourth element, the third element becomes a selector again.
element1
element2
element3 x
element4
element5
If there is a space before the third and fourth elements, and at least two spaces are placed in front of the fifth element, then the third and fourth elements will become selectors.
element1
element2
element3
element4
element5
Adding a space after the third element followed by anything other than a comma makes the third element a property. If we then write anything after the fourth element (more spaces, or anything else except parentheses), everything is highlighted as one element, producing one long selector. Using parentheses in the fourth element removes the highlighting; used as a function call, the parentheses highlight the word in front of them.
element1
element2
element3 x
element4
element5
Prism does not highlight stacked selectors as separate elements; selectors on consecutive lines end up in one common span:
<span class="token selector">element1
element2</span>
However, the property values and the variables used there are indistinguishable.
I'm not sure that's so terrible.
This could be solved if hljs was able to retrieve the variable names before the equality sign and store it. Then the variable names could be searched from the properties. Is this possible?
Not yet.
I'd like variables to be distinguishable. I don't know yet whether such a feature would be useful elsewhere, but it might be worth being able to capture a list with one regular expression into an array that other regular expressions could then use.
It would also be good to eliminate the problem of blank lines.
Unfortunately, I don't know Stylus in full depth; I haven't used it for serious work so far. I hope there are no more problems. I'm highlighting variables in Sass, SCSS and Less. If possible, I'll highlight variables in Stylus later.
Oh I forgot the big use for keywords, auto-detect... so n/m... I guess I don't mind if we have lists of tags and properties since that'll make it easier to auto-detect.
Though good luck telling them apart. ;-)
hljs seems to be missing features that Prism has. Analyzing Stylus the way Prism does may not be feasible in hljs :(
I'm not giving up yet. Perhaps there's a way to make it work, but it may need a mix of approaches.
I already said lets do the keywords, so we can auto-detect CSS-likes super well. :-)
The Prism grammars are often super simple (if they don't require/link to other Prism grammars), but I didn't look at it that closely. Is there one particular part you were struggling with that I could look at? Does it matter if we can use a keyword list?
The biggest problem is that you can't match sub-parts inside begin; contains rules only apply between begin and end. But using keywords will be no problem, and this is probably why keywords were used in the previous solution. Prism's logic is different from that of hljs. Admittedly, Prism's Stylus grammar isn't perfect in its current form either.
The biggest problem is that you can't search in begin.
You can if you use a look-ahead regex or returnBegin.
The logic of prismjs is different from that of hljs.
Yes, but much of what it does (at a glance) is transferable; we just don't have ways to express things as simply as they do. If you have a specific example of a rule they have that you don't think is doable in Highlight.js, please share it and I'll take a look.
But again, I'm fine with keywords now that I consider the value they provide for auto-detect, and the fact that we already have them anyways... we just need to make sure we have one list that the syntaxes can share.
{
// property lines
begin: /((?:^|\{)([ \t]*))(?:[\w-]|\{[^}\r\n]+\})+(?:\s*:\s*|[ \t]+)[^{\r\n]*(?:;|[^{\r\n,](?=$)(?!(?:\r?\n|\r)(?:\{|\2[ \t]+)))/m,
contains: [
// properties, values, etc.
],
},
{
//selector lines
begin: /(^[ \t]*)(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\))?|\{[^}\r\n]+\})+)(?:(?:\r?\n|\r)(?:\1(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\))?|\{[^}\r\n]+\})+)))*(?:,$|\{|(?=(?:\r?\n|\r)(?:\{|\1[ \t]+)))/m,
contains: [
// selector tag, selector id, selector class, etc.
],
},
Without contains, everything is highlighted well. I tried assigning a class to each, and the regular expressions seemed good: they selected the selector lines and the property lines separately. However, when I tried to match further sub-parts with contains, it no longer worked.
Something went wrong with the regular expression matching.
You'd use look-ahead to do this, so (?=actual regex here). Then your begin would match but NOT capture anything (and not advance the cursor)... then in your contains you'd have the rules to match the subexpressions.
I like the regular expressions above because, without knowing the content, they can determine whether a given line is a selector or a declaration (property and value). The only problem is that I couldn't match any further sub-parts in contains.
header
margin 0
padding 0
background black
nav
color green
background gray
border-bottom 2px solid orange
&.cls
ul
li
span
color white
background blue
ul
margin 2px
padding 2px
list-style none
li
color white
&.selected
color red
&.inactive
color brown
span
color blue
background orange
#id
border 1px dotted gray
&:hover
color pink
strong
span
border-style dashed
With these rules:
{
className: 'attribute',
begin: /((?:^|\{)([ \t]*))(?:[\w-]|\{[^}\r\n]+\})+(?:\s*:\s*|[ \t]+)[^{\r\n]*(?:;|[^{\r\n,](?=$)(?!(?:\r?\n|\r)(?:\{|\2[ \t]+)))/m,
},
{
className: 'selector-tag',
begin: /(^[ \t]*)(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\))?|\{[^}\r\n]+\})+)(?:(?:\r?\n|\r)(?:\1(?:(?=\S)(?:[^{}\r\n:()]|::?[\w-]+(?:\([^)\r\n]*\))?|\{[^}\r\n]+\})+)))*(?:,$|\{|(?=(?:\r?\n|\r)(?:\{|\1[ \t]+)))/m,
},
These regular expressions can only distinguish between lines. After separating the lines, each line's contents should be analyzed, but so far I haven't managed to do that properly :(
<span class="hljs-selector-tag">header</span>
<span class="hljs-attribute"> margin 0</span>
<span class="hljs-attribute"> padding 0</span>
<span class="hljs-attribute"> background black</span>
<span class="hljs-selector-tag"> nav</span>
<span class="hljs-attribute"> color green</span>
<span class="hljs-attribute"> background gray</span>
<span class="hljs-attribute"> border-bottom 2px solid orange</span>
<span class="hljs-selector-tag"> &.cls</span>
<span class="hljs-selector-tag"> ul</span>
<span class="hljs-selector-tag"> li</span>
<span class="hljs-selector-tag"> span</span>
<span class="hljs-attribute"> color white</span>
<span class="hljs-attribute"> background blue</span>
<span class="hljs-selector-tag"> ul</span>
<span class="hljs-attribute"> margin 2px</span>
<span class="hljs-attribute"> padding 2px</span>
<span class="hljs-attribute"> list-style none</span>
<span class="hljs-selector-tag"> li</span>
<span class="hljs-attribute"> color white</span>
<span class="hljs-selector-tag"> &.selected</span>
<span class="hljs-attribute"> color red</span>
<span class="hljs-selector-tag"> &.inactive</span>
<span class="hljs-attribute"> color brown</span>
<span class="hljs-selector-tag"> span</span>
<span class="hljs-attribute"> color blue</span>
<span class="hljs-attribute"> background orange</span>
<span class="hljs-selector-tag"> #id</span>
<span class="hljs-attribute"> border 1px dotted gray</span>
<span class="hljs-selector-tag"> &:hover</span>
<span class="hljs-attribute"> color pink</span>
<span class="hljs-selector-tag"> strong</span>
<span class="hljs-selector-tag"> span</span>
<span class="hljs-attribute"> border-style dashed</span>
I assigned the two classes to regular expressions just for the sake of demonstration.
Interestingly, when I start a sub-language, matching inside works flawlessly: the sub-language's markup works correctly within the lines matched by that regular expression.
As I said you need to use lookahead expressions.
https://www.regular-expressions.info/lookaround.html
So for an attribute you match:
(?=[expression for attribute])
That will result in a 0 length match... your contains block will still be starting at the beginning of the match. Trivial example:
margin 2px
begin: /(?=margin \d+px)/,
contains: [
  { begin: "margin", className: "attribute" },
  { begin: /\d+px/, className: "number" }
]
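To see why the look-ahead matters, here is a plain-JS model (not hljs internals) of the trivial example above. The look-ahead "begin" matches an empty string at index 0, so nothing is consumed and the child rules still see the whole line. scanLine is a simplified stand-in for contains: at each position it tries every child rule with a sticky (y) regex, and characters no rule matches, like the space, stay plain.

```javascript
const line = "margin 2px";

// Line-level "begin": a pure look-ahead, so it matches an empty string at index 0.
const LINE_BEGIN = /(?=margin \d+px)/y;
const begin = LINE_BEGIN.exec(line);
console.log(begin.index, begin[0].length); // 0 0

// Simplified model of "contains": at each position try every child rule (sticky
// regexes match only at the current position); unmatched characters stay plain.
function scanLine(text, rules) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    for (const { className, re } of rules) {
      re.lastIndex = pos;
      const m = re.exec(text);
      if (m && m[0].length > 0) {
        tokens.push([className, m[0]]);
        pos = re.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) pos++; // e.g. the space between "margin" and "2px"
  }
  return tokens;
}

const CHILDREN = [
  { className: "attribute", re: /margin/y },
  { className: "number", re: /\d+px/y },
];
console.log(JSON.stringify(scanLine(line, CHILDREN)));
// [["attribute","margin"],["number","2px"]]
```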
These are just line selectors: they let you separate lines with different kinds of content. Otherwise the two line types couldn't be told apart.
If I understand correctly, are you suggesting the old solution? Without separating the lines, we'd also need to include the list of HTML elements, at least for Stylus.
I do not understand what you are asking. I was merely trying to answer:
The only problem is that I couldn't find any further subunits in the contains.
Oh, perhaps now I get it. Any mode will end immediately when it fails to find additional matches. So you'd need "dummy" rules here to eat up the extra whitespace between matches... if you were trying to match groups...
That's why I don't think you honestly want to try to keep track of "parents" and "children"... just process each line one at a time, figure out what type of line it is, and highlight it appropriately.
So only attribute would have a contains... And actually, to do a "sequence" of things you might need to chain rules with starts rather than using contains at all... we don't have a syntactically easy way to do chains currently.
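The difference between "contains" and a chain is ordering: a chain requires each step to match right after the previous one, with only whitespace allowed in between (the "dummy" rule that eats whitespace). This is a plain-JS model of that idea, not hljs's starts mechanism; runChain and PROPERTY_CHAIN are hypothetical. A line that breaks the chain (like a lone "nav") returns null, which doubles as a line-type test.

```javascript
// Model of a rule "chain": each step must match in order, directly after the previous
// one; spaces/tabs between steps are eaten automatically (the "dummy" whitespace rule).
function runChain(line, steps) {
  const ws = /[ \t]*/y;
  const tokens = [];
  let pos = 0;
  for (const { className, re } of steps) {
    ws.lastIndex = pos;
    ws.exec(line);              // always succeeds, possibly with an empty match
    re.lastIndex = ws.lastIndex;
    const m = re.exec(line);
    if (!m) return null;        // chain broken: the line does not have this shape
    tokens.push([className, m[0]]);
    pos = re.lastIndex;
  }
  return tokens;
}

// Hypothetical Stylus property line: a name, then a value running to end of line.
const PROPERTY_CHAIN = [
  { className: "attribute", re: /[\w-]+/y },
  { className: "value", re: /\S[^\r\n]*/y },
];

console.log(JSON.stringify(runChain("  margin 2px auto", PROPERTY_CHAIN)));
// [["attribute","margin"],["value","2px auto"]]
console.log(runChain("nav", PROPERTY_CHAIN)); // null
```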
(?=)
Very simple example: https://jsfiddle.net/w3suli/ygj3t5ed/10/
What should I do here (// attributes, values, and variables ...) to analyze the content? What should I do here (// selectors and variables ...) to analyze my content? So far, none of my ideas have worked properly.
Just off the top of my head:
A list of things to keep in sync:
See the css_consistency branch.
:
https://github.com/highlightjs/highlight.js/pull/2425
::
https://github.com/highlightjs/highlight.js/pull/2425
:lang(de)
a[href*="example" I]
(See: #2243)
(grayscale(0.5))
@import, etc.
!important
Sass bug list:
Less bug list:
Stylus bug list:
There are a lot of differences. If I have enough time, I'll make examples one by one. I welcome any help. Thanks in advance for all your help!