highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License

Discuss: 2021: Inclusion of new languages in `:common` set #2848

Closed joshgoebel closed 3 years ago

joshgoebel commented 3 years ago

Current decision:


Wanted to open a new discussion for v11 (or perhaps even just 2021) regarding whether any new languages should be added to our "common" set... and hence also the default CDN distributable highlight.min.js. We can of course add at any time (since adding is not a breaking change) but I think it's helpful to have a yearly review.

Removing grammars however would be a breaking change so any removals will by necessity wait until v11.

There are no real criteria other than the nebulous and vague "common"... which I've always taken as a loose synonym for "popular", "frequently used", "people have heard of it", etc...

The 2019 discussion on this: https://github.com/highlightjs/highlight.js/issues/2206

What we considered adding last time but didn't:

What almost got cut last time:

Things we should probably add for parity:


The current full :common list:

apache
bash
c
coffeescript
cpp
csharp
css
diff
go
http
ini
java
javascript
json
kotlin
less
lua
makefile
markdown
nginx
objectivec
perl
php-template
php
properties
python-repl
python
ruby
rust
scss
shell
sql
swift
typescript
xml
yaml
Hirse commented 3 years ago

Looking at some language rankings (see below), it seems like good candidates for adding would be r, vbnet, and some form of assembly. Further possible candidates for adding would then be powershell, groovy, matlab, dart, scala, haskell, and julia.

As far as removal goes, definitely coffeescript and possibly lua as the only languages that are in none of the rankings below.

Additionally, the list of config languages (apache, http, ini, nginx, properties) currently in common don't quite feel like they belong.


Tiobe Index

https://www.tiobe.com/tiobe-index

| Language Name | HLJS Class |
| --- | --- |
| C | c |
| Python | python |
| Java | java |
| C++ | cpp |
| C# | csharp |
| Visual Basic | vbnet |
| JavaScript | javascript |
| PHP | php |
| R | r |
| SQL | sql |
| Groovy | groovy |
| Perl | perl |
| Go | go |
| Swift | swift |
| Ruby | ruby |
| Assembly language | x86asm (?) |
| MATLAB | matlab |
| Delphi/Object Pascal | delphi |
| Objective-C | objectivec |
| Transact-SQL | tsql |

StackOverflow Survey 2020

https://insights.stackoverflow.com/survey/2020#technology-programming-scripting-and-markup-languages

| Language Name | HLJS Class |
| --- | --- |
| JavaScript | javascript |
| HTML | xml |
| CSS | css |
| SQL | sql |
| Python | python |
| Java | java |
| Bash/Shell | bash |
| PowerShell | powershell |
| C# | csharp |
| PHP | php |
| TypeScript | typescript |
| C++ | cpp |
| C | c |
| Go | go |
| Kotlin | kotlin |
| Ruby | ruby |
| Assembly | x86asm (?) |
| VBA | vba (vbnet?) |
| Swift | swift |
| R | r |
| Rust | rust |
| Objective-C | objectivec |
| Dart | dart |
| Scala | scala |
| Perl | perl |
| Haskell | haskell |
| Julia | julia |

GitHub Octoverse 2019

https://octoverse.github.com/#top-languages-over-time

| Language Name | HLJS Class |
| --- | --- |
| JavaScript | javascript |
| Python | python |
| Java | java |
| PHP | php |
| C# | csharp |
| C++ | cpp |
| TypeScript | typescript |
| Shell | bash |
| C | c |
| Ruby | ruby |
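
The comparison between the three rankings and the current `:common` list can be reproduced with a few set operations. This is only a sketch: the arrays below are copied by hand from the lists in this thread, and uncertain mappings such as `x86asm (?)` and `vba (vbnet?)` are taken at face value, which is an assumption.

```javascript
// Current :common set, copied from the list earlier in this thread.
const common = new Set([
  "apache", "bash", "c", "coffeescript", "cpp", "csharp", "css", "diff",
  "go", "http", "ini", "java", "javascript", "json", "kotlin", "less",
  "lua", "makefile", "markdown", "nginx", "objectivec", "perl",
  "php-template", "php", "properties", "python-repl", "python", "ruby",
  "rust", "scss", "shell", "sql", "swift", "typescript", "xml", "yaml",
]);

// HLJS class names from the three rankings above.
// Assumption: "x86asm (?)" is recorded as x86asm and "vba (vbnet?)" as vba.
const rankings = {
  tiobe: [
    "c", "python", "java", "cpp", "csharp", "vbnet", "javascript", "php",
    "r", "sql", "groovy", "perl", "go", "swift", "ruby", "x86asm",
    "matlab", "delphi", "objectivec", "tsql",
  ],
  stackoverflow: [
    "javascript", "xml", "css", "sql", "python", "java", "bash",
    "powershell", "csharp", "php", "typescript", "cpp", "c", "go",
    "kotlin", "ruby", "x86asm", "vba", "swift", "r", "rust",
    "objectivec", "dart", "scala", "perl", "haskell", "julia",
  ],
  octoverse: [
    "javascript", "python", "java", "php", "csharp", "cpp", "typescript",
    "bash", "c", "ruby",
  ],
};

// Every language that appears in at least one ranking.
const ranked = new Set(Object.values(rankings).flat());

// Ranked somewhere but not bundled: candidates for adding.
const missingFromCommon = [...ranked].filter((l) => !common.has(l)).sort();

// Bundled but not ranked anywhere: candidates to review for removal.
const unranked = [...common].filter((l) => !ranked.has(l)).sort();

console.log("ranked but not in :common:", missingFromCommon.join(", "));
console.log("in :common but unranked:", unranked.join(", "));
```

The second list also picks up config, markup, and data grammars (apache, diff, json, yaml, and so on) that these rankings do not really track, so it is mostly informative for general-purpose languages such as coffeescript and lua.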
joshgoebel commented 3 years ago

Thanks for the thoughts!

> As far as removal goes, definitely coffeescript and possibly lua as the only languages that are in none of the rankings below.

Lua was specifically added for "fun" not popularity, but perhaps that was a wrong idea. :-) A little bit of my opinion sneaking into the list. :-)

> Additionally, the list of config languages (apache, http, ini, nginx, properties) currently in common don't quite feel like they belong.

Agree, they are definitely more systemy... but we've kind of always had them and they aren't large... so I feel like we'd need a good reason (or a lot of agreement) to rip them out. I might disagree on ini though... or does no one use ini anymore? I grew up with ini files (though I don't use them myself anymore). :-) Feel like JSON and YAML have killed INI in many spaces.

I also wonder if perhaps we should have a size target in advance for "how large can our default set be" vs just sticking things in and then saying "feels right" at some point. :-) Currently 37kb gzipped still feels pretty tiny to me.

joshgoebel commented 3 years ago
node ./tools/build.js -t browser :common r vbnet powershell groovy matlab dart scala haskell julia
highlight.js        : 333694 bytes
highlight.min.js    : 136067 bytes
highlight.min.js.gz : 44664 bytes

Only adds (without any removals)... still not bad on the gzipped size. I'd also be game if we made the future downloader more flexible, as in able to check off categories... so perhaps you just requested:

And say that gave you common + web (http, apache, etc) + functional (functional languages), etc... this is harder though if we have more and more combinations because then we have to build and put them all on the CDN, etc... more moving pieces, more potential for breakage, etc.
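
For reference, the size numbers from the build output above can be put next to the "37kb gzipped" figure mentioned earlier. This is a rough sketch: the thread does not give the exact baseline byte count, so reading 37kb as 37 * 1024 bytes is an assumption.

```javascript
// Byte counts copied from the build output above.
const sizes = { full: 333694, minified: 136067, gzipped: 44664 };

// Assumption: the "37kb gzipped" baseline means 37 * 1024 bytes;
// the exact figure is not given in this thread.
const baselineGzipped = 37 * 1024;

// How much the nine extra grammars add to the gzipped bundle.
const addedBytes = sizes.gzipped - baselineGzipped;
const growthPercent = (addedBytes / baselineGzipped) * 100;

// How much of the minified size survives gzip.
const gzipRatio = sizes.gzipped / sizes.minified;

console.log(`gzipped size: ${(sizes.gzipped / 1024).toFixed(1)} KiB`);
console.log(`added vs. baseline: ${addedBytes} bytes (${growthPercent.toFixed(1)}%)`);
console.log(`gzip keeps ${(gzipRatio * 100).toFixed(1)}% of the minified size`);
```

Under that assumption the nine added grammars cost a few kilobytes over the wire, which is the kind of number a size target for the default set could be checked against.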

joshgoebel commented 3 years ago

> Additionally, the list of config languages (apache, http, ini, nginx, properties) currently in common don't quite feel like they belong.

@egor-rogov @allejo Any objections to dropping any of these from :common with v11?

Also my vote would be:

x86asm is one of the larger syntaxes, so I'm not a fan of adding it to the default bundle.

egor-rogov commented 3 years ago

No particular objections (for any of those suggestions)

Hirse commented 3 years ago

@joshgoebel Reading through the comments again, I do agree that ini should probably stay as it is used for config by Python, Rust, and Git.

Otherwise, the plan sounds good. 👍

allejo commented 3 years ago
>   • drop coffeescript (fading popularity)
>   • add R and VB.net as they seem popular/common
>   • I have no opposition to dropping any of (apache, http, ini, nginx, properties) if we're in agreement
>
> x86asm is one of the larger syntaxes, so I'm not a fan of adding it to the default bundle.

  • -1 on removing ini. As Hirse said, it's common for a number of configuration files
  • +1 on adding R
  • +0.5 on adding VB.net; yea unfortunately it's popular enough
  • +1 on removing the rest here

joshgoebel commented 3 years ago

So then:

No objection to keeping ini, so it'll stay.

joshgoebel commented 3 years ago

Closing per above conclusions.