highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License

Reconsider :common set #1119

Closed isagalaev closed 5 years ago

isagalaev commented 8 years ago

Cc: @Sannis, @sourrust

Hi guys,

I've had a few ideas about updating the :common set of languages to better match reality and expectations.

Here's the download stats from the site, by language:

 824 xml
 817 javascript
 776 json
 773 css
 690 sql
 680 php
 673 http
 657 markdown
 642 java
 638 bash
 611 python
 601 cpp
 592 apache
 587 diff
 587 cs
 585 ini
 582 ruby
 572 makefile
 571 coffeescript
 567 nginx
 561 objectivec
 556 perl
 551 dts
 131 yaml
 131 go
 100 swift
  96 less
  92 scss
  85 lua
< 80 — not interesting

The 500+ top are the current :common set (with "dts" being there by mistake, we've missed it during review).

I've got two alternative ideas:

What do you think?

Sannis commented 8 years ago

I'm for the second variant. But it looks like we need to bump the major version to change this set?

derhuerst commented 8 years ago

Please introduce a new major version and create standalone definitions for almost all languages. :package: :tada:

isagalaev commented 8 years ago

Versions changing is cheap, it's a technicality :-)

@derhuerst we already have all the languages available separately on CDNs at languages/<name>.min.js

derhuerst commented 8 years ago

@derhuerst we already have all the languages available separately on CDNs at languages/<name>.min.js

Yes, but to have them as standalone CommonJS-compatible NPM modules. (;

isagalaev commented 8 years ago

@derhuerst the NPM build already includes all the languages, there's no point in packaging them individually.

derhuerst commented 8 years ago

@derhuerst the NPM build already includes all the languages, there's no point in packaging them individually.

There is, for example in reducing the bundle size when using browserify, which seems to be the standard frontend workflow nowadays. It would also help keep major version bumps in the core to a minimum.

Moving the core detection into a separate module would also be nice, as many non-browser tools could directly use it (see #1086).

isagalaev commented 8 years ago

Let's keep the discussion in this issue related to the topic. We're not discussing packaging in general, we're discussing the :common set for browsers.

As for packaging, we're currently holding to a policy where we don't cater to any currently trending way of bundling but rather provide a build tool which can be used to package highlight.js in any way imaginable. We may reconsider this in the future, though.

tajmone commented 8 years ago

@isagalaev I think your proposal is good: it's not based on personal preferences, instead it takes into account what users look for — therefore "common" refers to what is commonly expected.

But I would like to expand on the issue of keywords filtering:

  1. I haven't understood from the documentation how keyword filtering works: if I add 2 or more keywords (e.g. :common :config), do these keywords filter each other out (i.e. reduce the selection to only those langs appearing in both) or do they broaden the selection (i.e. select languages belonging to either)?
  2. I'd say the keyword filtering should allow specifications like + and -, so users could, for example, filter :common :config- :markup+ to have all common langs, minus config ones, plus all markup ones. How exactly to implement it is something to be thought about.
  3. I think more categories would be helpful to handle better filtering without having to go through long command lines.

isagalaev commented 8 years ago

if I add 2 or more keywords (e.g. :common :config), do these keywords filter each other out (i.e. reduce the selection to only those langs appearing in both) or do they broaden the selection (i.e. select languages belonging to either)?

The latter.
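To make the "broaden" behaviour concrete, here is a minimal Python sketch of union-style category selection. It is not the build tool's actual code: the `CATEGORIES` table and the `select_languages` helper are hypothetical, and the category assignments are made up for illustration.

```python
# Hypothetical category metadata, for illustration only.
# The real values live in each language file's header comment.
CATEGORIES = {
    "javascript": {"common", "scripting"},
    "nginx": {"common", "config"},
    "ini": {"common", "config"},
    "dts": {"config"},
    "haskell": {"functional"},
}


def select_languages(keywords, categories=CATEGORIES):
    """Return languages that belong to at least one requested category (a union)."""
    wanted = {keyword.lstrip(":") for keyword in keywords}
    return sorted(lang for lang, cats in categories.items() if cats & wanted)
```

With this table, `select_languages([":common", ":config"])` returns `dts`, `ini`, `javascript` and `nginx`: adding `:config` pulls in `dts` rather than narrowing the result to languages carrying both categories.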

I'd say the keyword filtering should allow specifications like + and -

I'm not aware of anyone actually needing it. There are essentially just three ways the build tool is ever used:

joshgoebel commented 5 years ago

@isagalaev Could we get some fresh download stats to reconsider this issue in 2019? If you have time.

joshgoebel commented 5 years ago

I vote both. I don't see the harm in having "small" and "medium" builds (so 40kb and 200kb, let the user decide). I just have no idea where in the codebase I could go right now to change this, or if this is hidden away on some build server somewhere.

egor-rogov commented 5 years ago

Perhaps @marcoscaceres knows?

joshgoebel commented 5 years ago

I think we both decided only @isagalaev knows.

isagalaev commented 5 years ago

@yyyc514 @egor-rogov I get the stats by parsing logs on highlightjs.org, it's not anywhere in the code base. Current stats:

$ ./language_top.py 
 544 xml
 521 javascript
 509 css
 502 json
 459 sql
 456 http
 454 bash
 446 python
 444 java
 443 markdown
 441 php
 427 ruby
 420 shell
 419 cpp
 415 cs
 406 diff
 405 makefile
 404 nginx
 400 ini
 395 apache
 393 perl
 391 objectivec
 388 coffeescript
 385 properties
 371 yaml
  63 go
  61 scss
  46 kotlin
  44 r
  43 dockerfile
  42 powershell
  39 typescript
  39 plaintext
  39 lua
  39 less
  38 rust
  38 groovy
  37 gradle
  36 swift
  35 scala
  34 vim
  34 awk
  31 erlang
  29 dart
  28 arduino
  27 pgsql
  27 django
  27 basic
  26 erlang-repl
  26 cmake
  25 matlab
  25 fsharp
  25 asciidoc
  25 applescript
  24 haskell
  23 vbscript
  23 vbnet
  23 elm
  22 haml
  21 excel
  20 mathematica
  20 lisp
  20 ebnf
  20 dos
  20 armasm
  20 abnf
  19 protobuf
  19 gherkin
  19 elixir
  19 bnf
  18 tex
  18 julia
  18 dns
  18 clojure
  18 capnproto
  18 aspectj
  17 vbscript-html
  17 golo
  17 avrasm
  17 actionscript
  17 accesslog
  16 thrift
  16 fortran
  15 x86asm
  15 ruleslanguage
  15 htmlbars
  15 1c
  14 profile
  14 delphi
  14 autohotkey
  13 xquery
  13 twig
  13 scheme
  13 glsl
  13 dts
  13 d
  13 ada
  12 jboss-cli
  12 handlebars
  12 erb
  12 brainfuck
  12 autoit
  11 vala
  11 smalltalk
  11 smali
  11 purebasic
  11 ocaml
  11 mipsasm
  11 livecodeserver
  11 dust
  11 dsconfig
  11 csp
  11 cal
  11 angelscript
  10 zephir
  10 xl
  10 vhdl
  10 verilog
  10 stylus
  10 sml
  10 sas
  10 puppet
  10 prolog
  10 pf
  10 oxygene
  10 openscad
  10 nix
  10 llvm
  10 leaf
  10 isbl
  10 hsp
  10 haxe
  10 gml
  10 crystal
  10 crmsh
   9 stan
   9 sqf
   9 roboconf
   9 reasonml
   9 processing
   9 pony
   9 monkey
   9 ldif
   9 julia-repl
   9 fix
   9 cos
   9 coq
   9 clojure-repl
   9 clean
   9 ceylon
   9 axapta
   9 arcade
   8 tp
   8 tcl
   8 tap
   8 subunit
   8 step21
   8 stata
   8 scilab
   8 rsl
   8 routeros
   8 rib
   8 qml
   8 q
   8 parser3
   8 nsis
   8 nimrod
   8 n1ql
   8 moonscript
   8 mojolicious
   8 mizar
   8 mercury
   8 mel
   8 maxima
   8 lsl
   8 livescript
   8 lasso
   8 irpf90
   8 inform7
   8 hy
   8 gcode
   8 gauss
   8 gams
   8 flix
   6 taggerscript

This is, however, not a very good source of truth anyways for two reasons: it's biased (heavily) towards currently pre-selected :common, and more importantly, it doesn't include usage statistics from linking to CDNs. The only thing I can personally recommend looking at are the languages right below the heavy top: Go, SCSS, Kotlin, R… Those are the ones people apparently bother to manually select in not insignificant numbers.
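One way to find the languages "right below the heavy top" is to split the listing at its largest drop in count. The Python sketch below assumes the input is the plain `count name` text shown above; `parse_stats` and `split_at_largest_drop` are hypothetical helpers and have nothing to do with the actual `language_top.py`, which is not published in this thread.

```python
def parse_stats(text):
    """Parse lines of the form '<count> <language>' into (language, count) pairs."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip prompts, blank lines and anything that is not 'count name'.
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((parts[1], int(parts[0])))
    return rows


def split_at_largest_drop(rows):
    """Split rows (sorted by count, descending) at the biggest gap between neighbours."""
    rows = sorted(rows, key=lambda row: row[1], reverse=True)
    if len(rows) < 2:
        return rows, []
    drops = [rows[i][1] - rows[i + 1][1] for i in range(len(rows) - 1)]
    cut = drops.index(max(drops)) + 1
    return rows[:cut], rows[cut:]
```

Applied to the 2019 listing, the largest drop is between yaml (371) and go (63), so the second group starts with go, scss, kotlin and r. The same bias caveat applies: the first group is mostly whatever was pre-selected.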

However, this entire idea may not be worth tackling by itself in light of #1759.

P.S. I've been actually very much removed from highlight.js for quite some time now. I only noticed this discussion by pure accident among another 100+ new emails I suddenly got in my inbox :-)

joshgoebel commented 5 years ago

currently pre-selected :common

@isagalaev The question is where is this :common list and how do we update it? I.e., the list you've always used to build this "canonical 40kb file"... We're not going to come up with a new build system overnight but it would be nice to update :common when we issue new releases until then.

Or perhaps even to split common into light, medium, heavy, like coffee roasts, etc... who knows... I was hoping for such things to be in the codebase, but they appear to be hidden on your build server perhaps?

isagalaev commented 5 years ago

The question is where is this :common list and how do we update it?

Ah… This comes from metadata in language files, specifically Category: key, like here: https://github.com/highlightjs/highlight.js/blob/master/src/languages/javascript.js#L4. These categories are used in the menu on the demo page, but the one named "common" has this special meaning of being pre-selected on the download page and also being included in the CDN build. This can indeed be updated entirely through source changes.

Making more special categories would indeed require changes to the server. But I always felt it wasn't really a solution anyway.
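For illustration, the current :common set can be recovered from the source tree by reading the `Category:` key in each language file's header. This is a rough Python sketch, not how the download page or CDN build does it: `categories_of` and `common_languages` are hypothetical names, and it assumes the key sits on its own line as a comma-separated list.

```python
import re
from pathlib import Path

# Matches a header line such as "Category: common, scripting".
CATEGORY_RE = re.compile(r"^\s*Category:\s*(.+)$", re.MULTILINE)


def categories_of(source):
    """Return the set of categories declared in a language file's header text."""
    match = CATEGORY_RE.search(source)
    if not match:
        return set()
    return {name.strip() for name in match.group(1).split(",") if name.strip()}


def common_languages(lang_dir):
    """List language names under lang_dir whose header includes the 'common' category."""
    return sorted(
        path.stem
        for path in Path(lang_dir).glob("*.js")
        if "common" in categories_of(path.read_text(encoding="utf-8"))
    )
```

Calling `common_languages("src/languages")` from a checkout should list the pre-selected languages, so adding or removing `common` in a header changes the result without touching any server.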

joshgoebel commented 5 years ago

Well, that makes it harder for us to have TWO different builds I suppose but it's very helpful to know we can change it. :-) Thanks!

joshgoebel commented 5 years ago

@egor-rogov After reviewing this my votes for adding to common:

-rw-r--r--  1 jgoebel  staff   732 Oct 14 18:02 src/languages/dockerfile.js
-rw-r--r--  1 jgoebel  staff  1692 Oct 14 18:00 src/languages/go.js
-rw-r--r--  1 jgoebel  staff  6414 Oct 14 18:02 src/languages/kotlin.js
-rw-r--r--  1 jgoebel  staff  5022 Oct 14 18:01 src/languages/less.js
-rw-r--r--  1 jgoebel  staff  2965 Oct 14 18:01 src/languages/lua.js
-rw-r--r--  1 jgoebel  staff   209 Oct 14 17:56 src/languages/plaintext.js
-rw-r--r--  1 jgoebel  staff  1873 Oct 14 18:02 src/languages/r.js
-rw-r--r--  1 jgoebel  staff  3523 Oct 14 18:03 src/languages/rust.js
-rw-r--r--  1 jgoebel  staff  7337 Oct 14 18:01 src/languages/scss.js
-rw-r--r--  1 jgoebel  staff  5313 Oct 14 18:00 src/languages/swift.js
-rw-r--r--  1 jgoebel  staff  5673 Oct 14 18:03 src/languages/typescript.js

All sizes uncompressed... All seem pretty tight and compact...

+41kb raw +16kb (gzipped)
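As a quick check of the raw figure, the byte counts from the listing above can simply be summed. This is a small Python sketch; `PROPOSED_SIZES` and `total_bytes` are names chosen here for illustration.

```python
# Uncompressed sizes in bytes, copied from the `ls -l` listing above.
PROPOSED_SIZES = {
    "dockerfile": 732,
    "go": 1692,
    "kotlin": 6414,
    "less": 5022,
    "lua": 2965,
    "plaintext": 209,
    "r": 1873,
    "rust": 3523,
    "scss": 7337,
    "swift": 5313,
    "typescript": 5673,
}


def total_bytes(sizes):
    """Sum the per-language file sizes."""
    return sum(sizes.values())
```

`total_bytes(PROPOSED_SIZES)` gives 40753 bytes, which matches the roughly 41kb raw figure. The gzipped number cannot be reproduced from the listing alone, since it depends on the file contents.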

That's just off the top of my head.

I don't really know about R, but it's pretty small... Most of the other stuff is pretty known to be hot right now and kind of popular. Swift, Rust, Go, Kotlin, SCSS, Less, Docker, Typescript, etc...

The only thing that popped for possible demotion is CoffeeScript (4.1kb uncompressed), which has seen better days...

joshgoebel commented 5 years ago

Oh, there is PowerShell, but it's pretty heavy at 35kb, which makes me like it less than the others.

If size was no issue (or less of an issue)... ie for a "medium" build I'd just go and start picking everything I've heard of:

And that'd probably still be pretty small.

joshgoebel commented 5 years ago

 % grep -R "Category:.*common" src/ | cut -d ':' -f1 | xargs ls -l
-rw-r--r--  1 jgoebel  staff   1550 Oct 14 17:56 src//languages/apache.js
-rw-r--r--  1 jgoebel  staff   2570 Oct 14 17:56 src//languages/bash.js
-rw-r--r--  1 jgoebel  staff   4160 Oct 14 17:56 src//languages/coffeescript.js
-rw-r--r--  1 jgoebel  staff   6896 Oct 14 17:56 src//languages/cpp.js
-rw-r--r--  1 jgoebel  staff   5688 Oct 14 17:56 src//languages/cs.js
-rw-r--r--  1 jgoebel  staff   2881 Oct 14 17:56 src//languages/css.js
-rw-r--r--  1 jgoebel  staff   1045 Oct 14 17:56 src//languages/diff.js
-rw-r--r--  1 jgoebel  staff    732 Oct 14 18:02 src//languages/dockerfile.js
-rw-r--r--  1 jgoebel  staff   1692 Oct 14 18:00 src//languages/go.js
-rw-r--r--  1 jgoebel  staff   1179 Oct 14 17:56 src//languages/http.js
-rw-r--r--  1 jgoebel  staff   1802 Oct 14 17:56 src//languages/ini.js
-rw-r--r--  1 jgoebel  staff   3431 Oct 14 17:56 src//languages/java.js
-rw-r--r--  1 jgoebel  staff   5923 Oct 14 17:56 src//languages/javascript.js
-rw-r--r--  1 jgoebel  staff   1327 Oct 14 17:56 src//languages/json.js
-rw-r--r--  1 jgoebel  staff   6414 Oct 14 18:02 src//languages/kotlin.js
-rw-r--r--  1 jgoebel  staff   5022 Oct 14 18:01 src//languages/less.js
-rw-r--r--  1 jgoebel  staff   2965 Oct 14 18:01 src//languages/lua.js
-rw-r--r--  1 jgoebel  staff   2156 Oct 14 17:56 src//languages/makefile.js
-rw-r--r--  1 jgoebel  staff   2530 Oct 14 17:56 src//languages/markdown.js
-rw-r--r--  1 jgoebel  staff   2480 Oct 14 17:56 src//languages/nginx.js
-rw-r--r--  1 jgoebel  staff   3513 Oct 14 17:56 src//languages/objectivec.js
-rw-r--r--  1 jgoebel  staff   5050 Oct 14 17:56 src//languages/perl.js
-rw-r--r--  1 jgoebel  staff   3724 Oct 14 17:56 src//languages/php.js
-rw-r--r--  1 jgoebel  staff    209 Oct 14 17:56 src//languages/plaintext.js
-rw-r--r--  1 jgoebel  staff   1850 Sep 24 00:04 src//languages/properties.js
-rw-r--r--  1 jgoebel  staff   3159 Oct 14 17:56 src//languages/python.js
-rw-r--r--  1 jgoebel  staff   1873 Oct 14 18:02 src//languages/r.js
-rw-r--r--  1 jgoebel  staff   5314 Oct 14 17:56 src//languages/ruby.js
-rw-r--r--  1 jgoebel  staff   3523 Oct 14 18:03 src//languages/rust.js
-rw-r--r--  1 jgoebel  staff   7337 Oct 14 18:01 src//languages/scss.js
-rw-r--r--  1 jgoebel  staff    363 Sep 24 00:04 src//languages/shell.js
-rw-r--r--  1 jgoebel  staff  14995 Oct 14 17:56 src//languages/sql.js
-rw-r--r--  1 jgoebel  staff   5313 Oct 14 18:00 src//languages/swift.js
-rw-r--r--  1 jgoebel  staff   5673 Oct 14 18:03 src//languages/typescript.js
-rw-r--r--  1 jgoebel  staff   3053 Oct 14 17:56 src//languages/xml.js
-rw-r--r--  1 jgoebel  staff   2664 Oct 14 17:56 src//languages/yaml.js

Apache and Nginx seem a little obscure perhaps (as languages), but the sizes are pretty small.

joshgoebel commented 5 years ago

@egor-rogov Any issue on renaming a few? I was going to rename ini to toml (the superset) but then I worried about breaking links... but people link to a VERSION on a cdn... so if they want to upgrade they have to bump the version # anyways so maybe that's an opportunity for them to read the release notes and see we renamed something?

Not super important, just a thought.

egor-rogov commented 5 years ago

@yyyc514 I wouldn't break compatibility without a good reason.

joshgoebel commented 5 years ago

-rw-r--r--  1 jgoebel  staff  130977 Oct 14 19:05 highlight.medium.pack.js
-rw-r--r--  1 jgoebel  staff   71161 Oct 14 19:05 highlight.pack.js

And by modern standards both those look tiny (just packed, not even gzipped)

joshgoebel commented 5 years ago

Closing this in favor of the new issue: https://github.com/highlightjs/highlight.js/issues/2206