joshgoebel / se_highlightjs

Improved Highlight.js support for StackExchange / StackOverflow
MIT License

Smarter auto-hinting #3

Open joshgoebel opened 3 years ago

joshgoebel commented 3 years ago

Is your feature request related to a problem? Please describe.

When two tags both have an auto-hint, SE ignores the auto-hints and just falls back to dumb auto-detection... its auto-hinting can't handle more than a single possible language.

Describe the solution you'd like

Instead, both auto-hints should be used to "guide" the auto-detection. If a post is tagged javascript and typescript (yes, I know SE already fixed this single case, but it's illustrative), then instead of just saying "it could be anything" and landing on Go or Dart... it should be taken into consideration that it's QUITE likely to be JS or TS... and those should be artificially boosted...

So if it looks like a tight race between Go, Dart, JS, and TS... then JS or TS should most often win for posts tagged as JS or TS.
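A rough sketch of that boosting idea, not the extension's actual code. It assumes we already have a relevance score per candidate language; the `Candidate` shape, the `pickLanguage` name, and the boost factor of 1.5 are all made up for illustration.

```typescript
// Hypothetical shape: one auto-detection result per candidate language.
interface Candidate {
  language: string;
  relevance: number;
}

// Multiply the relevance of any hinted language by `boost`, then pick the top score.
// The default boost of 1.5 is an arbitrary illustrative value, not a tuned one.
function pickLanguage(
  candidates: Candidate[],
  hints: string[],
  boost: number = 1.5
): string | undefined {
  const hinted = new Set(hints);
  let best: Candidate | undefined;
  for (const c of candidates) {
    const score = hinted.has(c.language) ? c.relevance * boost : c.relevance;
    if (best === undefined || score > best.relevance) {
      best = { language: c.language, relevance: score };
    }
  }
  return best?.language;
}

// A tight race between Go, Dart, JS, and TS (scores are invented).
const race: Candidate[] = [
  { language: "go", relevance: 12 },
  { language: "dart", relevance: 11 },
  { language: "javascript", relevance: 10 },
  { language: "typescript", relevance: 11 },
];

const withoutHints = pickLanguage(race, []);
const withHints = pickLanguage(race, ["javascript", "typescript"]);
```

With no hints the highest raw score wins; with both tags passed as hints, one of the two hinted languages comes out on top even though Go scored slightly higher on its own.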

Describe alternatives you've considered

None.

How someone could help

We need an automatic way (screen scrape? Console script?) to quickly pull down all the tag -> auto-hint mappings so we can have them in our codebase...

joshgoebel commented 3 years ago

Note there will ultimately be two lists here... the official SE list... and then we'll need to augment it for all the additional languages we support that they can't auto-hint because they don't load all the grammars, like for example mapping powershell to the Powershell grammar, etc.

All we need to get this rolling is the official list. The unofficial list can be done in a more one-off fashion unless someone wants to sit down and map the whole thing out.
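Something like the following could combine the two lists. This is only a sketch: the map entries, the `mergeHints` name, and the rule that the official list wins on a conflict are assumptions, not anything decided here.

```typescript
// Hypothetical tag -> language maps; the entries are examples only.
const officialHints: Record<string, string> = {
  python: "python",
  javascript: "javascript",
};

// Extra mappings for grammars the extension loads but SE does not auto-hint.
const extraHints: Record<string, string> = {
  powershell: "powershell",
};

// Assumption: official entries win on conflict, so the extra list only fills gaps.
function mergeHints(
  official: Record<string, string>,
  extra: Record<string, string>
): Record<string, string> {
  return { ...extra, ...official };
}

const allHints = mergeHints(officialHints, extraHints);
```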

yellis commented 3 years ago

@joshgoebel commenting here so that you can ping me when you reach a good place with the SE list. We are reading all of your posts, and want to improve things, but don't have time on the roadmap to do it in the immediate future. However, if you can add logic to help with our guessing, it will make it more likely that we will be able to get this logic in sooner. Please ping me when relevant (or email me at yaakov@[guess-the-domain], or DM me on twitter). Thanks!

joshgoebel commented 3 years ago

commenting here so that you can ping me when you reach a good place with the SE list.

Well you already have the list I'm talking about here. :-) Just it's spread across a zillion tag reference pages as a tiny comment at the bottom. :-)

However, if you can add logic to help with our guessing

Well, I'm going to see how it plays out in this extension first and then with that experience look at what hooks the main library may need... none of this is super hard to do now, but it may get a bit easier if we abstract the idea a bit more... I think your backend may need to change slightly, because what we really need is a list of ALL tag auto-hints (to pass to the highlighter), whereas right now you seem to only have a single js-codeblock-lang... And when there are two auto-hints they cancel each other out - which is not good.

There are probably also legitimate auto-detect defects that we can fix with time (I already have a PR that fixes many). But honestly if SE added multi-tag auto-hinting I think that would go a long way towards fixing a lot of the auto-detect issues that people are seeing with auto-detect looking stupid. You're probably very lucky you have the auto-hinting at all. :-)

FYI: I'm calling the language associated with a tag an "auto-hint" vs ```grooovy being a manual hint. I think you call the latter an override.

joshgoebel commented 3 years ago

@yellis Is there some easy way (API, etc.) to get a list of ALL tags and their associated auto-hint language, i.e. the thing that's found at the bottom of each tag info page:

Code Language (used for syntax highlighting): lang-py

The extension has to handle that mapping for now unless those are somehow ALL dumped into the HTML that I just don't know about (other than the singular js-codeblock-lang which isn't good enough for our purposes).
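For the scraping side, a minimal sketch of pulling the hint out of a tag info page's text, assuming the line always looks like the example above. `parseAutoHint` and `buildHintMap` are hypothetical names, and how the page text gets fetched is left out entirely.

```typescript
// Extract the language hint from a tag info page's text.
// Assumes the "Code Language (...)" line format shown above; returns undefined if absent.
function parseAutoHint(pageText: string): string | undefined {
  const match = /Code Language \(used for syntax highlighting\):\s*(\S+)/.exec(pageText);
  return match ? match[1] : undefined;
}

// Build a tag -> auto-hint map from already-downloaded page text, skipping tags with no hint.
function buildHintMap(pages: Record<string, string>): Record<string, string> {
  const hints: Record<string, string> = {};
  for (const [tag, text] of Object.entries(pages)) {
    const hint = parseAutoHint(text);
    if (hint !== undefined) {
      hints[tag] = hint;
    }
  }
  return hints;
}

// Invented page snippets, just to exercise the parser.
const samplePages: Record<string, string> = {
  python: "Some tag wiki text.\nCode Language (used for syntax highlighting): lang-py",
  "no-hint-tag": "A tag wiki with no code language line.",
};

const scraped = buildHintMap(samplePages);
```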