Open ghost opened 3 years ago
Here's an idea @TylerLeonhardt - what if we simply respect .gitattributes as well as workspace settings?
So I'm still trying to understand this scenario... can you give an example where 2 extensions on the marketplace have conflicting file extension associations?
I feel like when we detect there's a conflict, we should do something (and in fact we might already ask the user to choose) but I'm not sure.
I know we have this:
That shows up when you open the language picker on a non-untitled file.
Here's a list, sorting languages by probability:
.al for AL, Perl, ActionScript
.d for DLang, Makefile
.fs for F#, FirstSpirit
.gml for GameMaker, XML
.gs for JavaScript, GLSL, Genie
.h for C or C++/ObjC
.inc for PHP, M68K, C, Pascal
.lp for CommonLisp, Newlisp
.m for ObjC, MATLAB, Mercury
.pm for Perl, Raku
.properties for INI, JavaProperties
.r for Rlang, Rebol
.re for Reason, C++
.rs for Rust, XML
.sql for any SQL variant
.t for Perl, Raku
.ts/.tsx for TypeScript, XML
.vba for VBA, Vim
.yy for Yacc, JSON
Setting up local config isn't possible in situations where the repo is for a different IDE or the branch PR is focused on other work
I get asked to open a new PR and those pulls end up lowest priority for most people
I think weird meta repositories provide a great test case for this problem. Taking a look at:
https://github.com/github/linguist/search?q=extension%3Are&type=Code
The .re extension can mean two different languages, and GitHub's able to tell the difference with a heuristic regex.
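To make the heuristic idea concrete, here is a rough sketch of content-based disambiguation for an ambiguous extension. The regexes are illustrative guesses of mine, not Linguist's actual rules, and `guessLanguage`, `heuristicsByExtension`, and the `reason` language ID are hypothetical names (a Reason language ID would come from a contributed extension).

```typescript
// Hypothetical content heuristics for ambiguous extensions.
// The patterns are illustrative assumptions, NOT Linguist's real rules.
interface Heuristic {
  languageId: string; // assumed VS Code language ID
  pattern: RegExp;    // tested against the whole file content
}

const heuristicsByExtension = new Map<string, Heuristic[]>([
  [
    ".re",
    [
      // Preprocessor directives or template declarations suggest C++.
      { languageId: "cpp", pattern: /^\s*(#\s*(include|define|ifn?def|endif)\b|template\s*<)/m },
      // Top-level let/type/module/open declarations suggest Reason.
      { languageId: "reason", pattern: /^\s*(let|type|module|open)\s+[A-Za-z_]/m },
    ],
  ],
]);

function guessLanguage(fileName: string, content: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined; // no extension, nothing to disambiguate
  const rules = heuristicsByExtension.get(fileName.slice(dot).toLowerCase()) ?? [];
  // First matching rule wins; undefined means "fall back or ask the user".
  return rules.find((rule) => rule.pattern.test(content))?.languageId;
}
```

Returning undefined when nothing matches leaves room for the existing behaviour (asking the user to choose) instead of guessing.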
Here's an idea @TylerLeonhardt - what if we simply respect .gitattributes as well as workspace settings?
This is what I came here to suggest! Right now I have to duplicate linguist-language in .gitattributes and files.associations in .vscode/settings.json.
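As a sketch of what removing that duplication could look like, the snippet below reads linguist-language overrides from .gitattributes text and produces a files.associations-style object. `associationsFromGitattributes` and the `linguistToLanguageId` table are hypothetical, the table covers only a handful of languages, and it assumes the .gitattributes pattern can be reused as-is as an association glob, which will not hold for every pattern.

```typescript
// Assumed, partial mapping from Linguist language names (lowercased)
// to VS Code language IDs. A real table would need to be much larger.
const linguistToLanguageId = new Map<string, string>([
  ["c++", "cpp"],
  ["objective-c", "objective-c"],
  ["f#", "fsharp"],
  ["perl", "perl"],
  ["rust", "rust"],
  ["typescript", "typescript"],
]);

const OVERRIDE_PREFIX = "linguist-language=";

function associationsFromGitattributes(text: string): Record<string, string> {
  const associations: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment
    // Simplification: quoted patterns containing spaces are not handled.
    const [pattern, ...attributes] = line.split(/\s+/);
    if (pattern === undefined) continue;
    for (const attribute of attributes) {
      if (!attribute.startsWith(OVERRIDE_PREFIX)) continue;
      const name = attribute.slice(OVERRIDE_PREFIX.length).toLowerCase();
      const languageId = linguistToLanguageId.get(name);
      // Unknown Linguist names are skipped rather than guessed.
      if (languageId !== undefined) associations[pattern] = languageId;
    }
  }
  return associations;
}
```

For example, a line such as `*.re linguist-language=C++` would yield an entry mapping `*.re` to `cpp`, which could then be merged under files.associations without a second hand-written copy.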
As described in https://github.com/microsoft/vscode/issues/129004#issuecomment-882751887, I don't feel the current file association behaviour in VS Code is effective for full-stack or versatile programmers.
There is prior art in the form of neel1996/langline, although the heuristics implementation didn't really inspire @TylerLeonhardt. That package uses GitHub Linguist, which is highly tested but relies on hardcoded regexes, as seen in https://github.com/github/linguist/blob/617fa486aad61043996e1323a429c900493c89a7/lib/linguist/heuristics.yml#L30 and other files.
The TensorFlow ML approach would be much more versatile than the GitHub Linguist NPM module, and other staff members mentioned using that for disambiguation. However, Linguist IDs would need to be matched to their corresponding languageId. The ML model would also need to be trained to recognise contributed language(s).
Also noting: