sheerun / vim-polyglot

A solid language pack for Vim.
5.6k stars 297 forks source link

Improve filetype detection heuristics (e.g. objectivec vs mathematica) #498

Closed chipsenkbeil closed 4 years ago

chipsenkbeil commented 4 years ago

Does this bug happen when you install plugin without vim-polyglot?

No.

Describe the bug:

Setting g:filetype_m = 'objc' gets overridden by vim-polyglot on line 391 of ftdetect/polyglot.vim.

Likewise, setting au BufRead,BufNewFile *.m set filetype=objc within ftdetect/objc.vim or equivalents in $VIMRUNTIME/filetype.vim get overridden.

To Reproduce:

Put the configurations described above. Load without vim-polyglot and observe the filetype is set to objc. With vim-polyglot, observe the filetype is set to mma.

jrwrigh commented 4 years ago

Additionally, this overrides the default behavior as well. When opening a blank *.m file, the syntax should default to matlab. Instead, it defaults to mma which is for mathematica. I don't think this is intended behavior, and even if it is, I don't think this should be overriding the defaults.

@chipsenkbeil This can be fixed by using the let g:polyglot_disabled = ['mathematica'] option before polyglot gets loaded, then you can use the g:filetype_m setting successfully.

chipsenkbeil commented 4 years ago

Additionally, this overrides the default behavior as well. When opening a blank *.m file, the syntax should default to matlab. Instead, it defaults to mma which is for mathematica. I don't think this is intended behavior, and even if it is, I don't think this should be overriding the defaults.

@chipsenkbeil This can be fixed by using the let g:polyglot_disabled = ['mathematica'] option before polyglot gets loaded, then you can use the g:filetype_m setting successfully.

Yep! That's what I ended up doing! Good to have it written out and confirmed from someone else :D

sheerun commented 4 years ago

The question is which is more popular

sheerun commented 4 years ago

I've looked what upstream vim does and they read 100 lines of file to distinguish between few languages....

func dist#ft#FTm()
  let n = 1
  let saw_comment = 0 " Whether we've seen a multiline comment leader.
  while n < 100
    let line = getline(n)
    if line =~ '^\s*/\*'
      " /* ... */ is a comment in Objective C and Murphi, so we can't conclude
      " it's either of them yet, but track this as a hint in case we don't see
      " anything more definitive.
      let saw_comment = 1
    endif
    if line =~ '^\s*\(#\s*\(include\|import\)\>\|@import\>\|//\)'
      setf objc
      return
    endif
    if line =~ '^\s*%'
      setf matlab
      return
    endif
    if line =~ '^\s*(\*'
      setf mma
      return
    endif
    if line =~ '^\c\s*\(\(type\|var\)\>\|--\)'
      setf murphi
      return
    endif
    let n = n + 1
  endwhile

  if saw_comment
    " We didn't see anything definitive, but this looks like either Objective C
    " or Murphi based on the comment leader. Assume the former as it is more
    " common.
    setf objc
  elseif exists("g:filetype_m")
    " Use user specified default filetype for .m
    exe "setf " . g:filetype_m
  else
    " Default is matlab
    setf matlab
  endif
endfunc
jrwrigh commented 4 years ago

The question is which is more popular

I'd say a better question is how to avoid overriding default vim behavior. Admittedly, I've only glanced at the code, but I don't see why the ftdetect has to "lock-in" what the definition of the filetype is instead of just letting vim decide like it normally does.

sheerun commented 4 years ago

It's because I wanted to avoid reading file contents as this operation is not very performant

chipsenkbeil commented 4 years ago

@sheerun, there are a couple of ways to approach this as I see it.

  1. There should be a disclaimer somewhere in the documentation that you need to disable mathematica in order to use *.m files with any other type. This would be the bare minimum, but would make it more clear. I spent awhile trying out the different choices mentioned in the original post only to realize that polyglot was the culprit after debugging.
  2. Polyglot should respect re-assigning the *.m binding through g:filetype_m. This would be a single check for its existence prior to binding.
  3. Polyglot should support disabling the *.m binding for mathematica rather than needing to disable all bindings such that it could be swapped.
  4. Perform the more expensive file parsing, perhaps through an opt-in option, whereas the default is to keep however you'd like polyglot to handle this.

Personally, I'm a fan of #2. Add a check for filetype_m prior to assigning the mma filetype to *.m files. It's short and clean. But, at the minimum, I'd hope this could be highlighted in documentation somewhere.

sheerun commented 4 years ago

g:filetype_m isn't anything standard (and you can't even get any google results for it), so I'm not big fan of it. I thought about something like let g:polyglot_preferred = ['objc'] which would prioritize objc syntax over any other conflicting filetype, but I'm not sure about it as it still requires reading documentation and I'd like vim-polyglot to work out of the box...

So I guess it'll be 4. after all

sheerun commented 4 years ago

On the other hand 4. introduces possibility that syntax is guessed wrongly. And it's especially annoying when creating new files in vim

chipsenkbeil commented 4 years ago

g:filetype_m isn't anything standard (and you can't even get any google results for it), so I'm not big fan of it. I thought about something like let g:polyglot_preferred = ['objc'] which would prioritize objc syntax over any other conflicting filetype, but I'm not sure about it as it still requires reading documentation and I'd like vim-polyglot to work out of the box...

So I guess it'll be 4. after all

I'm confused, I thought filetype_m was standard. It's part of both vim's and neovim's codebases. There's even callouts to it in the vim help docs. Is it not actually standard?

Also, don't get me wrong, I'm fine with choice 1. As long as there's a solution, which there is by disabling mathematica, I don't have much issue with the current configuration. Would just be good to give some clarity for those trying to use polyglot and also work with objc files that are *.m.

sheerun commented 4 years ago

Nice resource with heuristics is https://github.com/github/linguist/blob/master/lib/linguist/heuristics.yml I could implement this but I'd need some help with mapping language names in linguist to ft names in vim plugins

sheerun commented 4 years ago

I'm confused, I thought filetype_m was standard. It's part of both vim's and neovim's codebases. There's even callouts to it in the vim help docs. Is it not actually standard?

Not in the "it's documented" and "for every extension" sense

sheerun commented 4 years ago

build.py integrates with linguist but heurestics are not ready yet... please write here if you'd like to improve it

sheerun commented 4 years ago

We now let native vim ftdetect files to decide for ambiguous extensions, so it should be partly solved now. Full solution is still in progress....

sheerun commented 4 years ago

In particular g:filetype_m = 'objc' is respected

sheerun commented 4 years ago

OK heuristics are now implemented and can be defined in https://github.com/sheerun/vim-polyglot/blob/master/heuristics.yaml I've implemented them for m and fs extensions among others. If you find any issues, you can try to fix them there.