ocaml / merlin

Context sensitive completion for OCaml in Vim and Emacs
https://ocaml.github.io/merlin/
MIT License

meta-issue about judiciously suggesting incorrect completions #319

Open gasche opened 9 years ago

gasche commented 9 years ago

In #282 @dbuenzli rightly insists on the importance of code-completion working under the assumption that our knowledge of the environment may be wrong (because of previous programming errors). It would be nice to centralize the discussion (which has been sprawling over several issues) in a single place to make it easier to go back to the arguments in the future.

I think there are important questions to be discussed among, for example:

  1. how do we understand the act of programming and its reliance on completion?
  2. what is a good design for handling incorrect assumptions? Should we enrich the type-context-computation of Merlin to be fuzzier (eg. assume recursivity of the current definition), or have a separate heuristic for "dummier" completion candidates, and combine them somehow?
  3. should we aim to be merely distracted, or rather dummy, or just plain stupid? Is it important to understand finely what the programmer's errors may be, or is just any non-context-dependent completion strategy the right fallback? (This probably depends on how suggestions are combined -- point (2))

gasche commented 9 years ago

Also relevant to #298: if the solution is to fall back to a dummier context-free algorithm, it may also solve polymorphic variant inference.

Also relevant to #296, although I suspect @def-lkb knows how to detect that a record field name is a valid choice syntactically and do something "smart" on this.

dbuenzli commented 9 years ago

Your first sentence may actually embody the issue. I'm not interested in code completion, machines are not up to the task yet – well except if you subscribe to proof irrelevance. I'm interested in identifier completion. With respect to that, why do I want identifier completion?

  a. To type faster.
  b. To avoid misspellings.
  c. To recover my partial remembrance of an identifier I know exists but whose specifics I don't remember.

Now this is not to say that the suggestions provided by completion should be entirely dummy. The context in which you operate is a hint of what you may want to complete. For example type names rather than value names in a type definition. So the context, heuristics and typing information should be used to order (and sometimes filter, but don't filter too much) the suggestions in case of ambiguity.
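A minimal sketch of "order, don't filter too much", assuming a hypothetical candidate type with only two kinds of names. None of these names come from Merlin; the point is only that a context mismatch lowers a candidate's rank instead of removing it.

```ocaml
(* Hypothetical sketch: the context ranks candidates instead of filtering them. *)
type kind = Type_name | Value_name

type candidate = { name : string; kind : kind }

(* Lower score sorts first. A candidate whose kind matches the kind expected
   at the cursor is preferred; mismatches are kept, only pushed down. *)
let score ~expected c = if c.kind = expected then 0 else 1

let starts_with ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* The only hard filter is the typed prefix; everything else is ordering. *)
let complete ~expected ~prefix candidates =
  candidates
  |> List.filter (fun c -> starts_with ~prefix c.name)
  |> List.stable_sort (fun a b ->
         compare (score ~expected a, a.name) (score ~expected b, b.name))
  |> List.map (fun c -> c.name)
```

In a type definition one would call `complete ~expected:Type_name`, and value names with the same prefix would still show up, just after the type names.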

I don't think it is the task of completion to make my programs type check, this is an issue that should be dealt with separately. Completion should help me express my ideas by supporting me in the three points above. Once the idea is expressed the type checking problem can be tackled by having a dialogue with my merlin assistant.

samoht commented 9 years ago

Just a random idea: maybe you want "exact" completion when you use external names (for instance List.<TAB>) but you want a fuzzier heuristic when the identifiers might be in the current file. Or we can have exact completion for anything after a module identifier but a dummy one for top-level identifiers.
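A rough sketch of that split, assuming the choice can be made purely from the shape of the text being completed. The `strategy` type and `strategy_for` are hypothetical names, and the uppercase test is only a crude stand-in for "module identifier".

```ocaml
(* Hypothetical sketch: pick a completion strategy from the typed prefix. *)
type strategy = Exact | Fuzzy

(* "List.ma" is qualified by something that looks like a module path (it
   contains a dot and starts with an uppercase letter), so the typed
   environment is trusted. Anything else, including "x.fi", gets the
   fuzzier heuristic. *)
let strategy_for prefix =
  match String.rindex_opt prefix '.' with
  | Some i when i > 0 && prefix.[0] >= 'A' && prefix.[0] <= 'Z' -> Exact
  | _ -> Fuzzy
```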

dbuenzli commented 9 years ago

I think that for this to be eventually satisfactory a documented, principled, approach to the problem should be undertaken (maybe that's already the case in source code). Here's a possible approach.

  1. Enumerate and define precisely the completion contexts.
  2. Define a non-ambiguous strategy to detect the completion context associated to a cursor position (note there may be an `Unknown context).
  3. Define notions of completion sources and their contents. Both from a high-level (for end-user understanding) and from a data harvesting point of view.
  4. For each completion context define the list of completion sources to tap from, ordered by trustworthiness/likelihood.

This would make it easier to organize, refine and provide feedback on the feature.
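The four steps above could be written down roughly as follows. This is only a sketch under assumptions: the contexts, the sources and their ordering are invented for illustration, and the detection rule is a deliberately crude lexical guess based on the last character before the cursor.

```ocaml
(* Step 1: hypothetical completion contexts. *)
type context = Type_annotation | Polyvar_tag | Unknown

(* Step 3: hypothetical completion sources. *)
type source = Typed_env | Current_buffer | Project_files

(* Step 2: a crude, unambiguous detection rule. It looks only at the last
   non-blank character before the cursor, so "x ::" is misclassified;
   a real rule would need the parser. *)
let detect_context before_cursor =
  let s = String.trim before_cursor in
  let n = String.length s in
  if n = 0 then Unknown
  else
    match s.[n - 1] with
    | '`' -> Polyvar_tag
    | ':' -> Type_annotation
    | _ -> Unknown

(* Step 4: sources to tap for each context, most trusted first.
   The orderings are arbitrary examples. *)
let sources_for = function
  | Type_annotation -> [ Typed_env; Current_buffer; Project_files ]
  | Polyvar_tag -> [ Current_buffer; Project_files; Typed_env ]
  | Unknown -> [ Current_buffer; Project_files ]
```

Keeping the table in `sources_for` separate from detection means each context's source list can be documented and discussed on its own.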

gasche commented 9 years ago

Your first sentence may actually embody the issue. I'm not interested in code completion, machines are not up to the task yet – well except if you subscribe to proof irrelevance. I'm interested in identifier completion.

On the contrary, I suspect the scope of my first sentence is too narrow. A programmer performs many different acts of programming, some of which can be assisted by the editor, and for almost each of those the question of "what if I was wrong in some other place?" or "why limit my freedom to go wrong, which is part of my creativity?" arises and can be asked.

I do agree that pinning down the specific use-case of identifier completion helps moving forward. But ideally we could also have a set of design principles that apply generally to those tasks, to help evaluate and refine specific proposals.

dbuenzli commented 9 years ago

On Thursday, 27 November 2014 at 16:14, gasche wrote:

On the contrary, I suspect the scope of my first sentence is too narrow. A programmer performs many different acts of programming, some of which can be assisted by the editor, and for almost each of those the question of "what if I was wrong in some other place?" or "why limit my freedom to go wrong, which is part of my creativity?" arises and can be asked.

Interesting question but I would suggest not to pontificate that too much.

As is the case most of the time with computers: don't try to outsmart the human, and especially don't get in the way of his (cumbersome) ways of operating, but support them. In this case it should be understood that not every editing operation performed by a programmer is a well-typed one, which means that type information should not be relied upon too much, nor be the only source of information.

Ultimately we have a last call that is the compiler and being able to render the specific points where it complains, in source context, with good error messages and explanations is the key assistance (and there are a lot of improvements/experiments that could be made on that front as I suggested in a private message to @def-lkb).

Best,

Daniel

let-def commented 9 years ago

So far, Merlin was built to expose the real environment seen by the typechecker. It is an important point to me, as it reinforces the mental model a developer has of OCaml. Merlin should not mislead by displaying fuzzy completions. As the tool targets developers, I find it acceptable to have a quite strict behavior (as @dbuenzli said, this is servicing the typechecker). (I wouldn't trust a developer with a too fuzzy model of what is going on).

With this interpretation, suggesting the name being defined would be "outsmarting" (or "lying", as this is no real completion). But if it is made clear that this doesn't come from the environment and is really a suggestion, this looks reasonable. IntelliJ for instance goes beyond completion to suggest meaningful things. I would even like the 'rec' keyword to be inserted for me.

On the topic of meta: the feature and UI design space is quite large. A lot of decisions are made from our point of view as merlin's developers, after short-time testing on our side and little feedback. A place to discuss those meta directions would be welcome.

gasche commented 9 years ago

But if it is made clear that this doesn't come from the environment and is really a suggestion, this looks reasonable.

I had several distinct ideas on this front:

I would even like the 'rec' keyword to be inserted for me.

Yes! This is exactly the kind of "programming act" that the editor-assistant should help with. I think @lefessan has an Emacs script that takes an error message produced by the compiler and tries to "fix" the source automatically, maybe it could be a source of ideas for Merlin.

A place to discuss those meta directions would be welcome.

We could ask for a mailing list about "editor tooling", but I would expect the volume to be higher than the bugtracker (and the list to be more amenable to, ahem, pontifications). Are you interested?

dbuenzli commented 9 years ago

On Thursday, 27 November 2014 at 18:47, def-lkb wrote:

It is an important point to me, as it reinforces the mental model a developer has of OCaml. Merlin should not mislead by displaying fuzzy suggestions. As the tool targets developers, I find it acceptable to have a quite strict behavior (as @dbuenzli said, this is servicing the typechecker). (I wouldn't trust a developer with a too fuzzy model of what is going on).

In my opinion that's not a good stance. The very reason we have a typechecker is to delegate that task to a machine once we have expressed our ideas; otherwise we would all be programming in dynamically typed languages because, you know, we know what we are doing. It's not because a programmer forgets a rec keyword that he has a fuzzy model of what is going on; it's just that his attention is located somewhere else at the moment, and bothering him with that detail right now is detrimental to the idea-forming process.

With this interpretation, suggesting the name being defined would be "outsmarting" (or "lying", as this is no real completion).

To define lying you have to define truth. The problem is that you can't have truth while a programmer is editing a set of files to reach his eventual typechecking goal. At the moment I can't count the number of annoying, misleading signals merlin is constantly giving me while I edit code (in particular the error overlay should really not stick when you start editing again). Manipulating names is a pervasive operation in programming, so you want to help the programmer in that task even while the files are not in a typecheckable state (one example: when I'm trying to complete a polyvar, I want merlin to look into all the interfaces and sources known to the project for identifiers of the form "`.*", and yes, a regexp search will do).
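A minimal sketch of such a purely textual search for polymorphic variant tags, written as a hand-rolled scanner so that it needs nothing beyond the standard library. The function names are hypothetical, and the text is assumed to be the already-loaded contents of a project file.

```ocaml
(* Characters allowed after the backtick in a tag. *)
let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '\'' -> true
  | _ -> false

(* Collect every token of the form `Ident in a piece of source text.
   Purely lexical: tags inside comments or strings are picked up too,
   and no typing information is involved. *)
let polyvar_tags text =
  let n = String.length text in
  let rec scan i acc =
    if i >= n then List.sort_uniq compare acc
    else if text.[i] = '`' then begin
      let j = ref (i + 1) in
      while !j < n && is_ident_char text.[!j] do incr j done;
      if !j > i + 1 then scan !j (String.sub text i (!j - i) :: acc)
      else scan (i + 1) acc
    end
    else scan (i + 1) acc
  in
  scan 0 []

(* Keep the tags that start with what the user has typed, backtick included. *)
let complete_tag ~prefix text =
  let p = String.length prefix in
  List.filter
    (fun tag -> String.length tag >= p && String.sub tag 0 p = prefix)
    (polyvar_tags text)
```

Running `complete_tag` over every interface and source known to the project, and merging the results, would give candidates even when the file under edit does not typecheck.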

I really think that merlin should be thought out as a dialogue between the programmer and merlin, where editing phases (with good identifier completion support) are interleaved with typechecking feedback phases to reach the eventually typechecking goal. Having the typechecker right behind my back while I'm editing code is not useful, as it can't possibly follow my way of forming the program, which again, is certainly not by performing a series of atomic well-typed editing operations.

Daniel

dbuenzli commented 9 years ago

But if it is made clear that this doesn't come from the environment and is really a suggestion, this looks reasonable.

Do it naturally, like ocp-index, identifiers from the environment have a type, the others don't have a type.
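One way to picture this, as a sketch with hypothetical names: each entry carries an optional type, entries from the environment have one, and entries harvested from text do not, so the display itself tells the two apart.

```ocaml
(* Hypothetical completion entry: typ is None for non-environment sources. *)
type entry = { name : string; typ : string option }

(* Environment entries are shown with their type, the others bare. *)
let render e =
  match e.typ with
  | Some t -> Printf.sprintf "%s : %s" e.name t
  | None -> e.name

(* Environment entries come first and win over untyped entries
   that have the same name. *)
let merge ~from_env ~from_text =
  let known = List.map (fun e -> e.name) from_env in
  from_env
  @ List.filter_map
      (fun n ->
        if List.mem n known then None else Some { name = n; typ = None })
      from_text
```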

trefis commented 9 years ago

Just to react quickly on gasche's first two points (and dbuenzli's last post): this looks to me like the kind of thing that your editor already knows how to do. Indeed, unless I am mistaken, ocp-index provides such a feature with https://github.com/OCamlPro/ocp-index/blob/master/tools/ocp-index.el#L33 . And I don't think it is our job to decide for the user what kind of completion he wants. We provide one such "source", a "semantic" one, if you want a more "fuzzy" one on the side feel free to enable it, but I don't think we should be the ones to enable it (or disable it).

I do otherwise think that the question asked on this thread is an important one, but the particular case of identifier completion is already well handled by your editor, we shouldn't bother with it too much.

As for "where should meta discussions happen?", I think the "merlin-discuss" mailing list would be the right place.

dbuenzli commented 9 years ago

On Friday, 28 November 2014 at 08:44, Thomas Refis wrote:

And I don't think it is our job to decide for the user what kind of completion he wants. We provide one such "source", a "semantic" one, if you want a more "fuzzy" one on the side feel free to enable it, but I don't think we should be the ones to enable it (or disable it).

Sorry, then I completely misunderstood the project. When def-lkb made me switch to merlin I thought that merlin was about providing me good, well designed, integrated, editor support to program in OCaml.

If that's the case then designing this involves making well thought decisions like which fuzzy source you'd like to tap from in a given completion context, "it's not our job to decide" is not about designing things. A well designed merlin emacs mode would have no emacs customize-group.

Now if that's not the case, if the charter of merlin is "be in the mind of the OCaml typechecker, all the time", then forget about all this discussion: I'm wasting my time talking to the wrong people, as this is certainly not a way of helping programmers to write their programs. With one author thinking that it's right to service the typechecker rather than the human and the other basically indicating there's no problem, I don't see this project going anywhere.

I do otherwise think that the question asked on this thread is an important one, but the particular case of identifier completion is already well handled by your editor, we shouldn't bother with it too much.

From a programmer's perspective when I use merlin, the completion capabilities as judged from the points of my first message, are crap at the moment.

Best,

Daniel

samoht commented 9 years ago

And I don't think it is our job to decide for the user what kind of completion he wants. We provide one such "source", a "semantic" one, if you want a more "fuzzy" one on the side feel free to enable it, but I don't think we should be the ones to enable it (or disable it).

To rephrase @dbuenzli's reply, most users assume that the emacs and vim bindings are also part of merlin, and as such, they should have sensible defaults: this is related to #220.

dbuenzli commented 9 years ago

@samoht Actually that's really not a rephrasing of my reply. I think there's real integration work to be done; it's not just about letting other things work in parallel with merlin.

I mean at the moment trying to complete a single backtick in merlin's emacs mode is borderline ridiculous and trying to complete a backtick followed by any letter really is. However this could be made awesome by being smart about it using OCaml specific knowledge. The sheer amount of incorrect errors merlin is reporting to me while I edit code is also problematic, since it's basically showing me outdated information which is both distracting and misleading.

What seems to be misunderstood here is that the supposedly "semantic" information merlin provides is not better or more correct, because it's based on incomplete and/or outdated and/or wrong assumptions about the final environment, since you are in the very process of defining it. Hence the conversation process I'm talking about, where merlin and I gradually get to reconcile our views of the final environment. But for all this to be made really good and useful you need to move away from the completely flawed "being in the typechecker's mind is what will help the programmers" ideology.

gasche commented 9 years ago

With one author thinking that it's right to service the typechecker rather than the human and the other basically indicating there's no problem, I don't see this project going anywhere.

Meh, this is unnecessarily abrasive.

@trefis, do you have this position because of the current perception of your workforce (you can't dedicate time to decide which completion your user wants), or because of your broad perception of what Merlin is as a project (a type-checker-assistant for the editor, rather than an editor-assistant for the programmer)? I think this has deep implications; in particular, if Merlin is not the right project to manage the orchestration of different programming-assistant tools for OCaml programs, which project is it? Is it editor-specific or can it be as editor-agnostic as Merlin? Is it an existing project (tuareg, typerex) or do we need to set one up?

.

On an unrelated front, YouCompleteMe (and the ycmd offshoot) has a related design description of how "semantic" and "non-semantic" search are ordered (the "semantic" one is only requested in some specific contexts that are easy for users to reason about, with a sharp distinction between "help the programmer write the code he knows he wants to write" and "help the programmer explore his options at a slower pace", trying to be semantic only for the latter workflow).

(They also do subsequence matching instead of prefix matching.)
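For reference, subsequence matching can be sketched as below: every character of the pattern must appear in the candidate in the same order, though not necessarily adjacent. This is a generic illustration, not YouCompleteMe's implementation, and ignoring case is an assumption made here.

```ocaml
(* True when pattern's characters occur in s in order, possibly with gaps.
   Comparison ignores ASCII case (an assumption of this sketch). *)
let is_subsequence ~pattern s =
  let n = String.length pattern and m = String.length s in
  let rec go i j =
    if i = n then true
    else if j = m then false
    else if Char.lowercase_ascii pattern.[i] = Char.lowercase_ascii s.[j]
    then go (i + 1) (j + 1)
    else go i (j + 1)
  in
  go 0 0

(* Keep the candidates that the typed pattern is a subsequence of. *)
let filter_candidates ~pattern candidates =
  List.filter (is_subsequence ~pattern) candidates
```

With prefix matching, "fdl" matches nothing in the List module; with subsequence matching it still reaches `fold_left`.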

hcarty commented 9 years ago

In the case of vim there is already a concept of different completion methods (intelligent completion, generic word completion, file system completion, ...). merlin fills the intelligent completion role. Given this context, merlin's current approach of completing based on what the background ocamlmerlin toplevel sees works quite well. There are other vim plugins which can chain these completion methods together, perhaps similar to the ocp-index emacs feature @trefis mentioned.

This combination of tools/plugins works quite well in my experience. Each has room for improvement but those improvements are probably best served by sticking to their particular area of functionality.

dbuenzli commented 9 years ago

On Friday, 28 November 2014 at 21:33, Hezekiah M. Carty wrote:

This combination of tools/plugins works quite well in my experience. Each has room for improvement but those improvements are probably best served by sticking to their particular area of functionality.

This completely misses the point.

First I personally don't have the time and desire to try to experiment with arbitrary combinations and configuration of plugins that may be available in the wild and I guess that this is the case for most programmers both new and old.

Second, merlin's supposedly "intelligent" completion, as it stands, is not, and should be combined with less type-stringent (again… because it can't possibly know the final context) yet OCaml-specific strategies, all in a well integrated, single completion command to invoke. I suggest going back to this comment to see what I mean:

https://github.com/the-lambda-church/merlin/issues/319#issuecomment-64791545

Daniel