hazelgrove / hazel

Hazel, a live functional programming environment with typed holes
http://hazel.org/
MIT License
684 stars 45 forks

Fix `,-` parsing #1277

Open AlienKevin opened 2 months ago

AlienKevin commented 2 months ago
[Image: Screenshot 2024-04-22 at 12 12 13 PM]

Currently, a space is required to separate the `,` and the `-`.

disconcision commented 2 months ago

@dm0n3y curious how the new system handles such cases. this is an awkward one in the current arrangement, as we have a notion of operator characters, where a (possibly user-defined soon) operator can consist of any run of those characters. there are a number of ways this categorization could be made more precise, but the case of characters which can be both prefix operators and parts of infix operators like this one seems pernicious.
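
A minimal Python sketch of the behavior described above, not Hazel's actual lexer: it assumes a hypothetical operator alphabet (`OPERATOR_CHARS`) and treats any maximal run of those characters as one operator token. Under that assumption `,-` comes out as a single unrecognized operator unless a space separates the two characters.

```python
from itertools import groupby

# Hypothetical operator alphabet for illustration; Hazel's real set may differ.
OPERATOR_CHARS = set(",+-*/<>=!&|")


def kind(ch):
    """Classify a character into a coarse lexical class."""
    if ch.isspace():
        return "space"
    if ch in OPERATOR_CHARS:
        return "op"
    if ch.isalnum():
        return "atom"
    return "punct"


def lex(src):
    """Group adjacent characters of the same class into tokens."""
    tokens = []
    for k, group in groupby(src, key=kind):
        text = "".join(group)
        if k == "space":
            continue
        if k == "punct":
            tokens.extend(text)  # parens etc. are always one token per character
        else:
            tokens.append(text)  # any run of operator characters is ONE token
    return tokens


print(lex("(1,-2)"))   # ['(', '1', ',-', '2', ')']
print(lex("(1, -2)"))  # ['(', '1', ',', '-', '2', ')']
```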

dm0n3y commented 2 months ago

First thought is that comma is special in the same way parens/braces are and would not be included in the arbitrary operator token class

disconcision commented 2 months ago

@dm0n3y solves ,- but not e.g. *-
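
A rough Python sketch of this exchange, under assumed character sets (`PUNCTUATION` and `OPERATOR_CHARS` are illustrative, not taken from Hazel): once the comma is moved into the same class as parens, `,-` splits correctly, but `*-` is still one operator run.

```python
from itertools import groupby

# Assumed sets for illustration only.
PUNCTUATION = set("(),")            # comma now treated like parens/braces
OPERATOR_CHARS = set("+-*/<>=!&|")  # comma deliberately excluded


def kind(ch):
    """Classify a character; anything unlisted is treated as part of an atom."""
    if ch in PUNCTUATION:
        return "punct"
    if ch in OPERATOR_CHARS:
        return "op"
    if ch.isspace():
        return "space"
    return "atom"


def lex(src):
    """Lex by grouping adjacent characters of the same class."""
    tokens = []
    for k, group in groupby(src, key=kind):
        text = "".join(group)
        if k == "space":
            continue
        if k == "punct":
            tokens.extend(text)  # each punctuation character is its own token
        else:
            tokens.append(text)
    return tokens


print(lex("(1,-2)"))  # ['(', '1', ',', '-', '2', ')']
print(lex("1*-2"))    # ['1', '*-', '2']
```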

dm0n3y commented 2 months ago

Hard to solve in general short of doing some more elaborate context-informed lexing. ,- lexing into an unrecognized operator is esp annoying though and worth specializing. I'm more ok with *- lexing into an unrecognized operator. OCaml makes the same distinction.

cyrus- commented 2 months ago

could try to restrict infix operators to not end in a token that can also be used as a prefix operator?

disconcision commented 2 months ago

@cyrus- could work but would involve some slightly grody intermediate states, e.g. if "-" was an operator then it goes from being one operator to two back to one again. could say more restrictively that prefix operator characters can't be used as non-initial characters in infix ops.
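
The two rules can be compared with a small Python sketch. Everything here is an assumption for illustration: `PREFIX_CHARS` contains only `-`, the function names are made up, and the sample run `*-*` is just one input that shows the one-two-one pattern while typing. Single-character operators are exempt from the first rule so that infix `-` stays valid.

```python
# Assumed: '-' is the only character usable as a prefix operator.
PREFIX_CHARS = set("-")


def split_trailing(run):
    """Rule 1: an infix operator may not END in a prefix-operator character.
    Peel such characters off the end of a multi-character run."""
    trailing = []
    while len(run) > 1 and run[-1] in PREFIX_CHARS:
        trailing.insert(0, run[-1])
        run = run[:-1]
    return [run] + trailing


def split_strict(run):
    """Rule 2 (stricter): a prefix-operator character may only appear as the
    FIRST character of an operator, so it always starts a new token."""
    tokens = []
    for ch in run:
        if tokens and ch not in PREFIX_CHARS:
            tokens[-1] += ch   # extend the current operator
        else:
            tokens.append(ch)  # first character, or a prefix char: new token
    return tokens


# Token counts while typing the run character by character.
typed = ["*", "*-", "*-*"]
print([split_trailing(s) for s in typed])  # [['*'], ['*', '-'], ['*-*']]
print([split_strict(s) for s in typed])    # [['*'], ['*', '-'], ['*', '-*']]
```

Under the first rule the token count goes 1, 2, 1 as the run grows; under the stricter rule a split never merges back, at the cost of also splitting runs such as `<-` in this sketch.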

i don't find the ocaml approach fully satisfying but the fact that they're doing it suggests it's at least annoying to do better

disconcision commented 2 months ago

@dm0n3y i feel like in principle there could be something analogous to your error-counting metric at the lexing level. an invalid token gets broken up if doing so results in a state with fewer total errors
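
One way to picture this idea, as a heavily simplified Python sketch: the "error count" here is just the number of tokens not found in a hypothetical `KNOWN_OPERATORS` set, whereas the metric being referred to presumably counts errors over the whole parse state. An operator run is kept whole unless some way of cutting it up yields strictly fewer unrecognized tokens.

```python
from itertools import combinations

# Hypothetical set of recognized operators, for illustration only.
KNOWN_OPERATORS = {",", "+", "-", "*", "/", "->", "=="}


def segmentations(run):
    """Yield every way to cut a non-empty run into contiguous pieces,
    starting with the uncut run and then increasing the number of cuts."""
    n = len(run)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [run[a:b] for a, b in zip(bounds, bounds[1:])]


def errors(tokens):
    """Stand-in error metric: number of unrecognized operator tokens."""
    return sum(tok not in KNOWN_OPERATORS for tok in tokens)


def relex(run):
    """Pick the segmentation with the fewest errors. min() returns the first
    minimal candidate, so ties favor the uncut run, then fewer pieces."""
    return min(segmentations(run), key=errors)


print(relex(",-"))  # [',', '-']
print(relex("*-"))  # ['*', '-']
print(relex("->"))  # ['->']
```

Enumerating every segmentation is exponential in the run length, which is tolerable only because operator runs are short; a real implementation would also need the surrounding parse context to decide whether `-` is acting as a prefix or infix operator.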

dm0n3y commented 2 months ago

Yeah ultimately I think there should be character-level molding, which is what I really meant by context-informed lexing above. I agree with @disconcision that the OCaml approach is not perfect but is the best bang for the buck short of a full solution.