Allow unbalanced brackets in symbols

parinfer / parinfer.js

Let's simplify the way we write Lisp

https://shaunlebron.github.io/parinfer

Allow unbalanced brackets in symbols #197

Closed (gilch closed 2 years ago)

gilch commented 2 years ago

This is obvious nonsense in Clojure, but perfectly valid Common Lisp (sbcl repl):

* (let ((}{ 2)
        (]]] 5))
    (- }{ ]]]))

-3

The symbols here are }{ and ]]].

In this case, Parinfer should be managing parentheses only, not other bracket types, but it has to work the way it does for Clojure. So maybe a config option?

The workaround seems to be to escape any unbalanced brackets with a backslash, e.g. \}\{, probably because Parinfer then reads them as two Clojure character literals. I think Parinfer handles balanced brackets in symbols OK? It could theoretically move a closing bracket that sits at the end of a line, but I can't think of an otherwise well-formatted case where this would happen.

shaunlebron commented 2 years ago

Thank you for another non-Clojure example. I’m not yet sure what kind of abstraction I could wrap parinfer in to make it respect arbitrary rules in different dialects.

gilch commented 2 years ago

Seems to come down to distinguishing atoms from brackets, and knowing which opening bracket types pair with which closing brackets. A config file taking a few regexes could almost do it, and is maybe the Pareto 80/20 solution, but there are a few difficult cases, like nested #| |# block comments, although some enhanced engines (like PCRE) could probably handle those too.
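To illustrate why the nested #| |# case is awkward for a plain regex, here is a minimal sketch of a depth-counting scanner. The function name `skipBlockComment` is hypothetical and is not part of parinfer; it only assumes that #| opens and |# closes a block comment and that the two can nest.

```javascript
// Hypothetical helper, not part of parinfer.
// Returns the index just past the block comment that starts at `start`,
// `start` itself if no block comment begins there, or -1 if unterminated.
function skipBlockComment(text, start) {
  if (!text.startsWith('#|', start)) return start;
  let depth = 0;
  let i = start;
  while (i < text.length) {
    if (text.startsWith('#|', i)) {
      depth += 1;               // entering a (possibly nested) comment
      i += 2;
    } else if (text.startsWith('|#', i)) {
      depth -= 1;               // leaving one nesting level
      i += 2;
      if (depth === 0) return i;
    } else {
      i += 1;
    }
  }
  return -1;
}

const sample = '#| a #| b |# c |# (x)';

// A non-recursive, non-greedy regex stops at the first |# it sees.
const naive = /#\|[\s\S]*?\|#/.exec(sample)[0];
console.log(naive.length);                 // 12: ends at the inner |#

console.log(skipBlockComment(sample, 0));  // 17: just past the outer |#
```

The counter is the part a simple pattern cannot express, which is presumably why a recursion-capable engine or a callback would be needed for this case.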

A config file with callback functions should be powerful enough, but I'm not sure how that would work with, e.g., the Rust implementation if callbacks are written in JavaScript. Maybe each implementation would be configured using its own language?

Various text editors have customizable syntax highlighting. There might be some good ideas there. A parser that only distinguishes brackets from atoms is probably not going to be very complex.

shaunlebron commented 2 years ago

Right, parinfer’s parser only watches for delimiters to note when it has crossed a context boundary (to rule out brackets found in strings or comments), and only tracks them insofar as it matters to finding structural parens to do its inference.

And actually, for your case, it seems like just having an option to process parens-only (ignoring [] and {}) would be sufficient. Sorry I missed that on first read.
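As a rough illustration of what a parens-only option changes, here is a minimal sketch of a balance checker with a configurable bracket set. This is not parinfer's implementation: `isBalanced` is a hypothetical function, the option names are borrowed from the example later in this thread, and pairing open and close characters by position in the two strings is an assumption. It skips strings, `;` line comments, and backslash-escaped characters, and treats everything else as atom text.

```javascript
// Hypothetical sketch, not parinfer's actual parser.
// Only characters listed in openParenChars/closeParenChars are structural;
// open and close characters are assumed to pair up by position.
function isBalanced(text, { openParenChars = '([{', closeParenChars = ')]}' } = {}) {
  const stack = [];
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '\\') { i++; continue; }   // skip the escaped character
    if (inString) {
      if (ch === '"') inString = false;   // closing quote ends the string
      continue;
    }
    if (ch === '"') { inString = true; continue; }
    if (ch === ';') {                     // line comment: skip to end of line
      while (i < text.length && text[i] !== '\n') i++;
      continue;
    }
    const openIdx = openParenChars.indexOf(ch);
    if (openIdx !== -1) { stack.push(openIdx); continue; }
    const closeIdx = closeParenChars.indexOf(ch);
    if (closeIdx !== -1 && stack.pop() !== closeIdx) return false;
  }
  return stack.length === 0 && !inString;
}

const src = '(let ((}{ 2)\n      (]]] 5))\n  (- }{ ]]]))';

console.log(isBalanced(src));                                   // false: } is read as a closer
console.log(isBalanced(src, { openParenChars: '(', closeParenChars: ')' })); // true
console.log(isBalanced('(let ((\\}\\{ 2)) \\}\\{)'));           // true: the backslash workaround
```

With the default bracket set, `}{` trips the checker on its first character; with parens only, `}{` and `]]]` are just atom text, which matches the Common Lisp reading.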

For the larger set of dialect-specific delimiters, we would need some subset of the language parser that allows us to know precisely when we’ve reached the end of it. I would avoid regex here. And maybe it’s best for now to support a set of dialect options rather than allowing customizable parsers—assuming there are only a handful of dialects with simple enough block comments and/or heredoc strings.

shaunlebron commented 2 years ago

I added options for openParenChars and closeParenChars, and was able to test that your case works with it:

const parinfer = require('./parinfer.js')

// Common Lisp input in which }{, }} and ]]] are ordinary symbols
const text = `
(let ((}{ 2)
      (]]] 5))
  (- }{ }} ]]]))
`

// Treat only ( and ) as structural parens
const opts = {openParenChars: '(', closeParenChars: ')'}

const result = parinfer.indentMode(text, opts)
console.log(result)
console.log(result.text)

$ node scratch.js 
SUCCESS:

(let ((}{ 2)
      (]]] 5))
  (- }{ }} ]]]))

I will bump the version and publish a release with these changes after some more non-Clojure dialect changes.