kkinnear / zprint

Executables, uberjar, and library to beautifully format Clojure and Clojurescript source code and s-expressions.
MIT License
554 stars 47 forks source link

From `:rod` to `:semantic` #229

Closed vemv closed 2 years ago

vemv commented 2 years ago

Hi there,

in https://github.com/kkinnear/zprint/issues/170 we devised a style called :rod which various people liked/adopted. I'm using it at work currently (even if we aren't quite happy with the results yet).

However I never found the time to explain more exhaustively how it looks like (for a variety of forms), and importantly, why, so that users can understand this choice and perhaps make this style easier to share/adopt.

While :rod isn't necessarily a widespread style, it is a very reasonable/non-fanciful one, derived (and compatible) from well-known styles and backed by years of real-world usage in team settings.

Importantly, I think that implementing this style would be a great showcase/benchmark for zprint's capabilities, and surely would leave composable primitives.

Perhaps :rod was bit of an organic name, I think that :semantic would do it more justice.

So, without further ado...

The :semantic style

tldr

:semantic is a subset of:

which is stricter in a few ways, emphasizing meaning over concision, and a sense of homogeneity that makes formatting more predictable (e.g. I can predict how a formatting will look like, vs. seeing a seemingly random result influenced by concision or a column limit).

Explanation

It is 'semantic' in that there's a rigid mapping between forms (e.g. defn, if) and where the newlines should be located. So a maintainer can say things such as:

In other words: by visually scanning the whitespace, before seeing the contained forms, I can already tell what I'm about to find.

This IME greatly decreases cognitive costs, reducing friction as one reads/maintains code.

Importantly, it makes Clojure look more 'like a language', as opposed to simply mashing sexprs together.

Case study: if

As a very self-contained example of my ideas, we can consider `if`. Here's an example to be avoided: ```clj ;; bad (if (foo) 1 2) ``` What's wrong here? `(foo)` represents a condition, while `1` represents a 'then'. By placing them in the same line, they look as if they belonged to the same category. They don't: they're not simply different items in a list, but things with profoundly different meanings. So allocating one line for each kind of thing makes this distinction clearer: ```clj ;; good (if (foo) ;; one line for condition 1 ;; ...one line for then... 2) ;; ...and one line for else ``` There are many other examples for making such distinctions. Such as: ```clj ;; bad (fn [] []) ``` Here we put an argv and a `[]` return value in the same line. They look visually identical and nearby, so one is suggesting that they are the same kind of thing, when they aren't. This is what I mean with `make Clojure look more 'like a language'`. In many programming languages observes a certain relationship between syntax and newline placement, and honors it _even if there's a way to shortcut it_.

The rules in question

1 - Certain forms have a canonical newline placement

Under :semantic, certain forms have a canonical placement for newlines and blank lines.

The most important being the following:

1.1 - defn, fn

```clj ;; Examples from https://github.com/kkinnear/zprint/issues/170 (defn foo [] 1 2 3 4 5) (defn foo ([] 1) ([a] 1)) (defn foo ([] 1 2 3 4 5) ([a] 1)) ```

1.2 - defprotocol, defrecord, extend*

```clj ;; One line per method ;; One line for docstring if present ;; One blank line between methods (defprotocol AProtocol (doit [this] "Docstring") (dothat [this that]) (domore [this that])) (defprotocol AnotherProtocol (xdoit [this] "Docstring") (xdothat [this that]) (xdomore [this that])) ;; Fields vector in same line as `ADefrecord` ;; One blank line between methods ;; One blank line between protocols ;; No blank line between a protocol name and its first method (defrecord ADefrecord [f1 f2 f3] AProtocol (doit [this] (run! println [1 2 3]) (println this)) (dothat [this that]) (domore [this that]) AnotherProtocol (xdoit [this]) (xdothat [this that]) (xdomore [this that])) ;; deftype just like defrecord (deftype ADefrecord [f1 f2 f3] AProtocol (doit [this]) (dothat [this that]) (domore [this that]) AnotherProtocol (xdoit [this]) (xdothat [this that]) (xdomore [this that])) ;; extend-type just like defrecord (except, of course, that there isn't a fields vector) (extend-type ExtendedType AProtocol (doit [this] (run! println [1 2 3]) (println this)) (dothat [this that]) (domore [this that]) AnotherProtocol (xdoit [this]) (xdothat [this that]) (xdomore [this that])) ;; extend-protocol just like expend-type (extend-protocol ExtendedProtocol String (doit [this] (run! println [1 2 3]) (println this)) (dothat [this that]) (domore [this that]) Thread (xdoit [this]) (xdothat [this that]) (xdomore [this that])) ```

1.3 - defmethod

```clj ;; Incorrect - the return value `[]` is not placed in its own line (defmethod wrong-defmethod :default [x] []) ;; Prefered style (defmethod some-defmethod :some-dispatch-value [a b c] (println a) (println b) (println c)) ;; Next-prefered style (when a column limit imposes it): argv after dispatch value (defmethod some-defmethod-B :some-dispatch-value [a b c] (println a) (println b) (println c)) ;; Next-prefered style (when a column limit imposes it): one line for dispatch value, one line for argv (defmethod some-defmethod-C :some-dispatch-value [a b c] (println a) (println b) (println c)) ```

1.4 - if, if-not

```clj ;; Incorrect: `if` has a canonical form (if 1 2 3) ;; Correct (if 1 2 3) ```

1.5 - Macros taking a & body

```clj ;; Incorrect (when 1 2) ;; Correct (when 1 2) ;; ^ Broader rule derived from `when` rule: for any macro that takes a `body`, said body cannot be in the same line as the macro being invoked (defmacro Foo {:style/indent 1} [a & body]) ;; Incorrect: non-body and body macro arguments are in the same line (foo 1 2 3) ;; Correct: there's a newline before the `body` ;; This enables 2-space indentation, allowing humans to see it's a macro call. (foo 1 2 3) ```

1.6 - =, not=, <, <=, or, and

```clj ;; Incorrect (= [1 3 2] [1 3 2]) ;; Correct: one line per item allows visual diffing (by moving the eye vertically) (= [1 3 2] [1 3 2]) ```

1.7 - ->, ->>

`->` is a bit different: it has some considerations as a macro (relative to newline placement), but it indents like a defn. `clojure-mode` also makes this exception. ```clj ;; Correct (-> 0 inc inc inc) ;; Also correct (-> 0 inc inc inc) ;; Incorrect (-> 0 inc inc inc) ```

1.8 - try

```clj (try ;; A newline must be placed after `try` (call1) (call2) (catch Exception e ;; A newline must be placed after the exception binding (call3)) (finally ;; A newline must be placed after `finally` (call4))) ```

2 - Independent forms within a body get one line each

Given a body, e.g. defn's or do's, no two independent forms should reside under the same line.

```clj ;; Incorrect: there are two independent body sexprs within the same line (fn [x y] (+ x y) (+ x y)) ;; Incorrect: there are two independent body sexprs within the same line (defn foo [x y] (+ x y) (+ x y)) ```

3 - The first argument of a function call must be placed in the same line as the function name

Given a function call, its first argument must reside in the same line as that of the function being called. This forces aligned style.


;; Incorrect - causes 1-space indentation which is not desirable
(foo
 1 2 3)

;; Correct
(foo 1 2 3)

;; Also correct
(foo 1
     2
     3)

If it was impossible to make such a sexpr fit within a column limit, I'd prefer if users fixed it manually (which is very often possible with simple refactorings e.g. split into let bindings, or helper defns)

4 - Multi-line sexprs cannot be siblings of single-line sexprs

;; Incorrect - the sexpr `1` is in the same line as a multi-line sibling sexpr within a function call
(comp (fn [x]
        (inc x)) 1)

;; Correct - whenever a sepxr within a function call spans more than one line, then each argument in that function call
;; must be located in its own line
(comp (fn [x]
        (inc x))
      1)

5 - Argument vectors are placed in the same line as the function name

This consideration emphasizes that a function is a function of something. e.g. f is a f(x), which is deeply related to math and FP notation.

;; bad
(defn f
  [x]
  )

;; correct
(defn f [x]
  )
;; Acceptable exceptions are the presence of a docstring and whenever a column limit would not be met.

I think that :rod already satisfies a few of this expectations, and probably a few others are very much at hand.

For others I don't know if they would entail a lot of work. I'd love to know your impression of what's possible first. If everything seems at hand I can try coding the rules myself which would seem nice.

Probably https://github.com/kkinnear/zprint/issues/223 is a blocker - I can't quite reason about things when they currently happen to be broken.

Also, rule 1.5 requires understanding what "special" and "non-special" (body) arguments are, which is related to https://github.com/kkinnear/zprint/issues/225 .

I hope you appreciate the issue writeup, it might sound ranty in some passages but I'm simply trying to convey its rationale. Probably much of the issue could eventually make it to the official doc? I could eventually further refine these words.

Cheers - V

kkinnear commented 2 years ago

Wow! That's a lot to digest. Thank you for the effort to explain all of this to me.

I have read this over several times, but not yet gone into great detail in trying to make it happen. Overall, zprint is designed for just this sort of thing -- you can have it the way you want it. As you have noted.

This writeup could certainly make it into the documentation for zprint, possibly with minor adjustments to the tone so that it isn't quite so adamant about the "right" way to do things. But we can see about that as we get it working.

At first glance, these are the things that might be hard:

  1. You have noted 1.5 requires knowledge of how a macro is defined. Moreover, zprint as usually used, doesn't even know a macro from a function -- unless you tell it in the :fn-map. So for most users, this won't be operational.
  2. I haven't looked into 1.2 yet. I have put a lot of work into this area, and it might be that with a little creative configuration, zprint could do what you want now. Alternatively, it may be that it needs a "guide", like I had to do for :rod.
  3. Rule 3, about "forcing aligned style", and making people refactor code that doesn't meet a column limit, is going to require some work -- possibly only work getting me to understand what you want. I found the link to "aligned" style far from clear, probably because I'm coming from a very different environment than those words were written for.

I'm not asking for you to explain more at this point, though I will be for sure as we get into this.

I want to fix the existing bugs (in particular the multithreading one you noted, #226), and get a new release out.

After that, I think the way to approach this is to work on one or two sections at a time, get them the way you want, and then move on to the next. Possibly with a number of alpha releases that you can play with to see how it works.

I envision this overall as a :style made up of a number of other styles. That way, people could opt-in to one of the styles but not have to buy into the whole thing. It also makes it much easier to maintain and test.

Thanks again for the effort in writing all of this up. I'm looking forward to getting into this soon!

vemv commented 2 years ago

Cheers. Thanks as always for taking these into consideration.

This writeup could certainly make it into the documentation for zprint, possibly with minor adjustments to the tone so that it isn't quite so adamant about the "right" way to do things. But we can see about that as we get it working.

:+1:

You have noted 1.5 requires knowledge of how a macro is defined. Moreover, zprint as usually used, doesn't even know a macro from a function -- unless you tell it in the :fn-map. So for most users, this won't be operational.

FWIW, for my use case I plan invoke zprint from the same JVM as the REPL's, so that knowledge would be accessible. For the "binary / CLI" use case, I could regularly emit zprint files from said REPL JVM, keeping those auto-generated files version-controlled, maybe shared in a .jar, etc.

I found the link to "aligned" style far from clear

You can find snippets for always-align and always-indent in said link. Rule 3 intends that:

After that, I think the way to approach this is to work on one or two sections at a time, get them the way you want, and then move on to the next.

Yes, that would be perfect. Feel free to work on these in a PR format and request a review from me. Quite obviously I'm not in a position to review the impl - instead I can come up with more / trickier test cases for each feature.

I envision this overall as a :style made up of a number of other styles. That way, people could opt-in to one of the styles but not have to buy into the whole thing. It also makes it much easier to maintain and test.

Indeed, this would maximize usefuless and ensure cleanliness.

kkinnear commented 2 years ago

So, I've made some good progress with this work. I have implemented the function name "alias" you suggested in Issue #234. I have also added a particularly finicky enhancement which allows configuration of multiple lines in lists, which several of your requests required and which zprint pretty much didn't do (without using guides). I have also slightly enhanced guides themselves. I rewrote the :rod style not using a guide to see if I could do it, and using the new multi-line capability that worked.

So, I have a moderately enhanced basic zprint (library). I have then written an options map which seems to do (most of?) what you are requesting. At some point, when (if) we like it, I'll bake the options map in as a style directly into zprint, but treating it as a separate "thing" at this point makes sharing changes easier.

I am very interested in you having a chance to try this, in part simply to find test cases where what I've done simply doesn't work right, and also to expose places where my understanding of what you desire is flawed. I expect about equal numbers of each kind of problem to appear.

My thinking on how to share this with you is as follows. I believe you are using zprint as a library, so if I were to push up a [zprint "1.2.4-alpha1"] to GitHub, and share the options map I've created here in the issue, I think that it would be relatively easy for you to clone the repo and have the library right there. It just takes a lein clean lein install to get it into your local maven. Alternatively, I could push an alpha to clojars if that would be easier. Either would work for me.

In addition to an alpha library and an options map to use when testing things, I also need to respond to a number of your requests with a comment or two.

Just FYI (as I'll probably change it a bit as I continue testing), here is the current options map:

(def semantic
  {:style-map
     {:s1.0 {:doc "Remove standard things that don't fit",
             :fn-map {"if" :none, "when" :none}},
      :s1.1 {:doc "defn and fn",
             :fn-map {"defn" [:force-nl-body {:style :rod-base}],
                      "defn-" "defn",
                      "fn" "defn"}},
      :s1.2 {:doc
               "defprotocol, defrecord, deftype, extend-type, extend-protocol",
             :fn-map {"defprotocol" :arg1-extend},
             :extend {:nl-count 2, :nl-separator? true},
             :fn-force-nl #{:fn}},
      :s1.3 {:doc "defmethod",
             :fn-map {"defmethod" [:guided-body
                                   {:guide [:element :element-best
                                            :element-best-first
                                            :element-best-first :newline
                                            :element-newline-best-*]}]}},
      :s1.4 {:doc "if, if-not",
             :fn-map {"if" :arg1-force-nl-body, "if-not" "if"}},
      :s1.5 {:doc "macros like when, many more to do",
             :fn-map {"when" :arg1-force-nl, "when-not" "when"}},
      :s1.6 {:doc "=, not=, or, and, <, >, <=, >=",
             :fn-map {"=" [:hang
                           {:fn-force-nl #{:hang},
                            :next-inner-restore [[[:fn-force-nl] :hang]]}],
                      "not=" "=",
                      "or" "=",
                      "and" "=",
                      "<" "=",
                      ">" "=",
                      "<=" "=",
                      ">=" "="}},
      :s1.7 {:doc "->, ->>", :remove {:fn-force-nl #{:noarg1-body}}},
      :s1.8 {:doc "try, catch, finally",
             :fn-map {"try" :flow-body,
                      "catch" [:arg2
                               {:fn-force-nl #{:arg2},
                                :next-inner-restore [[[:fn-force-nl] :arg2]]}],
                      "finally" :flow-body}},
      :sall {:style [:s1.0 :s1.1 :s1.2 :s1.3 :s1.4 :s1.5 :s1.6 :s1.7 :s1.8]}}})

As you can see, the numbers in the "sub-styles" correspond to the numbers you created in this issue. I expect that when we are finished, I'd name them something more evocative, but for now I wanted to be able to figure out what I was doing relative to what you requested, above.

I'll give explicit instructions as to how to use this after I hear back from you and post an alpha.

It has been an interesting experience trying to bend zprint to fit your approach, and so far I have been kind of surprised at how few modifications I had to make to the basic code to get as close as I have gotten (which may or may not be on target, we'll see). I have certainly had to read my own documentation, since some of the features I'm using were implemented several years ago.

kkinnear commented 2 years ago

General comments on your definition of "semantic", above:

1.5 - Macros taking a & body

;; Incorrect (when 1 2)

;; Correct (when 1 2)

;; ^ Broader rule derived from when rule: for any macro that takes a body, said body cannot be in the same line as the macro being invoked (defmacro Foo {:style/indent 1} [a & body])

;; Incorrect: non-body and body macro arguments are in the same line (foo 1 2 3)

;; Correct: there's a newline before the body ;; This enables 2-space indentation, allowing humans to see it's a macro call. (foo 1 2 3)

For the clojure.core functions that are macros with arguments and then body, it is simply work to identify which take one argument and which two arguments, and then make them :arg1-force-nl or :arg2 today. I need to create :arg2-force-nl to make this trivial for two argument macros. I already did when and when-not, and so it isn't all that difficult to work through the other clojure.core ones. I wouldn't mind help identifying which ones seem important to you.

If there are any with three arguments and then a body, I'd love to hear about it. Well, not really, but it would be good to know.

At present zprint has no way to know about macros in real time. I know that your use case has access to this information, and so you can tell zprint what should be done, but that's not something that I can solve in the general case. We might want to talk about just how you might want to tell zprint what should be done in case I can make that more efficient or easier for you to do.

One take-away for me is that I need to implement :arg2-force-nl, which will also simplify the options map I showed above.

2 - Independent forms within a body get one line each

Basically, zprint doesn't ever do this. Well, with some work you can configure it to do this, and you can do this easily with guides, but unless you try hard, zprint won't do this.

3 - The first argument of a function call must be placed in the same line as the function name

;; Incorrect - causes 1-space indentation which is not desirable
(foo
 1 2 3)

;; Correct
(foo 1 2 3)

;; Also correct
(foo 1
     2
     3)

If it was impossible to make such a sexpr fit within a column limit, I'd prefer if users fixed it manually (which is very often possible with simple refactorings e.g. split into let bindings, or helper defns)

zprint will never do the "incorrect" version, above.

What you are looking for is for a function that doesn't fit on one line, for zprint to "hang" the arguments, instead of "flowing" the arguments. If the hang causes things to look bad (i.e., it is toward the margin, and pushing things even more toward the marging looks bad), then zprint will do this:

(foo
  1
  2
  3)

It will make the indent appropriate (when using :style :community) either for body or argument functions -- to the extent that someone has told it what kind of function it is.

If you want people to fix things manually, the issue seems to me to be: how to tell them that it needs to be fixed?

One answer would be to think of this kind of output:

(foo
  1
  2
  3)

as a signal that you should fix things so that a hang will fit. Or, if it doesn't bother you, don't change it.

4 - Multi-line sexprs cannot be siblings of single-line sexprs

zprint will never do the incorrect thing. Well, I can configure it to do so, but unless I work hard, it will not do this.

5 - Argument vectors are placed in the same line as the function name

To the best of my knowledge, the current implementation of :style :rod does this as you want. That said, it is entirely new in this release.

kkinnear commented 2 years ago

Just a quick thought on 1.6, above -- where you want (= [1 2 3] [1 2 3]) to be:

(= [1 2 3]
   [1 2 3])

zprint has the capability to format things with two arguments on one line, but force them to hang (as you specify above) if there are more than two arguments (or more than three arguments). So, it could be configured easily to do this:

(= [1 2 3] [1 2 3])

(= [3 4 5]
   [3 4 5]
   [3 4 5])

If that is interesting to you.

kkinnear commented 2 years ago

I'm going to convert this issue to a "Discussion", with the category of "Ideas". I think that we will be better able to go back and forth and actually interact on the various details of what you want there. I've never done this before, so my apologies if it all falls apart. Presumably GitHub will place a pointer in this issue redirecting you to the appropriate discussion.