weavejester / cljfmt

A tool for formatting Clojure code
Eclipse Public License 1.0
1.11k stars 120 forks source link

Support for range format #212

Open ericdallo opened 3 years ago

ericdallo commented 3 years ago

Hello, on clojure-lsp, we use cljfmt as our format feature and it works well for most cases, thank you for your work!

There is an issue though related to performance with huge buffers. The [LSP protocol]() allows two kind of format requests: format the whole buffer and format a ranged code. clojure-lsp implement both but the range format seems to have some performance issues with large buffers since we don't correctly format the range code but the whole form like defn, def etc.

Here is how we format ATM, the reformat-node is not enough for us as if I select only the bar function of the code below, the spacings will not be correct:

(my-func (foo a
                      b
                      c))

(cljfmt/reformat-node foo-node)

(my-func (foo a
    b
    c))

This happens because cljfmt doesn't know about the previous code before the foo node.

My suggestion is to add another function on cljfmt that accept a line and column to format, just like we call to the full buffer format:

(cljfmt.core/reformat-string text line column)

WDYT?

weavejester commented 3 years ago

I don't understand how your proposal is supposed to work. Can you explain further?

ericdallo commented 3 years ago

Sorry for not be specific, I think my suggestion is missing the end range. You can see above an example where | means the start and end range selected by the editor:

(my-func (|foo a
                            b
               c|))

clojure-lsp would call:

(cljfmt.core/reformat-string full-text 1 12 3 16)

would result formatting only the selected range:

(my-func (foo a
              b
              c))

Does that makes sense?

weavejester commented 3 years ago

Why is the b so far right after formatting? Even if you're just formatting a section, shouldn't it still be in line?

ericdallo commented 3 years ago

hahaha, yes, sorry, GitHub indentation is weird, fixed!

weavejester commented 3 years ago

If implemented, this should be a separate function, e.g. reformat-region. I also don't know how you'd format a region that is missing ending brackets or can't be parsed in isolation.

Have you considered pulling out just the current top level form, adding some unique markers to mark the start and end of the region, formatting and then snipping out the relevent area using the markers?

ericdallo commented 3 years ago

That's how we do ATM, we format the whole top level form, but if we format only the parent form of the selected range, it will ignore the spacing

weavejester commented 3 years ago

but if we format only the parent form of the selected range, it will ignore the spacing

I don't understand what this means. Can you provide an example?

ericdallo commented 3 years ago

Sure, so, as I said it here, If I format the code with the selected range between the |s:

(a
 (b
 |(c
          (+ 2 3)|)))

I'd expect to be formatted as:

(a
 (b
  (c
   (+ 2 3))))

But when we call (cljfmt/reformat-form c-node {}), we get:

(a
 (b
  (c
 (+ 2 3))))

You can see that the spacing before c was ignored because cljfmt only knows about the c-form, so the spaces that it adds is based as if c was a top level form.

That's why I suggested having a new function on cljfmt that we pass the whole text, the range and cljfmt take care of formatting only the range specified.

weavejester commented 3 years ago

Okay. How do you envision this new reformat-region function to work? How are you going to take into account regions that cut off at syntax boundaries? If you're not going to look at the top level form, how are you going to get the indentation?

I'm also curious as to why formatting the top level form doesn't give you the performance you need. How large are the forms that you're having problems with? Do you have examples?

ericdallo commented 3 years ago

For the regions that cut off at syntax boundaries, cljfmt could just get the parent form of the start line/col maybe? If we send the whole text, cljfmt would know the indentation getting the parent/top form to know the indentation, right?

The initial issue use simple forms, but I could not repro that, I can repro that on forms like this though, If I try to slurp/barf a parent, it'll call later the rangeFormatting of clojure-lsp, and for this case takes about 6s to format this top level form!

ericdallo commented 3 years ago

So, I just found that this could be a cljfmt/reformat-node issue, if I tell cljfmt to format this whole file, it takes about 1096ms, I think is expected as this buffer is kind of big. But If a rangeFormat only this code, it takes 6539ms.

I didn't make a benchmark, those are numbers from clojure-lsp server <-> editor client request time, so probably it's adding some ms to that, even so, 6s seems too much to a reformat-node, am I wrong?

weavejester commented 3 years ago

Yeah, that's taking way too long. It should be a few milliseconds at most. There must be something that's causing it issues.

snoe commented 3 years ago

assuming there's no problem with reformat-node, for the proposal, I think it would help to introduce a function like reformat-enclosing-form instead of range based which probably wouldn't work well with rewrite-clj.

It could jump to a cursor position and call z/up if on a token node to get to an enclosing list, map, or vector node. Then also look at the parent form to apply indent rules and determine how much the enclosing form should be indented

clj-rewrite should be able to then reformat the enclosing node followed by pushing each subsequent line some number of spaces (or tabs) to the right according to the rules.

Right now, clients have to operate on a top level form because they don't know the indent rules for nested forms. Operating on the top level form of, say, big edn files can be pretty painful for clients that are autoindenting.

weavejester commented 3 years ago

That's fine, but the first step should be fixing the performance issue, as extra functions may be unnecessary if the existing functions are made to be performant.