mbucc/erlfmt - Githubissues

Read Erlang source on stdin, reformat with erl_tidy, and write to stdout. No options, you get what erl_prettypr:format/2 gives you.

                              Back story

The problem seemed simple: read Erlang source code on stdin and send a
nicely formatted result to stdout.  In vim, I got used to this when I was
learning Go:

        :%!gofmt

Just type the code without any concern for formatting, and then format the
entire file in one command.  I now have xmlfmt, jsfmt, javafmt and I
wanted a similar utility for Erlang.

I started with erl_tidy, which

        Tidies and pretty-prints Erlang source code, removing
        unused functions, updating obsolete constructs and
        function calls, etc.

Seems perfect ... except that erl_tidy does not read from stdin.  I posted
a question to the Erlang questions mailing list asking if I missed
something and the answer was a clear no.  People pointed me to alternative
approaches: an Emacs elisp module, a rebar3 module that wraps erl_tidy,
and a state-machine/parser written in Erlang that the vim Erlang module
uses.  The first two didn't solve my problem, and I didn't like the
complexity of the third approach.

First attempt: use basic Erlang modules

While Googling around, I found a short post showing how you can format
Erlang code using erl_scan (string -> tokens), erl_parse (tokens -> form)
and erl_pp (form -> string).  So I started coding that up.  It worked
great. Initially.

The first problem I hit was that erl_parse:parse_form/1 does not handle
white space or comment tokens.  OK, no problem.  To keep moving forward I
implemented the quick hack of only dropping comments that came inside a
form, and keeping the ones that came before.  Not great, but I wanted to
get something working.

The next problem I hit was pre-processor constructs.  It turns out that
erl_parse does not understand pre-processor bits either (macros, imports,
etc).  So, another hack: when we hit a dot-terminated token sequence that
includes a pre-processor construct, just print out the raw text that
generated those tokens and leave it at that.

When that was done, I decided dropping comments was not acceptable.  While
searching around for how to re-insert comments, I came across a reference
to the epp_dodger module and a function that re-inserts comments.  Both of
which is used by erl_tidy.  It seemed stupid to re-write erl_tidy so I
went back to square one.

Second attempt: teach erl_tidy about stdin

Erlang can read stdin just fine.  In fact, most of the input functions in
the io module read from stdin by default.  Erlang provides the standard_io
atom that you can use for an IODevice argument.  I started hacking on
erl_tidy, intending to use the "special" file name of a single dash (-) to
tell erl_tidy to read from stdin.

It was simple to add a new read_module("-", Opts) and pass standard_io to
epp_dodger:parse/3.  But, the next chunk of erl_tidy logic reads comments
from the same file again, which is not possible with stdin, since the
stream was already consumed by parsing.

I briefly looking to see if I could somehow turn a string into an IODevice
(like you can in Java), but nothing turned up.  So, it seems like I have
to write stdin to a file in order to use erl_tidy.

Third attempt: call erl_tidy:file/1 from a shell script.

It was trivial to write a shell script that pipes stdin to a file.
Calling erl_tidy from the shell script was not.  My first attempt looked
like this:

        $ erl -run erl_tidy file $TMPFN

which produced:

        =ERROR REPORT==== 9-May-2016::20:13:30 ===
        erl_comment_scan: bad filename: `['hmmmm_sup.erl']'

and hung there, waiting in the Erlang interpreter.

It turns out that when use the -run flag and pass it arguments, Erlang
assumes the receiving function has one argument---a list.  That's why the
filename in the error message has brackets around it ... it's an list not
a string..

Final attempt: call escript from a shell script

The final result is what you see in the repository now.   It's so easy to
pipe stdin to a file in a shell script that I saw no reason to re-write
that in Erlang.  So I added a short escript that unpacks the file name
from the list and passes that to erl_tidy.

The result: I was able to erlfmt all 1,795 source files under
/usr/local/Cellar/erlang/18.2.1, with only one error:

        /usr/local/Cellar/erlang/18.2.1/lib/erlang/lib/wx-1.6/src/gen/gl.erl
        ./erlfmt: line 6: 25399 User defined signal 2: 31 ./erlfmt.escript "$TMPF"

That file is a generated one, and is 971KB in size.  I got the above error
when I had erl_tidy write the reformatted file to the temporary file.
When instead I told erl_tidy to write the result to stdout, I got this
error:

        escript: exception exit: badarg
          in function  erl_tidy:file/2 (erl_tidy.erl, line 295)
          in call from erl_eval:local_func/6 (erl_eval.erl, line 557)
          in call from escript:interpret/4 (escript.erl, line 787)
          in call from escript:start/1 (escript.erl, line 277)
          in call from init:start_it/1 
          in call from init:start_em/1 

It successfully parsed the other 1,794 files, which gives success rate of
0.99945.  Not too hot in the Erlang world, but I don't expect to be
formatting files that big.  Good enough.

The back story back story.

Why such a long README?  Erlang questions mailing list recently had an
interesting thread on code documentation ("rhetorical structure of code").
It was a long thread and a lot of things were said, but one in particular
stuck with me: code documentation rarely shows the starts and stops, the
litany of failed experiments that you encounter on the way to the final
product.  And that those failed experiements are sometimes useful in the
future.

mbucc / erlfmt

readme