Add indent parameter to edn_format.dumps()

thiagokokada commented 4 years ago

This adds a pretty printer similar to json.dumps(indent=<int>). However, it does not follow Clojure formatting guidelines, instead formatting in a way more common to users from other languages like Python.

So it will convert this:

{"a": 1, "b": (1, 2, 3), "c": {"d": [1, 2, 3]}, "e": {1, 2, 3}}

To this:

Instead of this:

{:a 1,
 :b (1
     2
     3),
 :c {:d [1
         2
         3]},
 :e #{1
      2
      3}}

This should be already better than the current status quo (that is, no pretty printer at all).

Should fix issue https://github.com/swaroopch/edn_format/issues/39 (unless the author of the issue wanted a more Clojure-like pretty printer).

Alternative implementation of PR https://github.com/swaroopch/edn_format/pull/64. It fixes all issues found by @bfontaine, and also this approach is simpler. However, different of the older approach this one also brings change in the non-indent flow, and it may be slower (however I think the difference will be insignificant).

thiagokokada commented 4 years ago

Using the script from @bfontaine from PR #64 (just one run because I am lazy):

This branch without indent:

1203203 function calls (1024403 primitive calls) in 0.525 seconds

This branch with indent:

1203203 function calls (1024403 primitive calls) in 0.524 seconds

Master:

1119203 function calls (1005203 primitive calls) in 0.456 seconds

So yeah, a slightly overhead, however it does not seem that bad. WDYT @swaroopch ?

bfontaine commented 4 years ago

Awesome! 🙌

FYI I generated 100,000 random EDN structures which I dumped with indent=2 then re-loaded to check nothing was lost in the roundtrip, and I haven’t had any issue like I had with the previous implementation. 👌

swaroopch commented 4 years ago

So yeah, a slightly overhead, however it does not seem that bad. WDYT @swaroopch ?

@thiagokokada Will follow up this week.

swaroopch commented 4 years ago

@thiagokokada This is a really nice implementation! :+1: for writing the tests.

Can we please add docstrings to the functions indent_lines, udump, dump that describes the parameters? For example, I had to read the PR twice to understand what the difference between indent and indent_step :-)
Out of curiosity, do you think indent_lines would be faster by using https://docs.python.org/3/library/io.html#io.StringIO vs. string concatenation?

Thank you!

thiagokokada commented 4 years ago

1) Sure, will do :+1: 2) No, I don't think so. Creating a array of strings and joining them should be really efficient, probably even more than StringIO (the older implementation used string concatenation, that yeah, it is kind slow): https://stackoverflow.com/q/4733693/2751730

swaroopch commented 4 years ago

@thiagokokada

Thanks!
Got it, thanks for looking into it :-)

bfontaine commented 4 years ago

One minor optimization that may help would be to store indent_step * ' ' in a variable instead of re-computing it for every line. It might be worth trying with StringIO/cStringIO just to check.

thiagokokada commented 4 years ago

I applied the small optimization from @bfontaine anyway and add the docstrings asked by @swaroopch.

Now about the StringIO. I am not going to run exhaustive tests, however this is what I got with StringIO:

1203203 function calls (1024403 primitive calls) in 0.510 seconds

There really doesn't seem to have much difference. Actually even using string concat (that should be slower) there isn't much difference in performance, at least using the benchmark from @bfontaine.

I think the current code is more idiomatic Python too and it also avoids an import, so I prefer it as current it is. WDYT?

swaroopch commented 4 years ago

Thank you @thiagokokada !

swaroopch / edn_format

Add indent parameter to edn_format.dumps() #70