nrepl / nrepl

A Clojure network REPL that provides a server and client, along with some common APIs of use to IDEs and other tools that may need to evaluate Clojure code in remote environments.
https://nrepl.org

Make it possible to limit the size of returned values and output #75

Closed bbatsov closed 5 years ago

bbatsov commented 5 years ago

Some clients (e.g. Emacs) don't handle huge strings in their REPL buffers very well. It's also questionable whether displaying them makes for good UX in the first place.

Ideally we should be able to specify some value/output limit in eval ops, so that the result can be trimmed accordingly. We can either return the full and trimmed results in the same message, or just have clients eval the same code again without any size limits if they happen to need the big result.
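To make that concrete, a request/response pair could look roughly like this (the limit-related keys below are made up purely for illustration; nothing about the message shape is decided):

;; Hypothetical eval request asking the server to cap the printed value
;; (:value-limit is an invented key, not part of the nREPL protocol)
{:op "eval"
 :code "(range 100000)"
 :value-limit 4096}

;; Hypothetical trimmed response; the flag tells the client it can
;; re-eval without the limit if it really needs the full value
{:value "(0 1 2 3 4 5 ..."
 :value-truncated "true"}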

See https://github.com/clojure-emacs/cider/issues/1115 and https://github.com/clojure-emacs/cider/issues/2507 for more details. @gonewest818 started working on something very similar a while back (https://github.com/clojure-emacs/cider/issues/1115#issuecomment-374487527), so someone might pick up his work if they want to, even though a simpler approach with no caching of the results/output will be fine as well.

behrica commented 5 years ago

I am also suffering from periodic hangs of CIDER/Emacs due to accidentally printing large data structures.

I was experimenting with some changes to CIDER myself, and just wanted to comment on this proposed solution.

Even though I like the idea of trimming the results at the nREPL level, I am not sure this is enough to prevent the slowness of Emacs/CIDER when evaluating/printing all kinds of large data structures.

I noticed that when printing a large sequence of maps in Clojure, lots of individual nREPL values get returned (one per element of the sequence).

So even if we limit the size of each of them, the buffer might fill up too quickly. Setting *print-length* to a small value might help, but it could still be a problem for some edge cases (like printing a vector of 50 million integers with *print-length* set to false, or the opposite, printing one single huge string).
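To illustrate the difference: *print-length* caps how many elements of a collection get printed, but it does nothing for one big string. Both snippets below are plain Clojure behaviour:

(binding [*print-length* 5]
  (pr-str (range 1000000)))
;; => "(0 1 2 3 4 ...)"

(binding [*print-length* 5]
  (count (pr-str (apply str (repeat 100000 "a")))))
;; => 100002 (the entire string, plus two quote characters)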

I therefore worked on another, rather easy solution, which is to automatically limit the length of the CIDER buffer (in bytes, not lines). The idea is just to check the current buffer length before inserting the new response into it, and above a certain threshold simply erase the buffer with (erase-buffer) before inserting.

Maybe it can be interesting as an additional, configurable, solution to this problem.

In my tests this works reliably, in the sense that Emacs does not crash while printing a very large vector.

It is still unresponsive while it prints, but the buffer size stays within limits (as seen in the mode line). C-c C-c does not work, unfortunately, but after it has finished printing it is responsive again.

So it helps rather well against the classical situation of "large vectors of relatively small maps".

(I implemented trimming in the CIDER client as well, but it probably makes more sense to have it in the nREPL protocol.)

The current patch just does this on each "insert-before-markers":

(defun cider--checked-insert-before-markers (string)
  "Insert STRING before markers, erasing the buffer first if it grew too large."
  (message "Current cider repl buffer size is: %s" (buffer-size))
  (message "Length of string to insert: %s" (length string))
  (when (> (buffer-size) 100000)
    (message "Buffer grew too large -> erase")
    (let ((inhibit-read-only t))
      (erase-buffer)))
  (insert-before-markers string))

I was surprised to see that this works well even on huge strings.

(apply str (repeat 100000000 "a"))

as apparently we get two "insert" events,

String length: 100000000
Current cider repl buffer size is: 68
Length of string to insert: 100000000
String length: 1
Current cider repl buffer size is: 100000068
Length of string to insert: 1
Buffer grew too large -> erase

very quickly one after the other, so the erase-buffer gets executed before Emacs tries to render the buffer (even though (insert-before-markers string) had already been executed with the huge string).

In this situation Emacs gets "slow" for a while, but it recovers completely and the "huge buffer" is never seen.

Should we maybe discuss this in another issue?

bbatsov commented 5 years ago

@behrica As that's strictly CIDER-specific, it'd be best to convert this into a CIDER issue. You're making some valid points, but they're somewhat orthogonal to what we're trying to solve here.

behrica commented 5 years ago

Agree, we can discuss it here: https://github.com/clojure-emacs/cider/issues/2527

behrica commented 5 years ago

I have a more general comment, which belongs here:

I think a good truncation strategy is very difficult to come up with.

Because there are at least two very orthogonal kinds of problematic data structures:

- one single huge value (e.g. an enormous string)
- a huge collection of many small values (e.g. a vector of 50 million integers)

(and everything in between)

Every "truncate" function need to pick some cases of structures (seqs, maps, strings, keywords) it wants to check for "too long" and truncate them. it is always a kind of "white list" and at a certain point something unexpected will come, it will let it through and emacs will choke on it. This is true for a default truncater, but as well for the user supplied truncation function.

And the question of how large is "too large" depends on many factors as well. This makes the concept of "truncation" hard to get right.

Maybe instead of "truncation" nREPL should fail if the "result" is longer than a configured number of bytes.

I see the main use case for this feature as preventing user mistakes, where one "by accident" evaluates a very complex data structure. So the question of truncating into valid structures is not important to me.

In my view, the "truncation" should only ever happen because I made a mistake. From this perspective, an "error" on large responses might be as good as truncation (and simpler).

I would like to see a feature which lets me decide "how many bytes at max" I get when I press return in the REPL, with that guarantee holding whatever data structure I get. So I could set it to 10000, and I would be happy if evaluation simply fails when my response is too long. (Because that probably means I have a wrong assumption about the thing I'm evaluating. In reality I never need responses "larger than my screen".)
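A rough sketch of how such a guarantee could look on the server side (the wrapper, the limit, and the exception are all my own invention, not an existing nREPL option): wrap the writer that receives the printed result and fail once the budget is spent.

(defn bounded-writer
  "Wrap writer w so that emitting more than limit chars throws.
  Sketch only; the limit would come from some hypothetical config."
  [^java.io.Writer w limit]
  (let [written (atom 0)
        check!  (fn [n]
                  (when (> (swap! written + n) limit)
                    (throw (ex-info "Result exceeded size limit"
                                    {:limit limit}))))]
    (proxy [java.io.Writer] []
      (write
        ([x]
         (cond
           (integer? x) (do (check! 1) (.write w (int x)))
           (string? x)  (do (check! (count x)) (.write w ^String x))
           :else        (do (check! (alength ^chars x)) (.write w ^chars x))))
        ([cbuf off len]
         (check! len)
         (.write w cbuf (int off) (int len))))
      (flush [] (.flush w))
      (close [] (.close w)))))

;; Printing an infinite seq now fails fast instead of hanging the client:
(binding [*out* (bounded-writer *out* 10000)]
  (prn (range)))
;; => throws ExceptionInfo: Result exceeded size limit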

This is similar to an idea the "Tidyverse" R packages implement for printing data frames: the result of "printing" a data frame is never larger than a screen.

What could maybe be added to this is that responses always get serialized to disk in full (on truncation or failure). This would solve the main inconvenience of simply "failing"/"truncating", which is losing a slightly-too-long result from a long-running computation.

Some type of feedback on the "bytes received and dropped" would be useful as well.

bbatsov commented 5 years ago

I want to briefly follow up on this - after some experimentation (thanks a lot @pfeodrippe!) we've decided that this approach was fundamentally flawed. We've now opted to work in a different direction - bringing Unrepl's handling of (big) collections (elided collections) to nREPL. That's an extremely flexible and elegant approach to the problem, and @cgrand has graciously agreed to port it to nREPL.
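For readers unfamiliar with the concept: instead of truncating the printed string, the server sends only a prefix of a big collection plus an elision marker that the client can later ask to expand. A toy version of just the eliding part (the real design also carries a handle for fetching the rest, which this omits):

(defn elide
  "Return at most n elements of coll, appending an elision marker
  when something was cut off (toy version; no way to fetch the rest)."
  [coll n]
  (if (> (bounded-count (inc n) coll) n)
    (concat (take n coll) [(symbol "...")])
    coll))

(elide (range 1000000) 5)
;; => (0 1 2 3 4 ...)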

bbatsov commented 5 years ago

Closing in favour of #92.

pfeodrippe commented 5 years ago

Awesome! Anxious to see it \o/