jasongilman / proto-repl

A Clojure Development Environment package for the Atom editor
https://atom.io/packages/proto-repl
MIT License
563 stars, 50 forks

Atom hangs when fetching a large amount of data from the REPL and pretty printing #69

Open jasongilman opened 8 years ago

jasongilman commented 8 years ago

Atom will sometimes hang when executing a command that returns a large amount of data from the REPL, especially when using inline mode or pretty printing. Autoeval makes the problem worse. We need to limit the amount of data returned from the REPL to avoid this.

jasongilman commented 8 years ago

I think the right way to handle this will be to create middleware for Proto REPL. I knew I would need to do this eventually. It's probably time to do it.

phillc73 commented 8 years ago

How much do you consider a large amount of data?

I was actually experimenting with this last night. I imported a CSV of around 35MB and 154k lines. Importing with Incanter worked but was slow, using clojure-csv was fast, but importing with data.csv made Proto REPL hang (may have just been user error too).

However, all that imported data was stored in a defined variable. The one time I attempted to display all the contents of that variable (154k vectors!), Proto REPL did hang irrecoverably.

jasongilman commented 8 years ago

I'd consider that a lot of data. That's way more than I think you could consider displaying in Proto REPL with the current design. My approach would be to use nREPL middleware and only take some top N amount of the data to return for display. I haven't looked into it too much yet.

You can do the same thing manually. If you have a giant csv assigned to a def then display just part of it in Proto REPL using (take 20 my-var) to limit to 20 rows that are displayed.
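A minimal sketch of that manual workaround. The `my-var` below is a hypothetical stand-in built from `range` rather than a real imported CSV. The second form uses `*print-length*`, a standard Clojure var that is not mentioned in this thread, as another way to cap what gets printed.

```clojure
;; Hypothetical stand-in for a large dataset held in a def:
;; 154,000 rows, each a small vector.
(def my-var (map (fn [i] [i (str "row-" i)]) (range 154000)))

;; Return only the first 20 rows for display.
(def preview (take 20 my-var))

;; Alternative (an assumption, not from the thread): cap the number of
;; printed items for one expression by binding *print-length*.
(def capped (binding [*print-length* 3] (pr-str (range 10))))
;; capped is "(0 1 2 ...)"
```

Evaluating `preview` in the REPL sends back 20 rows instead of 154k, which keeps the inline display small.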

phillc73 commented 8 years ago

I don't know if it's possible within Atom's capabilities, but it would be useful to be able to browse complete datasets of this size.

New to Clojure, but with good experience with R, some of the key tasks I'm undertaking to learn Clojure are loading and manipulating datasets. In RStudio, it is possible to load a dataset of this size and browse the dataframe in a new tab (see screenshot). This can be very useful for conducting spot checks of data and better understanding content/structure.

Limiting display to just N amount of data to return would possibly preclude or limit this possibility.

Not a deal breaker, just adding that it would be a nice to have. I guess there has to be a limit somewhere though and I'm not experienced enough to understand if this even should be possible via a REPL with Clojure.

[Screenshot: pinhookerrstudio]

jasongilman commented 8 years ago

It's definitely possible both for Clojure and Atom to handle this amount of data. It just has to be managed in the right way so that it's fast and doesn't use up too much memory. clojure.data.csv has lazy functions for returning data. 154K lines or 35MB of data isn't too much to load in one Clojure JVM. Atom itself isn't great at handling very large files. They're still working to improve things like that. But Atom backed by Clojure should be able to handle that without a problem as long as it's pulling back reasonably sized chunks to display. That's where nREPL middleware will hopefully come in handy. If I just asked for the entire set of data and then tried to create 154K nested inline items, each of which becomes 12 divs or more, the GUI would become sluggish.
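As a rough illustration of pulling back reasonably sized chunks, here is a sketch in plain Clojure. The `page` helper and the `rows` stand-in are hypothetical names made up for this example; this is not the nREPL middleware, only the kind of slicing it might do before returning data for display.

```clojure
;; Hypothetical helper: return page `n` (zero-based) of `page-size` items.
;; Works on lazy sequences, so the full dataset is never sent for display.
(defn page
  [coll page-size n]
  (->> coll
       (drop (* n page-size))
       (take page-size)))

;; Stand-in for 154K parsed CSV rows.
(def rows (range 154000))

(page rows 50 0) ; rows 0 through 49
(page rows 50 2) ; rows 100 through 149
```

Each call returns at most `page-size` items, so the editor only ever has to render one small chunk at a time.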

I know I can do it because I was recently doing a very similar thing with CSV and parsing over 4GB of data and displaying things with Proto REPL. I wasn't trying to keep all of that in memory but even that size should be possible if I increased the JVM heap size.

jasongilman commented 8 years ago

This is the next big thing that I'm tackling. My hunch was that it was ClojureScript EDN parsing of the results that was slow, but this turned out to be completely incorrect. Below is an image that shows the performance analysis of displaying (range 10000) in Proto REPL. Automatic pretty printing and inline display were turned on. I can definitely improve this a lot. The first thing to tackle is the inline display. By writing custom code in ClojureScript to reproduce what Atom ink is doing, I can probably cut the total time in half. The next thing after that would probably be to look at using pretty printing within Clojure instead of ClojureScript and Fipp.

[Image: proto_repl_display_10k_analysis]

jasongilman commented 8 years ago

I created a pull request in Atom ink to fix the tree view issue.

Performance before change for creating a tree view with N children:

Performance after the change is:

I'll manually incorporate this change into Proto REPL until the Atom change is merged.

jasongilman commented 8 years ago

I released 1.1.6 with this part of the performance improvements.

mauricioszabo commented 8 years ago

I found something: when you pretty-print a lot of data with proto-repl, clear the repl, and again pretty-print a lot, it'll sometimes hang.

I think one possibility is that we're adding all these changes to the undo stack. Maybe we could get memory and performance improvements if we call clearUndoStack on the REPL's TextEditor.