Eclipse support...

hallvard commented 6 years ago

I'd like to discuss what kind of Eclipse support would be interesting. I'm thinking both of making Tablesaw easier to use in Eclipse plugins (for developers) and of viewing, analysing and editing data (for users such as data analysts), e.g.:

package Tablesaw as a plugin with a proper OSGi manifest
integrate tableview into an Eclipse view, e.g. with drag-and-drop of CSV files
open CSV files in a tableview-based editor
an Xtext/Xbase-based DSL for writing Tablesaw scripts

My personal interest is using Tablesaw in a kind of workbench for learning analytics for my Java course at NTNU (google "wiki tdt4100"), but I would guess most of what I'll be doing could be of general interest, if made generic enough. I know how to do all of the above, but would like to discuss what will provide the most benefit.

lwhite1 commented 6 years ago

This sounds cool. I would be happy to discuss, but I'm an IDEA user so I won't get all the details. @benmccann uses Eclipse, though and might have useful input.

In any case, let me do some homework on the tools you mention so I can understand them better, and then we can chat.

benmccann commented 6 years ago

@hallvard another project you may be interested in is beakerx, which has good tablesaw support

lwhite1 commented 6 years ago

@hallvard @benmccann makes a good point about beakerx, given the ubiquity of Jupyter. When I first looked at the project, it was an alternative to Jupyter rather than something built on it. Now that it's Jupyter-compatible, it's a lot more sustainable.

That said, there is a recent thread on Hacker News about the pros and cons of Jupyter versus the Mathematica notebook, mainly around the limitations of working in a browser: https://news.ycombinator.com/item?id=16840692

I have always been underwhelmed by what I've seen in browser-based notebooks. As someone wrote in the HN thread, IPython was developed as a better REPL, which is great, but not a very high bar. For what it's worth, I do my analysis work in IntelliJ IDEA rather than in a notebook. If I wanted to share the results widely, I might copy the finished analysis to Jupyter/beakerx.

A good environment would need three critical elements: a table editor, a workspace with good support for code completion and syntax highlighting, and a mechanism for displaying plots.

With regard to plots: Tablesaw is almost certainly going to move soon from the current native Java plots to JavaScript-based plots, probably built on plot.ly. So the visualization component would need to be the desktop browser or an embedded browser.

With regard to the workspace: I've always been envious of the expressiveness of array languages like APL, but I would not create a new scripting language, although that sounds like fun. Instead, I would wrap Tablesaw in Kotlin. I'm told the Kotlin support in Eclipse is not great, so that could be an issue, but Kotlin is easy for Java devs to learn, has good support for functional programming, and supports operator overloading, so we could do columnwise addition as c1 + c2, etc. So I think a Kotlin wrapper could close the gap with languages like R or Julia that are specifically designed for analysis.
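For readers who haven't used Kotlin operator overloading, here is a minimal sketch of what columnwise c1 + c2 could look like. NumCol is a hypothetical wrapper over a DoubleArray invented for this illustration, not a Tablesaw class; a real wrapper would delegate to Tablesaw's numeric columns instead.

```kotlin
// Hypothetical column wrapper (not Tablesaw's API), used only to show how
// Kotlin's `operator fun plus` turns `c1 + c2` into columnwise addition.
class NumCol(val name: String, val values: DoubleArray) {

    // Element-wise addition; both columns must have the same length.
    operator fun plus(other: NumCol): NumCol {
        require(values.size == other.values.size) { "column sizes differ" }
        return NumCol(
            "$name + ${other.name}",
            DoubleArray(values.size) { i -> values[i] + other.values[i] }
        )
    }
}
```

The operator modifier is what lets Kotlin compile c1 + c2 into c1.plus(c2); without it, only the explicit method call would work.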

If I were building something from scratch, I might consider modeling the workspace as a WYSIWYG Markdown editor like Typora, where the code in the code blocks is executable. This is basically the notebook idea, supporting something like Knuth's vision of literate programming.

The main drawbacks to Kotlin are that it doesn't support array index overriding, so you can't say columnX[4] to get the 4th item, and, more generally, that it inherits the weaknesses of Java's generics. That's the price of excellent Java interoperability, though.

Regarding the table editor, I don't have any real suggestions. There is a JavaFX table implementation in Tablesaw's plot module, and plot.ly does HTML table output. I'm not sure if either helps you move forward.

hallvard commented 6 years ago

My goal is not to make something that will replace R, but to let you do a bit more within Eclipse before exporting data to a generic format like CSV and continuing in R (if needed). Over time, as the data analysis power of Tablesaw improves, more could be done within Eclipse.

The current table editor can easily be embedded in Eclipse, provided that Tablesaw itself is embeddable. There is also a CSV table editor that may be used as a starting point (I haven't checked the license). So the first step would be to make Tablesaw OSGi-friendly, which means solving some dependency and/or packaging issues.

The next question is which operations (filtering, transformations, etc.) such an editor should allow, including spawning new table editors, so you get a flexible environment.

The third issue is making it easier to write Tablesaw code. Eclipse has its own Kotlin-like language called Xtend, which supports operator overloading and functional programming. Some library code would make that pretty comfortable.

Plotting in a browser would work, as Eclipse includes embedded browser support.

thomashaselwanter commented 6 years ago

IDEA user here. I like the way IDEA natively supports viewing a pandas DataFrame in Python as a table during a debug session, whereas a Tablesaw table is currently treated like any other object. It sure would be nice to get the same support for Tablesaw in IDEA. I'm not sure whether the best way would be to lobby JetBrains to do it or to write a plugin. I guess I'd first try opening a ticket on the IDEA issue tracker. From experience, it might take a year or longer, but I think there is a good chance JetBrains could support Tablesaw natively, as they do for pandas.

hallvard commented 6 years ago

Eclipse allows plugins to support a different and more logical view of certain objects in the debugger's Variables view, e.g. there is such support for ArrayList. But this is outside the scope of what I want to do...

thomashaselwanter commented 6 years ago

I'll move my IDEA support wishlist to a different thread as this one is about Eclipse.

cgrinds commented 6 years ago

The main drawbacks to Kotlin are that it doesn't support array index overriding, so you can't say columnX[4], to get the 4th item

Maybe you mean something else, but Kotlin supports indexed access operators. I use that with a wrapper around Tablesaw so I can do stuff like this:

table["name"] = "Foo"    // this does table.column(key).append(value)
table["elapsed"] = 314
val hasBar = table["name"].contains("bar")
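A minimal sketch of how such operators can be declared, assuming a hypothetical MiniTable that stores each column as a list of values (cgrinds' wrapper delegates to Tablesaw instead). Kotlin maps table[key] to get(key) and table[key] = value to set(key, value); following the comment in the snippet above, set appends rather than overwrites.

```kotlin
// Hypothetical stand-in for a table: column name -> list of values.
// In a real wrapper, set() would call table.column(key).append(value).
class MiniTable {
    private val columns = linkedMapOf<String, MutableList<Any?>>()

    // table["name"] returns that column's values.
    operator fun get(key: String): List<Any?> = columns.getValue(key)

    // table["name"] = value appends to the column, creating it on first use.
    operator fun set(key: String, value: Any?) {
        columns.getOrPut(key) { mutableListOf() }.add(value)
    }
}
```

Using set for appending is convenient but unconventional, since readers usually expect x[k] = v to replace a value.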

lwhite1 commented 6 years ago

@cgrinds oh, that's great! I stand corrected.

lwhite1 commented 6 years ago

@cgrinds I'm not sure how much of a Kotlin wrapper you've created, but if you're willing to share the code somehow, I'd be interested in seeing it. We had made a start on a Kotlin wrapper a while back, but removed it for lack of time.

cgrinds commented 6 years ago

At the moment, the wrapper is very ad hoc and specific to my use. I incorporated Tablesaw into my app about a week ago, so I haven't had much time with it. That said, it might be useful to discuss the areas where I've felt the need to add extension methods, since they may point to:

a) documentation misses
b) API misses
c) or, more likely, my ignorance :)

My application parses millions of custom log files to find interesting events, then aggregates them and displays them as ASCII tables. Tablesaw is used for the aggregation and display. I suspect this may be a different use of Tablesaw than most, but maybe not...

I'm using HEAD and the new APIs.

Iteration

It wasn't obvious how to iterate over a table's rows. I settled on the following, which uses Kotlin's forEach and the fact that Table implements IntIterable. This works out of the box, with no changes, since Kotlin rewrites the [r, c] calls into calls to Table's get(int r, int c) method. It feels a bit hacky and maybe off the beaten path, and I'm not sure many people want to iterate rows manually. The get signature also converts everything to a String, which isn't always what you want and forces you to convert back. Maybe the more idiomatic way would be to grab each column and iterate over it individually? This might be a good FAQ candidate.

val table = rTable.sortOn("start").select(
    rTable.stringColumn("name"), rTable.stringColumn("resourceType"))
    .rejectDuplicateRows()

table.forEach {
    val name = table[it, 0]
    val resourceType = table[it, 1]
}
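To illustrate why the snippet above works, here is a hypothetical, simplified RowTable (not Tablesaw's Table). Because it is an Iterable<Int> of row indices, forEach hands each index to the lambda as it, and Kotlin compiles table[it, 0] into a call to operator fun get(row, col).

```kotlin
// Hypothetical simplified table: column-major storage, iteration yields row
// indices 0 until rowCount (mirroring a table that is an Iterable of Int).
class RowTable(private val columns: List<List<Any>>) : Iterable<Int> {
    val rowCount: Int get() = columns.firstOrNull()?.size ?: 0

    override fun iterator(): Iterator<Int> = (0 until rowCount).iterator()

    // table[r, c] resolves to this. Like the get described above, it returns
    // a String, so typed values have to be converted back by the caller.
    operator fun get(row: Int, col: Int): String = columns[col][row].toString()
}
```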

Table creation/appending

I created some helpers to make adding data to tables easier. Instead of this:

val urlCol = StringColumn.create("url")
val elapsedCol = DoubleColumn.create("elapsed")
val methodCol = StringColumn.create("method")
val otable = Table.create("", urlCol, elapsedCol, methodCol)
for (event in events) {
    urlCol.append(event.url)
    elapsedCol.append(event.elapsed.toDouble())
    methodCol.append(event.method)
}

I do this:

val table = makeTable("s:url", "d:elapsed", "s:method")
for (event in events) {
    table.append("url", event.url)
    table.append("elapsed", event.elapsed)
    table.append("method", event.method)
}

I'm not a huge fan of the magic values in makeTable above ('s' for StringColumn and 'd' for DoubleColumn), but it's nice to skip creating the columns individually. In many cases I never need the column objects, because so many of Table's APIs take the column name, but all told this isn't a big difference.
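A sketch of how a makeTable helper like the one above might parse its "type:name" specs. Col, SimpleTable and makeTable are hypothetical stand-ins built on plain Kotlin lists; cgrinds' actual helper creates Tablesaw StringColumn/DoubleColumn objects, and its code isn't shown here.

```kotlin
// Hypothetical typed columns: "s" -> strings, "d" -> doubles.
sealed class Col(val name: String) {
    class Str(name: String) : Col(name) { val values = mutableListOf<String>() }
    class Dbl(name: String) : Col(name) { val values = mutableListOf<Double>() }
}

class SimpleTable(val columns: Map<String, Col>) {
    // Appends to the named column. Numbers are widened to Double so callers can
    // pass Int/Long directly; a non-number for a "d" column throws ClassCastException.
    fun append(name: String, value: Any) {
        when (val col = columns.getValue(name)) {
            is Col.Str -> col.values.add(value.toString())
            is Col.Dbl -> col.values.add((value as Number).toDouble())
        }
    }
}

// Parses specs like "s:url" or "d:elapsed", preserving column order.
fun makeTable(vararg specs: String): SimpleTable {
    val cols = LinkedHashMap<String, Col>()
    for (spec in specs) {
        require(spec.contains(':')) { "expected '<type>:<name>', got '$spec'" }
        val (type, name) = spec.split(":", limit = 2)
        cols[name] = when (type) {
            "s" -> Col.Str(name)
            "d" -> Col.Dbl(name)
            else -> throw IllegalArgumentException("unknown column type '$type' in '$spec'")
        }
    }
    return SimpleTable(cols)
}
```

A sealed class makes the type dispatch in append exhaustive, so adding a new column kind forces every when over Col to handle it.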

Last column

This is a simple Kotlin extension that makes it easier to rename the last column, a common action when summarizing.

Assume you have this:

val countByUrlMethod2 = table.summarize("url", count).by("url", "method")

Out of the box:

countByUrlMethod2.column(countByUrlMethod2.columnCount() - 1).setName("Calls")

With the extension:

countByUrlMethod2.lastColumn().setName("Calls")

A better extension allows regex matching and chaining. Chaining is nice because it lets you skip intermediate assignments:

val countByUrlMethod3 = table.summarize("url", count).by("url", "method")
    .renameCol("count", "Calls")
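A sketch of the two extensions, assuming hypothetical SketchColumn/SketchTable types in place of Tablesaw's. The regex semantics are an assumption: here renameCol renames every column whose name contains a case-insensitive match, and returning the table is what makes chaining possible.

```kotlin
// Hypothetical minimal column/table types standing in for Tablesaw's.
class SketchColumn(initialName: String) {
    var columnName: String = initialName
        private set

    // Fluent setter: returns the column so calls can be chained.
    fun setName(newName: String): SketchColumn {
        columnName = newName
        return this
    }
}

class SketchTable(val columns: List<SketchColumn>)

// lastColumn(): shorthand for column(columnCount() - 1).
fun SketchTable.lastColumn(): SketchColumn = columns.last()

// renameCol(): renames every column whose name contains a case-insensitive
// match for `pattern`, then returns the table so further calls can chain.
fun SketchTable.renameCol(pattern: String, newName: String): SketchTable {
    val regex = Regex(pattern, RegexOption.IGNORE_CASE)
    columns.filter { regex.containsMatchIn(it.columnName) }.forEach { it.setName(newName) }
    return this
}
```

Because containsMatchIn looks for a match anywhere in the name, a short pattern can hit more than one column; anchoring it (e.g. "^count") narrows the match.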

Add total row

I often have tables where I want to sum several columns and append a total row at the bottom of the table.

Assume you have this:

  url    |  method  |  calls  |   elapsed   |
---------------------------------------------
 /a/b/c  |  Delete  |   12.0  |   206443.0  |
 /a/b/c  |     Get  |   10.0  |   159215.0  |
 /a/b/c  |   Patch  |    9.0  |   144313.0  |

I created a Kotlin extension named addTotalRow that turns the above table into the table below.

urlsByCount.addTotalRow("url", "calls", "elapsed") means: append a total row with the word Total in the url column, totalling the calls and elapsed columns. The result is:

  url    |  method  |  calls  |   elapsed   |
---------------------------------------------
 /a/b/c  |  Delete  |   12.0  |   206443.0  |
 /a/b/c  |     Get  |   10.0  |   159215.0  |
 /a/b/c  |   Patch  |    9.0  |   144313.0  |
  Total  |          |   31.0  |   509971.0  |
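One way to implement addTotalRow, sketched over a hypothetical column-major map (column name to list of values) rather than a Tablesaw Table. Applied to the three rows above, it produces 31.0 calls and 509971.0 elapsed, matching the Total row.

```kotlin
// Hypothetical column-major table: column name -> values (all the same length).
typealias Cols = LinkedHashMap<String, MutableList<Any?>>

// Appends one row: "Total" in labelCol, the sum of each column in sumCols,
// and null (rendered as a blank cell) everywhere else.
fun Cols.addTotalRow(labelCol: String, vararg sumCols: String): Cols {
    for ((name, values) in this) {
        val total: Any? = when (name) {
            labelCol -> "Total"
            in sumCols -> values.sumOf { (it as Number).toDouble() }
            else -> null
        }
        values.add(total)
    }
    return this
}
```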

Printing

The other extension I created is toAsciiTable, so I can do things like urlsByCount.toAsciiTable(Align.Left, Align.Left) to get more control over number grouping, turning doubles into longs, and alignment (using your new API):

| Url    | Method | Calls |   Elapsed |
|:-------|:-------|------:|----------:|
| /a/b/c | Delete |    12 |   206,443 |
| /a/b/c | Get    |    10 |   159,215 |
| /a/b/c | Patch  |     9 |   144,313 |
| Total  |        |    31 |   509,971 |
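A sketch of the formatting side of toAsciiTable, assuming a hypothetical signature that takes a header and rows instead of a Table, and a local Align enum. Whole-number doubles are printed as grouped longs, and the separator row uses the :--- / ---: markers seen in the output above. Column widths here are simply the widest cell, so the padding will not exactly match that output.

```kotlin
import java.util.Locale

enum class Align { Left, Right }

// Whole-number doubles become grouped longs ("206,443"); null becomes an
// empty cell. Locale.US pins the grouping separator to a comma.
fun formatCell(v: Any?): String = when {
    v == null -> ""
    v is Double && v % 1.0 == 0.0 -> "%,d".format(Locale.US, v.toLong())
    else -> v.toString()
}

// Renders a pipe table; columns without an explicit Align default to Right.
fun toAsciiTable(header: List<String>, rows: List<List<Any?>>, vararg aligns: Align): String {
    val cells = rows.map { row -> row.map(::formatCell) }
    val widths = header.indices.map { c ->
        maxOf(header[c].length, cells.maxOfOrNull { it[c].length } ?: 0)
    }
    fun alignOf(c: Int): Align = aligns.getOrElse(c) { Align.Right }

    fun line(values: List<String>): String =
        values.indices.joinToString("|", "|", "|") { c ->
            val padded = when (alignOf(c)) {
                Align.Left -> values[c].padEnd(widths[c])
                Align.Right -> values[c].padStart(widths[c])
            }
            " $padded "
        }

    // ":---" marks a left-aligned column, "---:" a right-aligned one.
    val separator = widths.indices.joinToString("|", "|", "|") { c ->
        val dashes = "-".repeat(widths[c] + 1)
        if (alignOf(c) == Align.Left) ":$dashes" else "$dashes:"
    }
    return (listOf(line(header), separator) + cells.map(::line)).joinToString("\n")
}
```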

Again I'm not suggesting you make changes based on this feedback, but thought you might be interested in how your API was being used.

lwhite1 commented 6 years ago

@cgrinds Thank you very much for the detailed feedback. Some thoughts:

It wasn't obvious how to iterate a table's rows. I settled on the following, which uses Kotlin's forEach and the fact that Tables implement IntIterable.

Right now, you're doing it the best way possible. I have a local branch where I'm experimenting with making Table iterate over tech.tablesaw.api.Row instead of int. What's in master is flawed, but I think this may be better than int iteration. Row is a kind of single-row slice, so it doesn't convert anything or otherwise create garbage unless really needed, but it still makes rows a bit more 'real'. This will take a few weeks, I think, as I'm still experimenting. Row does support getting values by type, which, as you note, is currently missing from Table:

    String s = row.getString("colX");
    double d = row.getDouble("colY");

etc.
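A sketch of the single-row-slice idea in Kotlin, using hypothetical ColumnStore/RowView types rather than tech.tablesaw.api.Row. One RowView is reused for every row and advancing only bumps an index, so iteration allocates nothing per row, and the typed getters read straight from the column arrays without converting to String.

```kotlin
// Hypothetical column-oriented storage, keyed by column name.
class ColumnStore(
    val strings: Map<String, Array<String>>,
    val doubles: Map<String, DoubleArray>,
    val rowCount: Int
)

// A reusable cursor: next() moves to the following row and returns the same
// object, so no per-row objects are created.
class RowView(private val store: ColumnStore) : Iterator<RowView> {
    var rowIndex = -1
        private set

    override fun hasNext(): Boolean = rowIndex + 1 < store.rowCount

    override fun next(): RowView {
        if (!hasNext()) throw NoSuchElementException()
        rowIndex++
        return this
    }

    // Typed getters read directly from the underlying column arrays.
    fun getString(column: String): String = store.strings.getValue(column)[rowIndex]
    fun getDouble(column: String): Double = store.doubles.getValue(column)[rowIndex]
}
```

The flip side of reusing one object is that collecting the RowView values into a list gives you the same object repeatedly; read what you need before advancing.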

Another use case for the Row object is allowing comparator-based sorts on a table, which is kinda broken in master.

The others I will think about. Better control over column alignment is probably broadly useful, so maybe it should be an enhancement.

benmccann commented 6 years ago

The other thing I'd really like to see for iteration is making stream() available: https://github.com/jtablesaw/tablesaw/issues/257

hallvard commented 5 years ago

I've done some work on the Eclipse integration, and here's what I have so far:

A table editor for opening CSV files. It supports filtering rows by entering an expression for each column in a filter row at the top, e.g. $ > 100, where $ is the current row's value. $n or the column name can be used to refer to other columns, so you can filter on expressions across columns.
A set of views that can all be linked to editors or to each other. The idea is that editors and views register tables in a "global" registry, which views can use as their source. Table change notifications let views update when their source changes. The editor sends a notification when a filter is applied, so a view using it as its source will also be filtered. Currently, I have a plain table view, a summary view that summarizes its source, and a crosstab view; the latter two register their derived tables, so other views can track them. In addition, there are several plot views (based on an embedded web browser) that plot their source table. Besides a selector for the source, most views have some controls for configuring them, like column and aggregate function selectors. By configuring views appropriately, you can derive tables and plots in a kind of pipeline.
A simple scripting language with an editor, which essentially extends an existing expression language with some syntactic sugar for tables and date/time values, plus operators for common methods. The expression language itself provides syntactic sugar for, and maps directly to, Java and the Tablesaw API, so it is in some sense similar to Kotlin. The editor provides many features of Eclipse's Java editor, like error markers, syntax highlighting and completion, based on the Java mapping. The generated Java code (in the src-gen folder, with one class per script) includes a main method, so it can be run. There's also an interpreter, so you can run the script in the editor. In this case, top-level variables bound to (named) tables are registered, and hence can be used as the source for table and/or plot views. (The interpreter currently uses its own classpath when executing the script, but soon it will use the classpath of the project the script is in, so it can use ordinary Java classes developed the standard way.)
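A minimal sketch of the registry-plus-notifications design described in the second item, assuming tables are plain lists of rows. TableRegistry, publish, subscribe and filteredView are hypothetical names, and the predicate stands in for a compiled filter expression like $ > 100. A derived view republishes its result under its own name, which is what lets views be chained into a pipeline.

```kotlin
// Hypothetical table representation: a list of rows (column name -> value).
typealias Rows = List<Map<String, Any?>>

class TableRegistry {
    private val tables = mutableMapOf<String, Rows>()
    private val listeners = mutableMapOf<String, MutableList<(Rows) -> Unit>>()

    // Registers (or replaces) a table and notifies every view watching it.
    fun publish(name: String, rows: Rows) {
        tables[name] = rows
        // Copy first, so a listener that subscribes during the callback is safe.
        listeners[name]?.toList()?.forEach { it(rows) }
    }

    // Lets a view track a source: called now if the source exists, and again
    // on every later change.
    fun subscribe(name: String, listener: (Rows) -> Unit) {
        listeners.getOrPut(name) { mutableListOf() }.add(listener)
        tables[name]?.let(listener)
    }
}

// A derived view: filters its source and republishes under `target`.
fun TableRegistry.filteredView(source: String, target: String, keep: (Map<String, Any?>) -> Boolean) =
    subscribe(source) { rows -> publish(target, rows.filter(keep)) }
```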

lwhite1 commented 5 years ago

It sounds really cool, but I wish it were in Idea :)

benmccann commented 5 years ago

Haha. I'm still an Eclipse fan, but even so I would have guessed a web app to be the most natural way to build anything UI-related these days, so that you can share with others more easily.

But it sounds cool. I think it'd be really neat to demo it in a video at some point.

jtablesaw / tablesaw

Eclipse support... #274