knowm / XChart

XChart is a light-weight Java library for plotting data.
http://knowm.org/open-source/xchart
Apache License 2.0

ggplot2 style API #69

Closed mleoking closed 9 years ago

mleoking commented 10 years ago

ggplot2 (http://ggplot2.org/) in R implements the graphics API described in the book The Grammar of Graphics. It can therefore produce complex plots in 2-3 lines of code.

[image: screenshot_1]

Have a look at this video: https://www.youtube.com/watch?v=HeqHMM4ziXA

You will then see why its API style is now so popular, and why people want that style so much that they have ported the API to Python: http://blog.yhathq.com/posts/ggplot-for-python.html.

ggplot2 is still using the basic plot functions in R, but just provides a very nice & concise API to describe figures. And it is the API style that makes it so popular.

This makes me think about why we do not like JFreeChart. JFreeChart is not bad, and its 2D plotting ability is already relatively comprehensive. I now realize the main reason is not really its size (2 MB is not that big). It is its complex API: you need to write so much code to plot a simple figure.

That makes me wonder whether it is possible to write a ggplot2-style API in Java that unifies JFreeChart (2D) and Jzy3d (3D). The API might be so nice and concise that people just use it instead of using JFreeChart or Jzy3d directly.

I am not sure whether a ggplot2-style API is really possible in Java. But I feel that if we aim only for simplicity and forget all those rigid rules and styles in Java, a ggplot2-like API might be possible.

What do you think?

timmolter commented 10 years ago

One of the reasons I originally created XChart was because I was frustrated with the complexity of JFreeChart, both in using it and in digging through the source code. As a result, XChart is already much simpler to use. I'll have to look at ggplot2 to see exactly how the syntax looks.

I'm planning to start on V3.0.0 soon, with an entirely new API, in preparation for new chart types, and I will look into a ggplot2-type API as well. Feel free to participate!

mleoking commented 10 years ago

Very interesting to know you are working on a new version.

I feel XChart now needs only two additional chart types to be a reasonably complete chart lib:

1) Density and contour plots. A widely used 2D replacement for 3D plots. I feel density and contour plots are clearer than 3D plots.

2) Inset plots: scale and plot one or several sub-figures inside a main figure (see the figure below). This is different from multiple plots, where the figures do not overlap each other. Multiple plots can also be done easily in LaTeX or Inkscape after plotting each figure separately.

An inset plot has to be done within a charting lib because you have to guarantee that only the figure size is rescaled, not the text inside a sub-figure. The inset sub-figure is smaller, but its text size should still be the same as in the main figure.

[image: multiplot2]

I came across JavaFX charts: http://docs.oracle.com/javafx/2/charts/chart-overview.htm (here are two videos: https://www.youtube.com/watch?v=5uzp-Rr7i58 https://www.youtube.com/watch?v=yaZBcu91b34)

I do not know it at all, and am not sure whether you are interested. It is built into Java, so you do not need external dependencies.

For my own project, I have written a set of functions that replace the Java reflection API, making warnings/exceptions optional, among other things. They can bring some dynamic behavior into a traditional Java program. The pull request I sent includes some of these functions. If you are interested in more of them, I can send you my whole org.leores.util.* package.

Inside the org.leores.util.* package there is also an advanced CSV reader: DelimitedReader. I created it to read the CSV files created by my simulations, each of which is like a small database. You can set start/end tags, the names of the columns to read, the valid data format using a Java regular expression, and more.

I am not sure whether you need Java reflection to enable a nice API. But if you do, you do not need to worry about its performance. Since Java 7, reflection performance is not a big problem. In my simulation framework, which uses Java reflection heavily for loading settings, the cost of reflection is not noticeable, and it gives you lots of flexibility.

If you customize the load function I added in the pull request, I feel it is even possible to turn XChart into a standalone plot program with its own scripting language, like Gnuplot. The basic functions are already there: get variable values $var$, invoke functions $function(p1,p2)$, and calculate expressions #Sin($var$)+$var2$# . It would require a lot of effort, nevertheless.
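To make the `$var$` idea concrete, here is a minimal sketch of variable substitution only. It is not the load function from the pull request: the class name `VarTemplate` and the method `expand` are hypothetical, and function calls such as `$function(p1,p2)$` and `#...#` expressions are deliberately left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: replaces $name$ placeholders with values from a map. */
final class VarTemplate {
    // Matches $name$ where name consists of letters, digits, or underscores.
    private static final Pattern VAR = Pattern.compile("\\$(\\w+)\\$");

    static String expand(String template, Map<String, ?> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            if (!vars.containsKey(name)) {
                throw new IllegalArgumentException("unknown variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in values from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(vars.get(name))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Only the two variables are substituted; the surrounding #...# is not evaluated.
        System.out.println(expand("#Sin($var$)+$var2$#", Map.of("var", "1.5", "var2", "2")));
    }
}
```

Evaluating the expression and dispatching function calls would need a real parser on top of this, which is where most of the effort mentioned above would go.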

timmolter commented 10 years ago

@mleoking I'll reply to this tomorrow. Thanks for your patience. :)

timmolter commented 10 years ago

@mleoking Thanks for the detailed references to ggplot2. I watched the video and found the API to be really good. I like the simplicity of the API, especially the way you can have additional arrays with strings in them and it automatically colors or shapes them differently in the plot. It can even split the data into different plots keyed on those strings.

In XChart 3.0, I need to first and foremost get rid of the single Chart class and break it down into BarChart, Histogram, PieChart, etc. Sticking with how Java normally works, I think a really quick and intuitive way to build charts would be with a builder pattern. I already have that in place but it's not the prevalent way. There are 3 ways to build charts, and I think I'll reduce it to only the builder pattern. Building a chart would look like this then:

Chart chart = XChartBuilder.buildLineChart().setXData(blah).setYData(blah);

You could keep stringing on more and more setters, which is kind of like the way it is done in R. What's better though with Java is that there is auto completion, which guides a person in self discovering what things can be set. In R, you don't have that. It also prevents runtime errors.

So with one line of code, you could have a plot. Then you could dig into customizations using a similar mechanism like:

chart.style.showLegend(false).setBackgroundColor(blah)...
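A minimal, self-contained sketch of that fluent builder idea follows. It is not XChart's actual API: `LineChartSketch`, its `Builder`, and the method names are hypothetical stand-ins that mirror the setters proposed above, and nothing is rendered.

```java
import java.util.Arrays;

/** Hypothetical sketch of a fluent chart builder; not XChart's real API. */
final class LineChartSketch {
    private final double[] xData;
    private final double[] yData;
    private final boolean legendVisible;

    private LineChartSketch(Builder b) {
        this.xData = b.xData;
        this.yData = b.yData;
        this.legendVisible = b.legendVisible;
    }

    double[] xData() { return xData.clone(); }
    double[] yData() { return yData.clone(); }
    boolean legendVisible() { return legendVisible; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private double[] xData = new double[0];
        private double[] yData = new double[0];
        private boolean legendVisible = true;

        // Each setter returns the builder so calls can be chained.
        Builder setXData(double... x) { this.xData = x.clone(); return this; }
        Builder setYData(double... y) { this.yData = y.clone(); return this; }
        Builder showLegend(boolean visible) { this.legendVisible = visible; return this; }

        LineChartSketch build() {
            // Validate once, at build time, instead of in every setter.
            if (xData.length != yData.length) {
                throw new IllegalArgumentException("x and y must have the same length");
            }
            return new LineChartSketch(this);
        }
    }

    public static void main(String[] args) {
        LineChartSketch chart = LineChartSketch.builder()
                .setXData(1, 2, 3)
                .setYData(2, 4, 8)
                .showLegend(false)
                .build();
        System.out.println(Arrays.toString(chart.yData()) + " legend=" + chart.legendVisible());
    }
}
```

Because every setter is an ordinary method, IDE auto completion lists the available options after each dot, which is the discoverability advantage described above.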

Borrowing from ggplot2, I could add 'factors'. For example:

Chart chart = XChartBuilder.buildLineChart().setXData(blah).setYData(blah).setFactor(blah);

Then the XChart code could, in the background, create the separate series based on the factors provided, etc. Like ggplot2, we could provide a builder method like:

Chart chart = XChartBuilder.buildLineChart().setXData(blah).setYData(blah).setFactor(blah).setFacet(blah);
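What a `setFactor` call might do in the background can be sketched as a plain grouping step: rows that share a factor value end up in the same series. The class `FactorSplit` and the method `splitByFactor` are hypothetical names used for illustration; this is not XChart code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: splits parallel x/y arrays into one series per factor value. */
final class FactorSplit {

    /** Returns factor value -> list of {x, y} points, in first-seen factor order. */
    static Map<String, List<double[]>> splitByFactor(double[] x, double[] y, String[] factor) {
        if (x.length != y.length || x.length != factor.length) {
            throw new IllegalArgumentException("x, y and factor must have the same length");
        }
        Map<String, List<double[]>> series = new LinkedHashMap<>();
        for (int i = 0; i < x.length; i++) {
            series.computeIfAbsent(factor[i], k -> new ArrayList<>())
                  .add(new double[] {x[i], y[i]});
        }
        return series;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {10, 20, 30, 40};
        String[] group = {"control", "treated", "control", "treated"};
        splitByFactor(x, y, group).forEach(
                (name, points) -> System.out.println(name + ": " + points.size() + " points"));
    }
}
```

A facet would use the same grouping, but each group would become its own sub-plot rather than a series within one plot.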

I'll follow up next with more comments, particularly on CSV files.

timmolter commented 10 years ago

@mleoking Now regarding CSV data. I don't come from the background of gnuplot where (I think) people use it to visualize data from CSV files. Most of my plotting needs come from Java programs I write myself and I manipulate the data to be stored in Lists. Then I want to quickly get a plot from the data. I have recently however added a CSV importer and exporter as the need has come up.

Perhaps a useful feature would be a way to create charts directly from CSV files, rather than using a CSVImporter. The CSV tools you've mentioned could perhaps be useful here. Given a CSV file, and the builder API I mentioned previously, it would then be really easy to quickly view/explore the data in a number of ways.

Even more recently, I needed to take a collection of POJOS/Beans/DTOs and make an array of plots of histograms for each class member/field. I used reflection for this and I was very pleased with the results. It would be neat to be able to give XChart a Collection of Objects, and have it make different plots easily. How were you thinking about using reflection exactly?
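The reflection step described here can be sketched roughly as follows: walk the declared fields of each object and collect every numeric field into a named column, which could then feed one histogram per field. `FieldColumns`, `numericColumns`, and the `Trade` example class are hypothetical; this is not the code used in XChart.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turns numeric fields of a collection of beans into columns. */
final class FieldColumns {

    static Map<String, List<Double>> numericColumns(Collection<?> beans) {
        Map<String, List<Double>> columns = new LinkedHashMap<>();
        for (Object bean : beans) {
            // Only fields declared on the bean's own class are inspected, not inherited ones.
            for (Field f : bean.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                f.setAccessible(true); // allow reading private fields
                try {
                    Object value = f.get(bean);
                    if (value instanceof Number) {
                        columns.computeIfAbsent(f.getName(), k -> new ArrayList<>())
                               .add(((Number) value).doubleValue());
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read field " + f.getName(), e);
                }
            }
        }
        return columns;
    }

    /** Example bean, made up for this sketch. */
    static final class Trade {
        private final String symbol;
        private final double price;
        private final int size;

        Trade(String symbol, double price, int size) {
            this.symbol = symbol;
            this.price = price;
            this.size = size;
        }
    }

    public static void main(String[] args) {
        List<Trade> trades = Arrays.asList(new Trade("A", 10.5, 100), new Trade("B", 11.0, 200));
        // Non-numeric fields such as symbol are skipped.
        System.out.println(numericColumns(trades).keySet());
    }
}
```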

What remains to discuss is a stand alone plot program and making a script-like ggplot2 interface. For me personally, the other things I mentioned would have to be in place first as priority. The ggplot2 scripting would be possible by just creating a class that parses a ggplot2 file and builds an XChart from it. The standalone program would be a nice to have feature. It could perhaps plot different functions and let you import CSV files. There would be work involved in creating builds for deploying the app to Windows, OSX and Linux.

I'm looking forward to your response!

mleoking commented 10 years ago

@timmolter The new API looks good.

ggplot2 assumes its data is a data frame: data rows with column titles. In the pull request, I added a class DataTable, which I used in my JGnuplot. It is just a data frame.

My CSV reader DelimitedReader is also in the pull request now. There is a DelimitedReaderDemo class showing how to use it. There are not many comments in my code, but I feel you can understand it. You can store the data for many figures in one CSV file, and you can then use DelimitedReader to explore the CSV file like a database and plot the figures one by one. This is what I do for my simulation results. I can plot loads of figures from one CSV file. You can actually do more than the demo shows. E.g. one CSV file can contain several data tables, and there could be columns (for example, "id") linking the rows in these tables. DelimitedReader can help you handle it.

Another interesting class included in the pull request is Logger. I use it as the parent class of all my classes. Logger lets you log everything in detail. Objects are automatically analyzed through reflection to show their fields and values in the log. It is very useful when you are debugging, or are just too lazy to write a toString function to output a particular field value. You can read the code to see what it does.

In addition, for each instance of a class that inherits from Logger, you can:

1) Set its output/record log level.

2) Set a Processable2<String, Integer, String> logProcessor to process all its logs. You can also decide whether to output the log after processing.

In addition, you can do the same thing for the singleton tLogger in Logger. But this time the operation will affect all logs of all instances.
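A rough sketch of the general idea, not the Logger class from the pull request: a base class that describes an object's fields through reflection, filters by a per-instance level, and passes each line through a pluggable processor. `ReflectiveLogger`, `describe`, and the `Run` example are hypothetical, and `java.util.function.BiFunction` stands in for the `Processable2` interface, whose exact contract is not shown in this thread.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;
import java.util.function.BiFunction;

/** Hypothetical sketch of a reflection-based logging base class. */
class ReflectiveLogger {
    private int minLevel = 0;
    // Stand-in for a log processor: (level, message) -> line to output, or null to suppress.
    private BiFunction<Integer, String, String> processor = (level, msg) -> msg;

    void setMinLevel(int minLevel) { this.minLevel = minLevel; }
    void setProcessor(BiFunction<Integer, String, String> processor) { this.processor = processor; }

    /** Logs the object's fields; returns the printed line, or null if nothing was printed. */
    String log(int level, Object target) {
        if (level < minLevel) {
            return null;
        }
        String line = processor.apply(level, describe(target));
        if (line != null) {
            System.out.println(line);
        }
        return line;
    }

    /** Lists the fields declared on the object's own class. Intended for application classes. */
    static String describe(Object target) {
        Class<?> type = target.getClass();
        StringJoiner out = new StringJoiner(", ", type.getSimpleName() + "{", "}");
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            f.setAccessible(true);
            try {
                out.add(f.getName() + "=" + f.get(target));
            } catch (IllegalAccessException e) {
                out.add(f.getName() + "=<inaccessible>");
            }
        }
        return out.toString();
    }

    /** Example subclass, made up for this sketch. */
    static final class Run extends ReflectiveLogger {
        private final int seed = 7;
        private final double rate = 0.5;
    }

    public static void main(String[] args) {
        Run run = new Run();
        run.setProcessor((level, msg) -> "[" + level + "] " + msg);
        run.log(1, run); // prints the fields of run without a hand-written toString
    }
}
```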

As I said, you do not need to merge the pull request, you can just take what you want from it, if there is any. :+1:

mleoking commented 10 years ago

These classes and JGnuplot are all utility classes I created for another project of mine.

It is a parallel scientific task/simulation framework.

It is designed to help you automatically calculate, in parallel, the results of a model/simulation/task over a range of values for a number of parameters. It helps you explore the parameter value space in parallel and then aggregates the results.

It has many interesting features. E.g. it supports checkpointing, which lets you continue running a task after it is accidentally stopped (e.g. by a power failure on the remote server running your code).

It also enables you to remove bookkeeping code (e.g. data records stored in Lists) from the core logic. You can use an XML file to describe the bookkeeping requirements, and the framework will handle them automatically and in parallel through reflection. This is relevant to plotting because each bookkeeping entry can be automatically translated into a graph in my framework:

E.g. you can configure (in XML) the framework to record the average z for each (x,y) pair over a set of simulations in which x and y are two parameters and z is the output. Then, after those simulations finish, the results will be both saved and plotted automatically.

Using this mechanism, plotting and simulations/calculations are integrated. Each graph = one line of bookkeeping configuration in an XML file. You can configure many bookkeeping entries (= graphs) in the XML file and do not need to write code for bookkeeping and plotting. Your Java code contains only the core logic of your model/simulation etc.
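The aggregation in that example (average z per (x,y) pair) can be sketched in plain Java as below. This is only an illustration of the bookkeeping step, not the framework's implementation; `GridAverage` and `averageZ` are hypothetical names, and the XML configuration and plotting are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: averages the output z over all runs sharing the same (x, y). */
final class GridAverage {

    /** Each run is {x, y, z}. Returns (x, y) -> mean z, in first-seen order. */
    static Map<List<Double>, Double> averageZ(double[][] runs) {
        // Accumulator per (x, y) key: {sum of z, number of runs}.
        // Keys compare x and y exactly, which suits parameters taken from a fixed grid.
        Map<List<Double>, double[]> acc = new LinkedHashMap<>();
        for (double[] run : runs) {
            double[] a = acc.computeIfAbsent(Arrays.asList(run[0], run[1]), k -> new double[2]);
            a[0] += run[2];
            a[1] += 1;
        }
        Map<List<Double>, Double> mean = new LinkedHashMap<>();
        acc.forEach((key, a) -> mean.put(key, a[0] / a[1]));
        return mean;
    }

    public static void main(String[] args) {
        double[][] runs = {
            {1, 2, 10}, {1, 2, 20}, // two runs at (1, 2)
            {3, 4, 5}               // one run at (3, 4)
        };
        averageZ(runs).forEach((xy, z) -> System.out.println(xy + " -> " + z));
    }
}
```

The resulting (x, y, mean z) triples are the kind of data a density or contour plot would take as input.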

I have seriously compared XChart and Gnuplot to decide which one will be the plot engine for the book keeping results of my framework. You might still remember the pic I sent you:

[image: https://f.cloud.github.com/assets/5763751/2474140/6a8893dc-b048-11e3-8f9f-344c60438325.png]

I tried to use XChart to do the job, and it did well and generally responded faster than Gnuplot.

But I finally decided to choose Gnuplot because 1) it has contour/density plots and good PDF output, and 2) it has its own script language and can mix data and scripts in one file.

Gnuplot's own script language enables me to save the plot together with data in one text file. When you double-click the file, it will be rendered as a figure. When you open the file using a text editor, you can edit the scripts (plot style, axis labels etc.) and data. This feature attracted me so much that I wrote JavaGnuplotHybrid for it. :)

Even though I like Gnuplot's scripting, I also agree that its own script language is not a priority for XChart, which emphasizes being lightweight.

I just feel sad sometimes that, as a Java developer, I have to depend on plot libs from other languages. I really wish XChart can grow and eventually replace Gnuplot for Java. :+1:

timmolter commented 10 years ago

Thanks for your code and thoughtful discussion. When I have time, I'm going to start on the V3.0 API, implement more chart types, and make it easy to create plots from CSV files and arrays/Lists of POJOs. I'll be creating issues for each step, so if you want to follow along and comment just keep an eye on the issues.

timmolter commented 9 years ago

The project: Joinery looks like something that could be used as a front end ggplot-like data manager for XChart.

lwhite1 commented 8 years ago

Joinery looks very nice (hadn't seen it before) but can you get a non GPL license for it?

lwhite1 commented 8 years ago

FWIW, (shameless plug) you may want to take a look at Tablesaw: github.com/lwhite1/tablesaw. It's not as polished as joinery but has an Apache license and is under very active development.

timmolter commented 8 years ago

Tablesaw looks like a clear winner!

lwhite1 commented 8 years ago

Thanks Tim,

I've added plotting to Tablesaw so you can create plots directly from Tables and Columns without extracting double[] or whatever. Basically, I'm using XChart for the most important chart types (scatter, line, and their derivatives) and filling in around it with other libraries where plots are currently missing (bar charts from JavaFX because they support horizontal bars, histograms from the Smile plot library). I will happily replace those as XChart adds new types, since I can't make the different libraries look and work similarly enough - plus it's a bunch of work since the APIs are all different :)

Thanks for all your work with XChart. I really enjoy using it.

timmolter commented 8 years ago

Very cool! XChart has histograms. See BarChart06. If that doesn't work for you, let me know why, and I can perhaps change things. Maybe I'll add the horizontal bar charts too.

lwhite1 commented 8 years ago

Hi Tim, Thanks for the encouragement.

Regarding the histogram, really looking for something with a numeric x axis. I opened a feature request (#157) on that a couple weeks ago. Will definitely look to Xchart first as we add new chart types and upgrade as Xchart enhancements become available.

tinylamb commented 6 years ago

Are horizontal bar charts supported now?