jimmoores / quandl4j

Java wrapper for Quandl REST API
quandl4j.org
Apache License 2.0
78 stars 22 forks source link

Varying return types (TabularResult, Tablesaw Table, String, etc.) #31

Closed benmccann closed 7 years ago

benmccann commented 7 years ago

@jimmoores I was curious if you've seen tablesaw. It's basically an alternative to TabularResult that's much more powerful.

I'd like to fetch data and load it into a dataframe. There are a few options. I could add methods to spit out CSV so that I can then load the CSV into tablesaw myself, we could create a quand4j-tablesaw module, or if you'd be interested we could replace TabularResult with a tablesaw's Table and make it a core dependency.

jimmoores commented 7 years ago

Looks very interesting, I think it's an excellent idea. I'm keen to know what you think about the possibility of making the data representation pluggable/generified?

There's a couple of reasons for that: 1) I'd like to break backwards compatibility as little as possible and preferably not at all. 2) I've got my own time series library that I'm using for various things ( https://github.com/McLeodMoores/starling/tree/mcleodmoores/projects/time-series) like my Java/excel project (https://github.com/McLeodMoores/xl4j)

So was thinking about having static factory methods to create specialisations for each backing representation (clearly you don't want to be dragging the types around like QuandlSession<Table, Json> so you'd probably want predefined interfaces). That could also allow different representations for metadata/json rather than the funny Jackson binding I'm using at the moment.

I'd be interested on your thoughts though - it might be overcomplicating things...

Incidentally, if we're going to make a change like this, we might as well move to Java 8 at the same time.

Jim

On 11 Aug 2017 06:12, "Ben McCann" notifications@github.com wrote:

@jimmoores https://github.com/jimmoores I was curious if you've seen tablesaw https://github.com/jtablesaw/tablesaw. It's basically an alternative to TabularResult that's much more powerful.

I'd like to fetch data and load it into a tablesaw table. There are basically two options. I can either add methods to spit out CSV so that I can then load the CSV into tablesaw. Or if you'd be interested, I could replace TabularResult with a tablesaw Table.

Let me know your thoughts and I'll put together a PR

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jimmoores/quandl4j/issues/31, or mute the thread https://github.com/notifications/unsubscribe-auth/AADG-5XZrYNHCH0DP9n_xyYchDetyW_Fks5sW-KhgaJpZM4O0QM2 .

jimmoores commented 7 years ago

Perhaps I should knock something together that might form the basis of it and you could do the implementation for tablesaw?

benmccann commented 7 years ago

If we returned the CSV as a String, fetching into a tablesaw dataframe would look like:

String code = "WIKI/AAPL";
String csv = session.getDataSetAsCsv(DataSetRequest.Builder.of(code).build());
Table table = Table.createFromReader(new StringReader(csv), code);

If quandl4j used tablesaw directly it would be a bit simpler:

Table table = session.getDataSet(DataSetRequest.Builder.of("WIKI/AAPL").build());

In Python it's pretty clear that Pandas is the defacto table library. Java doesn't have quite a clear a leader in this space, but tablesaw seems to be the best one I could find for the purpose. I've started contributing to that library as well to help make it better fit that purpose for Java.

There's a couple ways we could add support for things like tablesaw and your time-series implementation. One of the easiest might be to just add them as <optional>true</optional> in the pom.xml. That way someone can use quandl4j without having to pull in dependencies they don't need.

jimmoores commented 7 years ago

Of course there are reasons it makes at least some sense to do at least sampling on the server side (if not really transform), in that you might be able to reduce the payload size, but yeah, it looks a bit weird from an API perspective.

Your suggestion of just adding a method to return the string is achingly tempting, but I think I'd prefer something more orthogonal. I'm worried if we put it in now it'll be difficult to generify the interface later and preserve backwards compatibility.

What I had in mind was a generic QuandlSession<METADATA_TYPE, TABLE_TYPE> interface with static factory methods implementing the current factory methods that return a concrete implementation of QuandlSession<MetaDataResult, TabularResult>. New factory methods can then return a QuandlSession<String,String> or QuandlSession<JsonObject, Table>. These specialisations might just be implemented by passing a couple of lambdas into the constructor/factory method of a common concrete class.

I'll try and find time to put something together tomorrow.

Sent from my Samsung SM-G950F using FastHub

jimmoores commented 7 years ago

I've just pushed up a stab at it on topic/generic. It's pretty rough, and the generics got a bit carried away, there's a ton of generic this and abstract that, but I think it should work. It satisfies several criteria:

I backed off going Java 8 at the last minute because I wanted to use it another project that might need to keep Java 7 backwards compatibility.

Now what I want to do is create some sub-projects for the different implementations. I prefer that to optional packages if possible, even if it is a bit of a sledgehammer because beginner users can get really confused when a dependency doesn't get pulled in.

benmccann commented 7 years ago

Cool. That change looks good to me. I left a few minor comments on the commit, but the high-level idea makes sense to me. I like it!

It'd be fine with me to drop Java 7 support. All the other open source stuff I use moved to Java 8 a long time ago, so all my projects have to be on Java 8.

jimmoores commented 7 years ago

Thanks for your comments and apologies for the temporary silence. I'll try and get to finishing this up this weekend. That non javadoc comments thing is an eclipse stupidity. I'll update here shortly.

Sent from my Samsung SM-G950F using FastHub

jimmoores commented 7 years ago

I've pushed up another tranche of work towards 2.0. Modularization is done now. Probably lots of rough edges. All completely untested outside existing tests. Legacy QuandlSession could use the new framework (this was the intention, maybe the original refactor got lost), but currently replicates some methods I think. Documentation needs changing. I'm on vacation so my commits will dip in and out, feel free to chip in if you have time.

benmccann commented 7 years ago

Thanks so much for all of your help with this @jimmoores !!