man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Future Development? - Java API #101

Closed michaeljohnbennett closed 2 years ago

michaeljohnbennett commented 8 years ago

Hi there,

I have been trialling this out and it looks like a great framework for storage and retrieval. Are there any plans for the Java API, or are you looking for contributors to help? I'm testing this out as data storage for a zipline/quantopian backtester and also for a JVM-based project, and was wondering what stage that is at, if any.

richardbounds commented 8 years ago

Hi, we do have a Java implementation internally (TickStore reader/writer and VersionStore ndarray reader) that we'd like to open-source, but it needs a bit more work to disentangle it from our internal libraries and build infrastructure.

Are you using TickStore or VersionStore, and what sort of interface would you want to the data, tick-by-tick, or something DataFrame-like? Sorry I can't be more specific on the timing.

michaeljohnbennett commented 8 years ago

Hi Richard,

It would be a VersionStore reader. Something DataFrame-like would be fantastic.

I'm just getting to grips with version store and working through the example (done) and now looking at loading in data.

Do you have any best practices for loading in data, for example:

  1. From some raw data source: I can get Pandas to read CSV files into DataFrames and load them in, I suppose. Is that the fastest/optimal route?
  2. Appending data daily: as I'm working with EOD data, daily or at some other frequency, I need to append rows onto each symbol in the library. Do I need to read it out, add the rows, then upsert again, or can I just do an update call that appends the rows?

Really looking forward to learning and using this system a bit more.

richardbounds commented 8 years ago

Loading data from csv via Pandas is fine - you might just want to check that Pandas gets the right datatypes for the columns when you read the csv.
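A minimal sketch of that datatype check, assuming a hypothetical EOD CSV layout (the column names, the sample rows and the `load_eod_csv` helper are illustrative and not from this thread). It pins the dtypes explicitly instead of relying on inference:

```python
import io

import pandas as pd

# Hypothetical EOD file contents; real column names will differ.
CSV_TEXT = """date,open,high,low,close,volume
2016-01-04,10.0,10.5,9.8,10.2,1000
2016-01-05,10.2,10.8,10.1,10.6,1500
"""

# Dtypes we expect for each column (an assumption for this example).
EXPECTED_DTYPES = {
    "open": "float64",
    "high": "float64",
    "low": "float64",
    "close": "float64",
    "volume": "int64",
}


def load_eod_csv(source):
    """Read an EOD CSV with explicit dtypes and a datetime index."""
    return pd.read_csv(
        source,
        dtype=EXPECTED_DTYPES,   # fail loudly instead of guessing types
        parse_dates=["date"],    # parse the date column as timestamps
        index_col="date",
    )


eod = load_eod_csv(io.StringIO(CSV_TEXT))
print(eod.dtypes)
```

Passing `dtype` makes `read_csv` raise if a column cannot be converted, which tends to surface bad rows before they are written to the store.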

For appending new rows, library.append() should do what you want, e.g.: https://github.com/manahl/arctic/blob/master/tests/integration/store/test_pandas_store.py#L369
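As a rough sketch of a daily append, assuming `library` is an Arctic VersionStore library object: the wrapper `append_daily_rows`, the empty-frame check and the sample bar are hypothetical, and only the `library.append(symbol, data)` call is Arctic's.

```python
import pandas as pd


def append_daily_rows(library, symbol, new_rows):
    """Append new EOD rows to an existing symbol.

    `library` is assumed to be an Arctic VersionStore library; only its
    append(symbol, data) method is used. This helper is illustrative and
    not part of Arctic.
    """
    if new_rows.empty:
        return None  # nothing to append today
    # Keep the new rows in time order before handing them to the store.
    return library.append(symbol, new_rows.sort_index())


# One new daily bar for a hypothetical symbol.
new_bar = pd.DataFrame(
    {"close": [10.9], "volume": [1200]},
    index=pd.DatetimeIndex(["2016-01-06"], name="date"),
)
```

With a running MongoDB, `library` would typically come from something like `Arctic('localhost')['some_library']` (the library name is a placeholder), and the call would be `append_daily_rows(library, 'AAPL', new_bar)`. There is no need to read the symbol out and rewrite it.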

michaeljohnbennett commented 8 years ago

Thanks @richardbounds, that's really helpful. I'll keep going with the Python side of things.

I will be interested to see how you can get Java and Python interfaces to work on the same dataset.

I'll get some prototype code working to load it in. I already have loader code that reads CSI Data into DataFrames with the correct dtypes and feeds a MySQL/SQLAlchemy loader, so I just need to create a new loader implementation for Arctic.

michaeljohnbennett commented 8 years ago

Hi @richardbounds that worked a treat, now onto looking at querying the data.

Is there a way in the API (sorry, noob question) to return just a subset of the data: not only a time range (I know we can do that) but also just certain columns?

Sometimes I need a DataFrame covering a number of symbols, where I want just the close for each, to do analysis such as universe selection/filtering for a TAA system I'm working on. The DataFrame store seems really useful, but having it stored in a binary format doesn't make it easy to inspect or query in some other way.

Any ideas/suggestions?

richardbounds commented 8 years ago

TickStore stores columns separately so you can query them individually, but our existing VersionStore pandas implementation stores all the columns packed together, so you need to retrieve them all and slice out the ones you want. For EOD data with a handful of columns it should take only a few milliseconds to retrieve the whole thing anyway.
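The "retrieve everything, then slice" approach could look roughly like this. It assumes `library.read(symbol)` returns an item whose `.data` attribute is the stored pandas DataFrame; `read_columns` is a hypothetical helper, not an Arctic function.

```python
def read_columns(library, symbol, columns):
    """Read a whole symbol from a VersionStore library, then keep `columns`.

    The full frame is fetched first because the columns are stored packed
    together; the column selection happens client-side in pandas.
    """
    frame = library.read(symbol).data
    missing = [c for c in columns if c not in frame.columns]
    if missing:
        raise KeyError(f"{symbol} has no columns {missing}")
    return frame[list(columns)]
```

For example, `read_columns(library, 'AAPL', ['close'])` would return a one-column DataFrame of closes for a symbol named `AAPL` (a placeholder name).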

For systems which work with a single column (e.g. a close price) across hundreds or thousands of symbols at once, we tend to build and store a derived dataframe with just that column for the whole universe together in one symbol, e.g. something like 'US_EQUITY_CLOSE_PRICES'.
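A sketch of building such a derived frame with pandas, assuming each symbol's DataFrame has a `close` column (the column name, the symbols and the `build_close_panel` helper are made up for illustration):

```python
import pandas as pd


def build_close_panel(frames_by_symbol, column="close"):
    """Combine one column from many per-symbol frames into a single frame.

    The result has one column per symbol, indexed by the union of all
    dates; dates a symbol lacks are filled with NaN.
    """
    closes = {symbol: frame[column] for symbol, frame in frames_by_symbol.items()}
    return pd.DataFrame(closes).sort_index()


# Two hypothetical symbols with slightly different date coverage.
dates = pd.to_datetime(["2016-01-04", "2016-01-05"])
frames = {
    "AAA": pd.DataFrame({"close": [10.2, 10.6], "volume": [1000, 1500]}, index=dates),
    "BBB": pd.DataFrame({"close": [55.0], "volume": [300]}, index=dates[:1]),
}
panel = build_close_panel(frames)
print(panel)
```

The combined frame could then be stored under a single symbol, e.g. `library.write('US_EQUITY_CLOSE_PRICES', panel)`, so that universe-wide analysis needs only one read.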

lJoublanc commented 8 years ago

I want to share that I'm writing a Scala adapter for Arctic, and will put it on GitHub when it's a bit more mature. I use it as a tick-logger and for simulation, so the API is somewhat different from the Python one, i.e. it's designed for streaming asynchronously, tick-by-tick, rather than returning a block of data. Currently it's pretty basic: TickStore-only, with no support for bitmasks or reading particular columns, etc., but it works. I'll post here when it's in a presentable state.

vroomzel commented 8 years ago

It would be great to have the TickStore API available in Java, to be able to collect tick data from InteractiveBrokers, whose API doesn't officially support Python.

lJoublanc commented 5 years ago

Don't know if this helps at all, but I recently published a Scala driver that supports TickStore. It has limitations though: the MongoDB driver it uses is written from scratch and currently only supports a single-server architecture, with no authentication.

yschimke commented 5 years ago

@lJoublanc I'll hopefully have time over Christmas to open-source the Java implementation we use internally. But great to see a Scala implementation in the open.

cc @richardbounds

thesmartwon commented 5 years ago

I would love for there to be an open source version that uses the official MongoDB driver! What's the progress @yschimke ?

yschimke commented 5 years ago

@thesmartwon Thanks for the prod, I spoke 2 weeks ago with @jamesblackburn about this work.

It's close to ready for an alpha. It would generally be an extraction of our internal Java version, trimmed down to the core functionality, i.e. to start with we won't include our Kafka-to-TickStore functionality. Rather than blocking, I'll get that out in its current shape (in the next week) and we can iterate on it; I can add you as a collaborator. There are a lot of things to improve in it, e.g. auth and dependencies on an internal SEDA library.

yschimke commented 5 years ago

FWIW, if I'm spending any personal time on it, I'll probably focus my own efforts on getting out a second, more modern Kotlin implementation using reactive streams. That's a secondary target that will steal liberally from the first, but for me it is a lot more interesting.

thesmartwon commented 5 years ago

Awesome! Can I be added as a contributor @yschimke ?

thesmartwon commented 5 years ago

Has there been any progress over the past month @yschimke ?

jamesblackburn commented 5 years ago

Unfortunately the work to open source the Java version of this has been down-prioritised and isn't currently in progress.

jasonlocal commented 2 years ago

Legacy issue. A Java version will not be provided in the near future.