Thanks for the request @michaelwills
In Quantopian, you are implementing inside a subclass of TradingAlgorithm. Our subclass interfaces with the server environment to manage security and resources. You should be able to make a pretty simple class that calls methods with the same signature as the Quantopian methods. However, you wouldn't have any of the "magic" functions like set_slippage or (more importantly) order. This is incomplete, but maybe you could make a small emulator along these lines:
# save your strategy code in a module, such as my_mod
import my_mod

# note: the TradingAlgorithm import path may vary by zipline version
from zipline.algorithm import TradingAlgorithm
from zipline.utils import ndict

class QuantopianEmulator(TradingAlgorithm):
    def initialize(self):
        # give the strategy a Quantopian-style context object
        self.context = ndict()
        my_mod.initialize(self.context)

    def handle_data(self, data):
        my_mod.handle_data(data, self.context)
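For concreteness, here is a hypothetical sketch of what such a my_mod could contain, written to match the call signatures the emulator uses (the module name and the counter field are illustrative only):

# my_mod.py: a hypothetical Quantopian-style strategy module
def initialize(context):
    # stash whatever state the strategy needs on the context object
    context.counter = 0

def handle_data(data, context):
    # called once per event; note there is no emulated order() here,
    # per the caveat above
    context.counter += 1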
I'd love to know what you think about the reverse, where we make it easy to get your data into the Quantopian environment. What could we do that would make it easiest for you to have all the data you want accessible there? Dropbox integration? S3? File upload? An HTTP(S) datasource?
Thanks for all these ideas, it is very exciting to discuss zipline with you.
@fawce, the reverse case you mention is a sweet spot for taqtiqa.com.
Ideally users should be able to 'BYO data'.
This would suggest a 3-legged OAuth 'dance', correct?
Once Quantopian has been granted access to the data there are two issues:
Is there a reference re 1)?
@taqtiqa-mark Taqtiqa is nice. Pricey for individual traders. :) The OAuth dance would only be needed for full 3rd-party services. As an individual trader I don't want to have to set up a service to provide access to the data I already have available. For example, I have plentiful OHLC data and also tick data for the instruments, which is actually what I'd prefer using. But it's a different league than the equities your group provides. I hope one day to require your services. :)
zipline is actually very simple in its requirements. In fact, all of my indicator data is in the same CSV and accessible in the strategy. For example, my CSV files are simply:
Date,Time,Open,High,Low,Close,indicator_1,indicator_2,…
04-16-2012,18:05,1.53473,1.53485,1.53462,1.53473,1.53110112,1.53053888,…
Then in Python I do the following (not clean and not efficient, but I hacked it up and it works a treat):
from datetime import datetime
from pandas import read_csv, tseries  # tseries provides DatetimeIndex in this era of pandas

data_file = 'data-{sym}-{per}.csv'.format(sym=symbol, per=period)
loaded_data = read_csv(data_file)

# combine the separate Date and Time columns into a single datetime column
loaded_data['dt'] = None
for i in loaded_data.index:
    loaded_data['dt'][i] = datetime.strptime(
        loaded_data['Date'][i] + " " + loaded_data['Time'][i] + ":00",
        '%m-%d-%Y %H:%M:%S')
del loaded_data['Date']
del loaded_data['Time']

# index on the combined datetime, localized to US/Eastern and then
# converted to the UTC timestamps zipline expects
loaded_data.index = tseries.index.DatetimeIndex(
    data=loaded_data['dt']).tz_localize('US/Eastern').tz_convert('UTC')

# map a column named for the symbol to contain the close data
loaded_data[symbol] = loaded_data.Close
loaded_data.save('data.dat')
This allows essentially anything to be backtested via zipline.
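To close the loop, a minimal sketch of feeding that saved frame into a backtest, assuming the QuantopianEmulator class sketched earlier and old pandas' load() counterpart to DataFrame.save():

from pandas import load  # pairs with DataFrame.save() in older pandas

loaded_data = load('data.dat')   # the frame pickled above
algo = QuantopianEmulator()
perf = algo.run(loaded_data)     # zipline iterates the frame as a stream of events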
@fawce I would actually prefer the ability to upload data or have it available. For the tech-inclined like myself, I don't mind setting up my own HTTPS, SFTP, or other secure protocol to get data into Quantopian. My problem is the speed, flexibility, and availability of my hardware to do tests locally. In fact, I need to get 1GB+ of tick data in to backtest. I am aware pandas has a branch to be merged by v0.10 that uses a new parser engine, which will be a boon. But that's still 1GB to load and parse before testing begins, since everything is in-memory. And that's one instrument over a few months. How would that kind of data fare on Quantopian?
A separate feature request would be for zipline to be able to take an iterator as a data source (perhaps a database connection and a query, with zipline moving the cursor through the records), or even some sort of streamed data. Essentially, just an interface for streaming data in. S3 would work for this as well, especially for data that doesn't change much.
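For illustration, such an iterator source might look like this (a hypothetical sketch, not an existing zipline API; the column names are made up):

import sqlite3

def db_events(db_path, query):
    # hypothetical: walk a database cursor and yield one record at a
    # time, so the full dataset never has to sit in memory
    conn = sqlite3.connect(db_path)
    try:
        for dt, sid, price, volume in conn.execute(query):
            yield {'dt': dt, 'sid': sid, 'price': price, 'volume': volume}
    finally:
        conn.close()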
For the less technical but security conscious, Dropbox would be good if they have the space for it. It'd be fine for light data loads. Large stuff would be problematic. I don't use it much myself though. If file uploads support batch upload and resume then that'd be fine.
Oh and @fawce it's exciting to be able to discuss it! I've done work with TickZoom back when it was first released, TradeLink (a very nice C# platform), and NinjaTrader. But I need something cross-platform, so I use MetaTrader via Wine on OS X (works a treat!) and now zipline. A few years back my first trading code was actually Python and COM objects on Windows. If there is a way to stream data into zipline, then I can see it would be easy to have zipline do live trading as well. Just need to make one last adapter for brokers…
@michaelwills thanks for the guidance. You will be interested to know that zipline's internals are all generators. In fact, when you pass a dataframe into zipline as a datasource, we actually wrap it in a generator for the simulation.
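Conceptually, the wrapping is as simple as this (a sketch of the idea, not zipline's actual wrapper code):

def frame_as_events(df):
    # yield one (timestamp, row) pair at a time from the DataFrame
    for dt, row in df.iterrows():
        yield dt, row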
I have a little prototype CSV datasource, which streams records from the CSV file as they are read (rather than loading all at once). It isn't ready to be in the codebase yet, so I shared it here as a gist: https://gist.github.com/4057021. I'd be thrilled to see a PR that cleans it up and adds it to zipline. Double points for a source that wraps a CSV file from an S3 bucket in the same fashion.
Regarding the data, 1GB isn't a problem. Quantopian uses the generator-style datasources for everything. The bottleneck would still be IO to stream the data, but the load would not be front-loaded as it is with the dataframe source.
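The shape of the idea, roughly (a minimal sketch, not the gist itself):

import csv

def csv_events(path):
    # read and yield one record at a time, so memory use stays flat
    # regardless of file size; a real zipline source would emit its
    # event protocol objects here
    with open(path) as f:
        for row in csv.DictReader(f):
            yield row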
@taqtiqa-mark if you were to provide a python library that provided a datasource like the csv one, zipline users could try it, and we could include the library on the Quantopian platform. Ideally, you'd handle authentication independently, that way zipline users could try your service.
@fawce oh… oh my. Will review. Hmm, with [python-requests-aws](https://github.com/tax/python-requests-aws) it gets even easier to integrate with S3. I hadn't seen requests before zipline, so that was an additional nice find. :)
I saw generators in the zipline source but hadn't checked too deeply in the source aside from the exceptions I experienced. Learning as I go. I'm not sure if I can make it any cleaner but I will definitely try it out. Thanks!
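For reference, pulling a CSV out of S3 with python-requests-aws could look roughly like this (a hedged sketch; the bucket, object, and credentials are placeholders):

import csv
import requests
from awsauth import S3Auth  # provided by python-requests-aws

resp = requests.get('https://my-bucket.s3.amazonaws.com/data.csv',
                    auth=S3Auth('ACCESS_KEY', 'SECRET_KEY'))
resp.raise_for_status()
for row in csv.reader(resp.text.splitlines()):
    pass  # placeholder: a real source would yield an event per row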
@michaelwills, agreed, data can never be too cheap. Likewise, one day we hope to provide you equities TAQ data describing what actually happened, and the infrastructure to access it; just USA equities tick data is on the order of 11 TB. If you do see USA equities tick data offered for less than double the price we offer, we'd love to hear about it - seriously, please open a 'pricing/TAQ' ticket.
You mention compatibility with Quantopian's implementation; I assumed that meant 3rd-party access, but that may have been a hasty assumption (more below).
Basic-Auth over SSL and/or 2-legged OAuth are possible if Quantopian.com is not involved in any way.
@michaelwills, do you have strong reasons for a preference between those two?
Thanks for the example. Your pattern of use is what we had in mind: embarrassingly parallel access patterns. Data could be requested as a compressed or uncompressed stream, or as a monolithic chunk.
@fawce, would the data request be made and consumed by Quantopian, or by the user? That is, do the Quantopian systems get to see/access/process the actual data?
This determines the authorization dance required.
It sounds like the Quantopian side of things may not be involved at all in data requests, nor have access to the data - correct?
Is this likely to remain the case under current plans?
@taqtiqa-mark I'm not yet in the market for that kind of data but if I see it I'll do so. The compatibility is for the code, though, as opposed to data. I'd love to be able to build algorithms locally where I have step level debug and then push up to quantopian to run there after doing short runs locally. And of course the converse would be beneficial: clone an algo, bring it down, develop locally for debugging, copy back up for testing.
The data I currently have access to, though, is provided by my broker, so it's already in my environment. If I can pull that into Quantopian it'll only be through my own infrastructure. Depending on the security of how that data is accessed (and I would assume it would be high security), I wouldn't mind having even basic auth over SSL. I could imagine times where I'd just do a reverse tunnel to my own box from one of my servers so I could pull data direct from my workstation if needed. It's a matter of flexibility.
When dealing with 3rd party data I'm all for OAuth though.
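As a point of reference, the basic-auth-over-SSL case is about as simple as it gets with requests (the URL and credentials are placeholders):

import requests

resp = requests.get('https://data.example.com/ticks.csv',
                    auth=('user', 'secret'))  # HTTP Basic auth over HTTPS
resp.raise_for_status()
lines = resp.text.splitlines()  # records to feed into a datasource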
Catching up on the thread @michaelwills, but as an aside, checking out why the Google group isn't functioning.
@michaelwills Google group should now be joinable and postable.
(I had previously neglected to click 'Save' after opening up the group's permissions.)
Let me know if you still can't access it, and I'll investigate further.
Thanks much @ehebert it's open now.
Now that the forum is open I'll post questions there. :)
An update here, https://github.com/quantopian/zipline/commit/b69590a2f709c70dd14d817d1a6bee0b1bb0e7b0 has started the path towards better compatibility between Zipline and quantopian.com
Should be addressed by https://github.com/quantopian/zipline/pull/279. Also see http://blog.quantopian.com/unifying-zipline-quantopian/
Are there plans to make this release of zipline compatible with app.quantopian.com's implementation? It'd be awesome to test and build strategies from the site against unsupported instruments like forex, or even to backtest real estate housing market data or Intrade quotes. Not to mention strategy development in a comfortable environment. Bringing the extra data available in data inside handle_data() would be welcome, I'm sure. Thoughts?
I'd post this on the zipline Google group but it isn't open for business just yet.