quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0
17.68k stars 4.73k forks source link

Futures being read as Equities from env #683

Closed warren-oneill closed 9 years ago

warren-oneill commented 9 years ago

code snippet:

class TestAssetFinder(TestCase):
    @with_environment()
    def setUp(self, env=None):
        self.identifier = 'f'
        metadata={0: {'asset_type': 'future', 'symbol':
            self.identifier}}
        self.asset_finder = AssetFinder(metadata=metadata)
        env.update_asset_finder(identifiers=[self.identifier])

    @with_environment()
    def test_future_metadata(self, env=None):
        asset = env.asset_finder.lookup_generic(self.identifier,
                                                   datetime.now())
        self.assertIsInstance(asset, Future)

output

AssertionError: (Equity(0, symbol='f', asset_name='', exchange='', start_date=0, end_date=Timestamp('2015-08-20 00:00:00+0000', tz='UTC'), first_traded=None), []) is not an instance of <class 'zipline.assets._assets.Future'>

cc @jfkirk @StewartDouglas @yankees714 @dhexus

ssanderson commented 9 years ago

@warren-oneill I think the issue is that you're inserting your metadata into an AssetFinder, but not actually setting that AssetFinder on your TradingEnvironment. I think rather than passing identifiers, you want to pass asset_finder, ala

env = TradingEnvironment.instance()
metadata = {0: {'asset_type': 'future', 'symbol': 'f'}}
env.update_asset_finder(asset_finder=AssetFinder(metadata=metadata))
warren-oneill commented 9 years ago

I changed the updating of asset finder to

env.update_asset_finder(asset_finder=self.asset_finder)

and now i get

File "zipline/zipline/assets/assets.py", line 590, in lookup_generic
    raise SymbolNotFound(symbol=asset_convertible_or_iterable)
zipline.errors.SymbolNotFound: Symbol 'f' was not found.
warren-oneill commented 9 years ago

The only thing that seems to work is

asset = env.asset_finder.lookup_generic(0, datetime.now())

I also tried

asset = env.asset_finder.lookup_symbol('f', datetime.now())

but that returns a None.

So it seems it is only possible to do a lookup by sid which I generally don't want to do since lookup by identifier is much more convenient.

I also tried using lookup_generic('f', datetime.now()) with an Equity instead of a Future and it worked fine so it seems to be a Futures-specific problem.

warren-oneill commented 9 years ago

I think I've found the problems:

  1. the Sql queries in lookup_symbol_resolve_multiple do not query the futures table
  2. after the queries only _retrieve_equity is used

I've started working on a PR and will hopefully have it up before the end of the day.

yankees714 commented 9 years ago

@warren-oneill I believe it's intentional that futures are left out of the symbol lookups, @ehebert made these changes and should be able to explain.

warren-oneill commented 9 years ago

@yankees714 @ehebert ok would love to hear more. Looking up futures by symbol is something I currently do quite often.

ehebert commented 9 years ago

@warren-oneill, first, apologies that this broke your existing workflow.

When changing over the asset finder to be backed by SQLite (for start up time reasons), we did cut out the future lookup by symbol since it we weren't fully supporting it on our system yet; however, that doesn't mean that ability is not desired.

I think the biggest question to answer is this (and IIRC not having the answer then was the reason it was cut), should future symbol lookup be implemented in one of these two ways (or another?).

1) Future lookup by symbol uses lookup_symbol_resolve_multiple (i.e. the symbol API method) and do an additional attempt to select from the equities table and then the futures table? (But what about name collision, if there are any?)

2) Should there be there be a new function called future_symbol or future_chain, dedicated to returning a future asset class.

I would lean towards 2), if looking up by root symbol for the chain is desired since that logic may be distinct enough from equity lookup, thoughts @jfkirk, @ssanderson, @warren-oneill ?

ehebert commented 9 years ago

Update to above question, @yankees714 already added lookup_future_chain https://github.com/quantopian/zipline/blob/master/zipline/assets/assets.py#L436 so that piece is unlocked.

ehebert commented 9 years ago

@warren-oneill, I just talked it over with @yankees714 and @jfkirk

And the current thinking is that we should have a future_symbol API method. That could either be powered by lookup_symbol_resolve_multiple with a parameter that determines whether an equity or a future is used, or a lookup_future_symbol because the "resolve multiple" aspect does not apply.

ssanderson commented 9 years ago

@ehebert is the proposal that we have future_symbol in addition to future_chain? When would you want to use one over the other?

warren-oneill commented 9 years ago

future_chain uses the root_symbol and returns the sids for all contracts under that root_symbol. Whereas future_symbol would take a specific product, eg first week of May, and return the sid for this product.

warren-oneill commented 9 years ago

Personally I don't see whats wrong with having one lookup function. It would come in handy if you are looping over both equities and futures. But tbh either way is find with me :smile:

ssanderson commented 9 years ago

Personally I don't see whats wrong with having one lookup function. It would come in handy if you are looping over both equities and futures.

I think the concern here is performance. If there's one function that does both kinds of lookups, then that function incurs the overhead of figuring out which type of thing you're asking for every time you call it.

There's also the possibility of collisions. If some symbol is used as both a future and an equity, how do we know which one to return. One possibility is to barf in that case, but that means that every call into the AssetFinder has to make two queries now.

Another option would be to add an optional type kwarg to symbol, which would default to 'equity', with other valid values being 'future' and possibly 'auto', which would incur the overhead in exchange for figuring out what you mean.

ehebert commented 9 years ago

@ssanderson @warren-oneill articulated what I meant with https://github.com/quantopian/zipline/issues/683#issuecomment-134606177

And I agree with @ssanderson about the performance concerns, as well as the long term maintainability, inferring intent from a provided value (as opposed to being explicit with the 'equity/future' that @ssanderson mentioned) is often prone to bugs (or at least I've had my fair share of bugs when trying to do so) and harder to refactor later, since both cases need to be accounted for etc.

warren-oneill commented 9 years ago

Ok sounds good. Who's gonna write it? :stuck_out_tongue_winking_eye: I've already implemented a combined futures/equities lookup_symbol_resolve_multiple + test so it shouldn't be much work to seperate the futures stuff into a lookup_futures function. On the other hand you guys seem to have a better idea of which direction you want things to go...

ssanderson commented 9 years ago

This probably falls in @jfkirk or @yankees714 queue. Do you guys have a sense of when you'd be able to work on this? I'm doing user interviews on the modelling API this week, so I may have a free hour or two to work on this.

For clarity, I think it might be worth writing up a clear proposal for the algorithm-facing API implied here.

I think what I'm proposing is that calling symbol('CL-{somedate}', type='future') is equivalent to future_chain('CL', as_of=somedate)[0].

I suppose the question is whether the former is enough cleaner/clearer that it's worth creating two ways to do this.

Thoughts @jfkirk, @yankees714, @warren-oneill ?

cmorgan commented 9 years ago

@warren-oneill do you have the lookup_symbol_resolve_multiple + test in a branch anywhere?

warren-oneill commented 9 years ago

hey @cmorgan it lives here https://github.com/grundgruen/zipline/tree/future-lookup. Unfortunately my additions broke one of the other tests and I haven't got around to fixing it yet.

warren-oneill commented 9 years ago

ok now all the tests should be green.

warren-oneill commented 9 years ago

@ssanderson currently my focus is on using a future lookup for more "under the hood" type stuff, e.g. accessing sids to generate events in DataSource. In my case I am reading the identifiers, prices and volumes from my databank and I need to connect this with an sid I can add to the event. Before I used sid = trading.environment.asset_finder.retrieve_asset_by_identifier(identifier).sid but that function doesn't exist anymore and currently there is no alternative (lookup_symbol did work for a while but not anymore). I haven't given too much though to what I need in the algo but perhaps future_chain is enough.

warren-oneill commented 9 years ago

@ssanderson @jfkirk @yankees714 any updates?

ssanderson commented 9 years ago

@warren-oneill there's been quite a lot of churn on master in the AssetFinder since I last looked at this issue. Looking at the current state of things, it seems like asset_finder.lookup_future_chain(symbol, pd.NaT, pd.NaT) should do what you want?

The one catch there may be that lookup_future_chain does lookups by root symbol, returning a list of all contracts associated with a that symbol, so you'd have to filter down to the particular contract you actually need. Does that fit what you're looking for, or are you really after something like lookup_future_contract which looks up an individual contract?

warren-oneill commented 9 years ago

@ssanderson I need something like lookup_future_contract. I need it to match the sids between my metadata and my DataSource. At the moment I wrote my own function which is basically lookup_symbol for futures.

jfkirk commented 9 years ago

A parallel lookup_future_symbol method makes sense to me, too. This would also be fitting when we add indicator assets as their symbols may overlap.

@StewartDouglas @yankees714 Does implementing a lookup_future_symbol method, and creating a future_symbol API method, make sense to you two?

StewartDouglas commented 9 years ago

@jfkirk they both some sensible to me.

warren-oneill commented 9 years ago

@jfkirk @StewartDouglas nice, I can't really live without this. Should I make a PR or leave it up to you guys?

StewartDouglas commented 9 years ago

@warren-oneill I'll have a crack at this and you can leave comments in the PR I open for it.

warren-oneill commented 9 years ago

@StewartDouglas :+1:

jfkirk commented 9 years ago

@warren-oneill Are there outstanding problems with this issue? If so, we can investigate. Otherwise, let's close it.

warren-oneill commented 9 years ago

closed with #743