quantopian / zipline

Zipline, a Pythonic Algorithmic Trading Library
https://www.zipline.io
Apache License 2.0
17.52k stars 4.71k forks

custom data bundle price wrong #2407

Open ayxemma opened 5 years ago

ayxemma commented 5 years ago

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

* Operating System: Linux 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
* Python Version: Python 3.5.5 :: Anaconda, Inc.
* Python Bitness: 64
* How did you install Zipline: pip
* Python packages: `$ pip freeze` or `$ conda list`

zipline==1.2.0
pandas==0.18.1
pandas-datareader==0.5.0

Now that you know a little about me, let me tell you about the issue I am having:

# Description of Issue

Until now, CSVDIR ingest has always worked smoothly for me. But I created a new bundle with two tickers, SF and BLUSF, which have completely different price series, and when they are ingested into the same data bundle they both end up with the price series that belongs to BLUSF. I don't know why.

Here is how you can reproduce this issue on your machine:

## Reproduction Steps

1. Create a bundle with just two tickers, SF and BLUSF (attached is the data I use, converted to xlsx since csv can't be uploaded)
2. Run any trading algorithm
3. Observe that they have the same prices
...

## What steps have you taken to resolve this already?

I tried leaving just SF or just BLUSF in the bundle, and in each case the data are correct. But when I put them together, the data become wrong again. I also tried changing the ticker name SF to SF_SF or other names, and that works fine: the ingested data is correct.
...

[SF.xlsx](https://github.com/quantopian/zipline/files/2789528/SF.xlsx)
[BLUSF.xlsx](https://github.com/quantopian/zipline/files/2789529/BLUSF.xlsx)

Sincerely,
ccbttn

freddiev4 commented 5 years ago

Hi @ayxemma I believe this is due to https://github.com/quantopian/zipline/pull/2223

Calculating the name of the CSV file from the symbol name is not done correctly. The current method matches a filename to a symbol if the filename ends in the symbol concatenated with .csv. This means that if there are two stocks, Ford (symbol F) and Regions Financial Corporation (symbol RF), then the calculated filename to load for Ford could be RF.csv, which is clearly not the data file for Ford.

If we could get that PR fixed up then we could avoid this issue (if you have any interest in doing so, feel free to give it a go)
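To make the mismatch concrete, here is a minimal sketch using the two tickers from this issue. It reuses the list-comprehension expression quoted later in this thread and is not a copy of zipline's actual module; the `files` list and the helper names `lookup_substring` and `lookup_exact` are assumptions for illustration. In a real bundle directory the file order depends on the directory listing, so which file wins may vary.

```python
# Illustrative only: mimics the filename lookup discussed in this thread.
# `files` stands in for a directory listing; its order is an assumption.
files = ["BLUSF.csv", "SF.csv"]


def lookup_substring(symbol, files):
    # Substring test: "SF.csv" is contained in "BLUSF.csv",
    # so the first hit can be the wrong file.
    return [fname for fname in files if "%s.csv" % symbol in fname][0]


def lookup_exact(symbol, files):
    # Exact comparison: only a file named exactly "<symbol>.csv" matches.
    return [fname for fname in files if "%s.csv" % symbol == fname][0]


print(lookup_substring("SF", files))  # BLUSF.csv (wrong file for SF)
print(lookup_exact("SF", files))      # SF.csv
```

With this listing, the substring version hands BLUSF's file to SF, which would explain why both tickers show the same prices, and why renaming SF to SF_SF makes the problem go away.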

ayxemma commented 5 years ago

Thanks. I looked at the changes made in the pull request and I think they should be correct; I'm not sure why it shows some tests failing. I will have a look later.

Mel-Peng commented 5 years ago

Modifying the following line in `data/bundles/csvdir.py` works for me:

`fname = [fname for fname in files if '%s.csv' % symbol in fname][0]`

to

`fname = [fname for fname in files if '%s.csv' % symbol == fname][0]`

Not sure whether anything else needs to be modified.
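For anyone patching this locally, a slightly more defensive variant of the same idea might look like the sketch below. The helper `find_csv_for_symbol` is hypothetical and not part of zipline. It assumes every data file is named exactly `<symbol>.csv` and raises a descriptive error instead of an `IndexError` when no such file exists.

```python
import os


def find_csv_for_symbol(symbol, files):
    """Hypothetical helper: return the single file named '<symbol>.csv'."""
    # os.path.splitext("SF.csv") gives ("SF", ".csv"), so the whole stem
    # must equal the symbol; "BLUSF.csv" can no longer match "SF".
    matches = [f for f in files if os.path.splitext(f) == (symbol, ".csv")]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one CSV for symbol %r, found %d" % (symbol, len(matches))
        )
    return matches[0]


print(find_csv_for_symbol("SF", ["BLUSF.csv", "SF.csv"]))  # SF.csv
```

The explicit error is a design choice for debugging: a missing or misnamed file fails loudly at ingest time rather than silently loading another ticker's prices.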