pydata / pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.
https://pydata.github.io/pandas-datareader/stable/index.html
Other

cannot reindex from a duplicate axis #433

Closed kevinwkc closed 6 years ago

kevinwkc commented 6 years ago

Thank you for checking. Starting from Jan 1, 2018, the following code broke:

Code Sample, a copy-pastable example if possible

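import datetime as dt
import pandas_datareader.data as web

# Date range: Jan 1, 2015 through today
sdt = dt.date(2015, 1, 1)
edt = dt.date.today()
ticker = ['VII.TO', 'WSP.TO', 'POT.TO', 'FTS.TO', 'ECA.TO', 'GIL.TO', 'WCN.TO']
web.DataReader(ticker, 'yahoo', sdt, edt)  # raises ValueError since Jan 1, 2018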

Problem description

The call should return a DataFrame with the ticker info. It worked before Jan 1, 2018, so I think Yahoo Finance may have started sending duplicate rows?
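One way to check the duplicate-rows guess is to look at each symbol's frame on its own before the frames are combined. The sketch below is only an illustration. The helper name `duplicated_dates`, the symbols `AAA` and `BBB`, and the sample data are all hypothetical, and nothing here confirms what Yahoo actually returned.

```python
import pandas as pd


def duplicated_dates(frames):
    """Map each symbol to the index labels that appear more than once.

    `frames` is assumed to be a dict of {symbol: DataFrame}, for example
    the result of downloading each ticker separately.
    """
    report = {}
    for symbol, frame in frames.items():
        # duplicated(keep="first") marks the second and later occurrences
        dupes = frame.index[frame.index.duplicated(keep="first")]
        if len(dupes):
            report[symbol] = dupes.unique().tolist()
    return report


# Hypothetical sample data: "BBB" carries a repeated date, "AAA" does not.
frames = {
    "AAA": pd.DataFrame(
        {"Volume": [100, 110]},
        index=pd.to_datetime(["2018-01-02", "2018-01-03"]),
    ),
    "BBB": pd.DataFrame(
        {"Volume": [200, 210, 210]},
        index=pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-03"]),
    ),
}

report = duplicated_dates(frames)
print(report)
```

Any symbol that shows up in the report has a non-unique date index, which is the condition the `ValueError` complains about.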

I checked the tracker and found no duplicate of this issue.

I have updated everything to the latest versions.

Expected Output

A DataFrame with the ticker info.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

in get_historical(ticker, start_date, end_date)
     67     return pivoted
     68     '''
---> 69     p = web.DataReader(ticker, 'yahoo', start_date, end_date)
     70     c = p['Adj Close'] #p['Close'] #
     71     v = p['Volume']

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas_datareader\data.pyc in DataReader(name, data_source, start, end, retry_count, pause, session, access_key)
    119                                 adjust_price=False, chunksize=25,
    120                                 retry_count=retry_count, pause=pause,
--> 121                                 session=session).read()
    122
    123     elif data_source == "yahoo-actions":

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas_datareader\yahoo\daily.pyc in read(self)
    113         """ read one data from specified URL """
    114         try:
--> 115             df = super(YahooDailyReader, self).read()
    116             if self.ret_index:
    117                 df['Ret_Index'] = _calc_return_index(df['Adj Close'])

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas_datareader\base.pyc in read(self)
    184             df = self._dl_mult_symbols(self.symbols.index)
    185         else:
--> 186             df = self._dl_mult_symbols(self.symbols)
    187         return df
    188

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas_datareader\yahoo\daily.pyc in _dl_mult_symbols(self, symbols)
    146             for sym in failed:
    147                 stocks[sym] = df_na
--> 148             return Panel(stocks).swapaxes('items', 'minor')
    149         except AttributeError:
    150             # cannot construct a panel with just 1D nans indicating no data

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\panel.pyc in __init__(self, data, items, major_axis, minor_axis, copy, dtype)
    146
    147         self._init_data(data=data, items=items, major_axis=major_axis,
--> 148                         minor_axis=minor_axis, copy=copy, dtype=dtype)
    149
    150     def _init_data(self, data, copy, dtype, **kwargs):

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\panel.pyc in _init_data(self, data, copy, dtype, **kwargs)
    171             mgr = data
    172         elif isinstance(data, dict):
--> 173             mgr = self._init_dict(data, passed_axes, dtype=dtype)
    174             copy = False
    175             dtype = None

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\panel.pyc in _init_dict(self, data, axes, dtype)
    226                 d = raxes_sm.copy()
    227                 d['copy'] = False
--> 228                 v = v.reindex(**d)
    229                 if dtype is not None:
    230                     v = v.astype(dtype)

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\frame.pyc in reindex(self, index, columns, **kwargs)
   2731     def reindex(self, index=None, columns=None, **kwargs):
   2732         return super(DataFrame, self).reindex(index=index, columns=columns,
-> 2733                                               **kwargs)
   2734
   2735     @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\generic.pyc in reindex(self, *args, **kwargs)
   2513         # perform the reindex on the axes
   2514         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 2515                                   fill_value, copy).__finalize__(self)
   2516
   2517     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\frame.pyc in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   2677         if index is not None:
   2678             frame = frame._reindex_index(index, method, copy, level,
-> 2679                                          fill_value, limit, tolerance)
   2680
   2681         return frame

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\frame.pyc in _reindex_index(self, new_index, method, copy, level, fill_value, limit, tolerance)
   2688         return self._reindex_with_indexers({0: [new_index, indexer]},
   2689                                            copy=copy, fill_value=fill_value,
-> 2690                                            allow_dups=False)
   2691
   2692     def _reindex_columns(self, new_columns, method, copy, level, fill_value=NA,

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\generic.pyc in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   2625                                                 fill_value=fill_value,
   2626                                                 allow_dups=allow_dups,
-> 2627                                                 copy=copy)
   2628
   2629         if copy and new_data is self._data:

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\internals.pyc in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   3884         # some axes don't allow reindexing with dups
   3885         if not allow_dups:
-> 3886             self.axes[axis]._can_reindex(indexer)
   3887
   3888         if axis >= self.ndim:

C:\Users\kevin\AppData\Local\conda\conda\envs\py27\lib\site-packages\pandas\core\indexes\base.pyc in _can_reindex(self, indexer)
   2834         # trying to reindex on an axis with duplicates
   2835         if not self.is_unique and len(indexer):
-> 2836             raise ValueError("cannot reindex from a duplicate axis")
   2837
   2838     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
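The last frames of the traceback show pandas refusing to reindex a frame whose index is not unique. The sketch below reproduces that condition on small synthetic data and shows one possible workaround, which is to drop repeated index labels before aligning. It assumes the repeated rows are true duplicates that are safe to discard, which the report does not confirm. The exact error message also differs between pandas versions.

```python
import pandas as pd

# Synthetic price frame with one date repeated (not real Yahoo data).
dates = pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-03"])
prices = pd.DataFrame({"Adj Close": [10.0, 10.5, 10.5]}, index=dates)

# Target index to align against, as Panel construction does internally.
target = pd.date_range("2018-01-02", "2018-01-04")

# Reindexing from a non-unique index raises ValueError.
try:
    prices.reindex(target)
    reindex_error = None
except ValueError as exc:
    reindex_error = str(exc)
print("reindex failed:", reindex_error)

# Workaround sketch: keep the first row for each date, then reindex.
deduped = prices[~prices.index.duplicated(keep="first")]
aligned = deduped.reindex(target)
print(aligned)
```

Keeping the first occurrence is an arbitrary choice. If the repeated rows can differ, they should be inspected before any are dropped.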

THANK YOU

bashtage commented 6 years ago

Yahoo has been deprecated.