Open shipmints opened 7 years ago
Corrected snippet to include self...
unix_start = (self.start - datetime.datetime(1970, 1, 1)).total_seconds()
This would resolve the error one gets when trying to pull data pre-1970 (on a Windows machine):
stdt = datetime(1950,1,1)
df = web.DataReader('^GSPC', 'yahoo', stdt)
98 def _get_params(self, symbol):
---> 99 unix_start = int(time.mktime(self.start.timetuple()))
100 day_end = self.end.replace(hour=23, minute=59, second=59)
101 unix_end = int(time.mktime(day_end.timetuple()))
OverflowError: mktime argument out of range
I think this would also work:
unix_start = int(calendar.timegm((self.start.timetuple()))
@shipmints
Be aware that you can't substract timezone aware datetime and timezone naive datetime
So it may fail if self.start
is a timezone aware datetime (because your epoch is datetime.datetime(1970, 1, 1)
which is a timezone naive datetime)
EPOCH should in fact be datetime.datetime(1970, 1, 1, tzinfo=pytz.UTC)
so we may wonder what will happen if self.start
is timezone aware but not UTC
?
What ever change in pandas-datareader code will be... we should have _sanitize_dates
function in mind
https://github.com/pydata/pandas-datareader/blob/331e3895c51326e87e69325e1a14f1e350cb832e/pandas_datareader/_utils.py#L18
Calculating difference between 2 datetime with differents timezone (according DST or not ...) may be a bit tricky (but some Python library can do this)
With Python 3.6 and Pandas 0.20 there is a very convenient method:
pd.Timestamp('1970-01-01', tz='UTC').timestamp()
unfortunately not with Python 2.7 and same Pandas version
A workaround may be to define (probably in _utils.py
) this function
def _timestamp(ts):
"""
Return unix timestamp for Pandas Timestamp ts
See https://github.com/pandas-dev/pandas/issues/17329
"""
try:
return ts.timestamp()
except AttributeError:
return ts.value // 1000 / 1000000.0
With Python 3.6 (same result with Python 2.7)
In [161]: _timestamp(pd.Timestamp("1970-1-1", tz="UTC"))
Out[161]: 0.0
In [162]: _timestamp(pd.Timestamp("1970-1-1", tz="US/Central"))
Out[162]: 21600.0
In [163]: _timestamp(pd.Timestamp("1970-1-1", tz="Europe/Paris"))
Out[163]: -3600.0
but the problem of this code is that it returns a different value when no timezone is given
With Python 3.6 (my system timezone is Europe/Paris)
In [167]: _timestamp(pd.Timestamp("1970-1-1"))
Out[167]: -3600.0
with Python 2.7
In [43]: _timestamp(pd.Timestamp("1970-1-1"))
Out[43]: 0.0
Pinging @chris-b1 @gfyoung
Maybe
def _timestamp(ts):
"""
Return unix timestamp for Timestamp ts
See https://github.com/pandas-dev/pandas/issues/17329
"""
try:
return ts.timestamp()
except AttributeError:
timestamp = ts.value // 1000 / 1000000.0
if ts.tz is None:
timestamp += time.timezone
return timestamp
In [36]: _timestamp(pd.Timestamp("1970-1-1", tz="UTC"))
Out[36]: 0.0
In [37]: _timestamp(pd.Timestamp("1970-1-1", tz="US/Central"))
Out[37]: 21600.0
In [38]: _timestamp(pd.Timestamp("1970-1-1", tz="Europe/Paris"))
Out[38]: -3600.0
In [69]: _timestamp(pd.Timestamp("1970-1-1")) # use local timezone
Out[69]: -3600.0
@jreback Should this be considered a bug? Can't download pre-1970 data from Yahoo even when it is available (on Windows machines).
the patch by @femtotrader looks fine if u want to submit a PR with tests
Yahoo has been deprecated.
I don't think this was ever implemented in yahoo, which is no longer deprecated. I can do a PR if needed. @femtotrader - were you suggesting that that _timestamp()
be called within _sanitize_dates()
on start
and end
and then using @shipmints solution of
unix_start = (start - datetime.datetime(1970, 1, 1)).total_seconds()
@bashtage - thoughts?
To be clear,
pdr.get_data_yahoo('^GSPC', '1965')
raises a mktime argument out of range
exception on Windows but not Mac.
Also, since yahoo only returns daily data, how important is the time zone issue? Seems like tz could safely be ignored.
Doesn't seem like a high priority issue, but if you have some time and want to fix...
Was this improved with https://github.com/pydata/pandas-datareader/pull/703 ?
Apologies as I'm new to git and github. I can try to make a fork/patch/pull request but I have to figure out how. In the meantime, I thought I'd offer the following to the local experts who surely can do this an order of magnitude faster than I can...