Closed yhkee0404 closed 1 month ago
I'm not quite following. You write "Two functions clearsky.lookup_linke_turbidity and irradiance.get_extra_radiation is tz-naive but has been used as if they were tz-aware" but in the clear-sky user guide, the times
input is localized
tus = Location(32.2, -111, 'US/Arizona', 700, 'Tucson')
times = pd.date_range(start='2016-07-01', end='2016-07-04', freq='1min', tz=tus.tz)
Hi, @cwhanse Thank you for your attention.
Yes, that localized times
is a tz-aware input which is not a problem, but when it is given to the two functions they return unexpected results and this is the problem. That's why I said the functions are not tz-aware but has been misused as if they were tz-aware, and we are supposed to fix them to be tz-aware.
For instance, try converting the timezone of the times
input using a common tz-aware function tz_convert
and you can see the difference. This example was copied from the .ipynb file I mentioned above:
tus = Location(32.2, -111, 'US/Arizona', 700, 'Tucson')
times = pd.date_range(start='2016-07-01', end='2016-07-04', freq='1min', tz=tus.tz)
cs = tus.get_clearsky(times) # ineichen with climatology table by default
cs_utc = tus.get_clearsky(times.tz_convert('utc')).tz_convert(tus.tz) # ineichen with climatology table by default
assert cs.index.equals(cs_utc.index)
assert not ((cs < 0).any().any() or (cs_utc < 0).any().any())
print(len(cs))
df = cs.join(cs_utc, how = 'outer', rsuffix = '_utc')[~ cs.fillna(-1).eq(cs_utc.fillna(-1)).all(axis = 'columns')]
print(df.index.hour.unique(), df.index.minute.nunique())
df
4321
Index([17, 18, 19], dtype='int32') 60
ghi dni dhi ghi_utc \
2016-07-01 17:00:00-07:00 460.378093 778.936621 70.469943 460.055525
2016-07-01 17:01:00-07:00 456.733517 777.196738 70.143925 456.411549
2016-07-01 17:02:00-07:00 453.083990 775.435849 69.816605 452.762630
2016-07-01 17:03:00-07:00 449.429616 773.653621 69.487974 449.108869
2016-07-01 17:04:00-07:00 445.770496 771.849713 69.158023 445.450370
... ... ... ... ...
2016-07-03 19:28:00-07:00 0.372376 6.479330 0.286308 0.368759
2016-07-03 19:29:00-07:00 0.232132 4.744294 0.182824 0.229742
2016-07-03 19:30:00-07:00 0.131663 3.388042 0.106134 0.130224
2016-07-03 19:31:00-07:00 0.063133 2.356295 0.052039 0.062400
2016-07-03 19:32:00-07:00 0.019268 1.593766 0.016222 0.019030
dni_utc dhi_utc
2016-07-01 17:00:00-07:00 777.658658 70.787077
2016-07-01 17:01:00-07:00 775.913720 70.460150
2016-07-01 17:02:00-07:00 774.147732 70.131910
2016-07-01 17:03:00-07:00 772.360361 69.802348
2016-07-01 17:04:00-07:00 770.551267 69.471455
... ... ...
2016-07-03 19:28:00-07:00 6.330446 0.284669
2016-07-03 19:29:00-07:00 4.628755 0.181634
2016-07-03 19:30:00-07:00 3.300508 0.105355
2016-07-03 19:31:00-07:00 2.291654 0.051610
2016-07-03 19:32:00-07:00 1.547308 0.016073
[459 rows x 6 columns]
Does that difference conform to an expected behavior? I think cs_utc
is the more accurate result than cs
on the user guide.
On the other hand, you can also see on the user guide another times
input as below which is tz-naive, as well as the tz-aware or localized one you mentioned. Both tz-naive and tz-aware times
inputs are used without distiction on the user guide, but it is highly likely that they have been misused because merely preprocessing the inputs by the well-known tz-aware function tz_convert
makes the functions result another values.
Another example from my analysis above:
# times = pd.date_range(start='2015-01-01', end='2016-01-01', freq='1D')
times_h = pd.date_range(start='2015-01-01', end='2016-01-01', freq='1h')
times_h_kst = pd.date_range(start='2015-01-01', end='2016-01-01', freq='1h').tz_localize('utc').tz_convert('Asia/Seoul')
# sites = [(32, -111, 'Tucson1'), (32.2, -110.9, 'Tucson2'),
# (33.5, -112.1, 'Phoenix'), (35.1, -106.6, 'Albuquerque')]
lat, lon, name = 35.1, -106.6, 'Albuquerque'
# for lat, lon, name in sites:
turbidity_h = pvlib.clearsky.lookup_linke_turbidity(times_h, lat, lon, interp_turbidity=False)
turbidity_h_kst = pvlib.clearsky.lookup_linke_turbidity(times_h_kst, lat, lon, interp_turbidity=False).tz_convert(None)
assert turbidity_h.index.equals(turbidity_h_kst.index)
assert not ((turbidity_h < 0).any() or (turbidity_h_kst < 0).any())
print(len(turbidity_h))
df = pd.concat([turbidity_h, turbidity_h_kst], axis = 'columns', join = 'outer')[~ turbidity_h.fillna(-1).eq(turbidity_h_kst.fillna(-1))]
print(df.index.hour.unique(), df.index.minute.nunique())
df
8761
Index([15, 16, 17, 18, 19, 20, 21, 22, 23], dtype='int32') 1
0 1
2015-01-31 15:00:00 2.05 1.95
2015-01-31 16:00:00 2.05 1.95
2015-01-31 17:00:00 2.05 1.95
2015-01-31 18:00:00 2.05 1.95
2015-01-31 19:00:00 2.05 1.95
... ... ...
2015-12-31 19:00:00 2.25 2.05
2015-12-31 20:00:00 2.25 2.05
2015-12-31 21:00:00 2.25 2.05
2015-12-31 22:00:00 2.25 2.05
2015-12-31 23:00:00 2.25 2.05
[99 rows x 2 columns]
Is this an expected behavior? Isn't turbidity_h
right and turbidity_h_kst
wrong? I think ideally the results should be the same as turbidity_h
, and a different turbidity_h_kst
is a bug, just like #237 mentioned above.
To summarize, the user guide has considered those functions as if tz-aware, where any tz_convert
of the input may not change their results. However, you can easily find it does indeed as in my .ipynb file, which means the functions are tz-naive against our expectations and should be fixed into tz-aware like in #2055.
Sorry for my poor English. I've thoroughly revised my words. Could you please review them again? @cwhanse
@yhkee0404 aha, I think I partly understand what's going on here. The Linke turbidity data distributed with pvlib are gridded (location), monthly values. These values (from an outside source) do not have datetime stamps, only a month. pvlib (somewhat arbitrarily) assigns these values to the middle day of each month so that a daily value can be determined by interpolation. That machinery is not shown to the user; it happens here: https://github.com/pvlib/pvlib-python/blob/d53f97e984bfdd268aa92f8bf482ced0edda0110/pvlib/clearsky.py#L228
The examples you provide are comparing the interpolated turbidity at the same location but localized to different timezones. The interpolation is being done using time.dayofyear
. It looks to me that the dayofyear
function may operating in local time, because the day of year is changed at midnight local, not midnight UTC.
Arguably, pandas had to make a decision how to return a day of year for each location, so I don't think this is a fault in pandas - its behaving as intended (I assume).
So: if we want Linke turbidity at a location to be consistent with equal timestamps, we need to work under the hood in pvlib to fix the interpolation.
import numpy as np
import pandas as pd
from pvlib.location import Location
tus = Location(32.2, -111, 'US/Arizona', 700, 'Tucson')
times = pd.date_range(start='2016-07-01', end='2016-07-04', freq='30min', tz=tus.tz)
times_utc = times.tz_convert('utc')
doy = times.dayofyear
doy_utc = times_utc.dayofyear
td = np.append(0, np.diff(times.dayofyear)) # add a zero so it's the right length
print(times[td > 0])
td_utc = np.append(0, np.diff(times_utc.dayofyear)) # add a zero so it's the right length
print(times_utc[td_utc > 0])
Thank you for your clear explanation! Could I add some details to your example?
clearsky.lookup_linke_turbidity
for each localized datetime depend not only on interp_turbidity=True
and pandas.DatetimeIndex.dayofyear
, but also on pandas.DatetimeIndex.month
, regardless of interpolation. No matter why, the fundamental reason remains the same as you identified: the values are "changed at midnight local, not midnight UTC." Please note the influence of interp_turbidity
and time.month
:irradiance.get_extra_radiation
values results from pandas.DatetimeIndex.dayofyear
, pandas.DatetimeIndex.month
, and pandas.DatetimeIndex.year
. The root cause, again, is the localized definition of midnight. Refer to #2055 for more details. Or, you can start by examining the following code and trace through irradiance._handle_extra_radiation_types
and tools._pandas_to_doy
to identify pandas.DatetimeIndex.dayofyear
, then follow through to solarposition.pyephem_earthsun_distance
which uses ephem.Date assuming UTC, and finally to solarposition.nrel_earthsun_distance
which references pandas.DatetimeIndex.year
and pandas.DatetimeIndex.month
.clearsky.lookup_linke_turbidity
"these values (from an outside source) do not have datetime stamps, only a month." Therefore, synchronizing input datetimes with the timezone distinguishing each day in the h5 file is key to resolving the timezone-related uncertainties in Linke turbidity.Fortunately, we can experimentally confirm that both functions use UTC for dividing days without needing additional references. The following code provides evidence for irradiance.get_extra_radiation
. Only assuming UTC makes the returned values unique
for every day; localizing to any other timezone is awkward because two distinct values exist per each localized day then.
import pandas as pd
from pvlib.location import Location
import pvlib
tus = Location(32.2, -111, 'US/Arizona', 700, 'Tucson')
times = pd.date_range(start='2016-07-01', end='2016-07-04', freq='30min', tz=tus.tz)
times_utc = times.tz_convert('utc')
dni_extra = pvlib.irradiance.get_extra_radiation(times)
dni_extra_utc = pvlib.irradiance.get_extra_radiation(times_utc).tz_convert(tus.tz)
assert dni_extra.index.equals(dni_extra_utc.index)
assert not ((dni_extra < 0).any() or (dni_extra_utc < 0).any())
print(len(dni_extra))
df = pd.concat([dni_extra, dni_extra_utc], axis = 'columns', join = 'outer')[~ dni_extra.fillna(-1).eq(dni_extra_utc.fillna(-1))]
print(df.index.hour.unique(), df.index.minute.nunique())
print(dni_extra_utc.tz_convert('UTC').groupby(by=lambda x: x.day).unique())
df
145
Index([17, 18, 19, 20, 21, 22, 23], dtype='int32') 2
1 [1320.4980145041654]
2 [1320.4715353124743]
3 [1320.4577467900974]
4 [1320.4566508404746]
dtype: object
0 1
2016-07-01 17:00:00-07:00 1320.498015 1320.471535
2016-07-01 17:30:00-07:00 1320.498015 1320.471535
2016-07-01 18:00:00-07:00 1320.498015 1320.471535
2016-07-01 18:30:00-07:00 1320.498015 1320.471535
2016-07-01 19:00:00-07:00 1320.498015 1320.471535
2016-07-01 19:30:00-07:00 1320.498015 1320.471535
2016-07-01 20:00:00-07:00 1320.498015 1320.471535
2016-07-01 20:30:00-07:00 1320.498015 1320.471535
2016-07-01 21:00:00-07:00 1320.498015 1320.471535
2016-07-01 21:30:00-07:00 1320.498015 1320.471535
2016-07-01 22:00:00-07:00 1320.498015 1320.471535
2016-07-01 22:30:00-07:00 1320.498015 1320.471535
2016-07-01 23:00:00-07:00 1320.498015 1320.471535
2016-07-01 23:30:00-07:00 1320.498015 1320.471535
2016-07-02 17:00:00-07:00 1320.471535 1320.457747
2016-07-02 17:30:00-07:00 1320.471535 1320.457747
2016-07-02 18:00:00-07:00 1320.471535 1320.457747
2016-07-02 18:30:00-07:00 1320.471535 1320.457747
2016-07-02 19:00:00-07:00 1320.471535 1320.457747
2016-07-02 19:30:00-07:00 1320.471535 1320.457747
2016-07-02 20:00:00-07:00 1320.471535 1320.457747
2016-07-02 20:30:00-07:00 1320.471535 1320.457747
2016-07-02 21:00:00-07:00 1320.471535 1320.457747
2016-07-02 21:30:00-07:00 1320.471535 1320.457747
2016-07-02 22:00:00-07:00 1320.471535 1320.457747
2016-07-02 22:30:00-07:00 1320.471535 1320.457747
2016-07-02 23:00:00-07:00 1320.471535 1320.457747
2016-07-02 23:30:00-07:00 1320.471535 1320.457747
2016-07-03 17:00:00-07:00 1320.457747 1320.456651
2016-07-03 17:30:00-07:00 1320.457747 1320.456651
2016-07-03 18:00:00-07:00 1320.457747 1320.456651
2016-07-03 18:30:00-07:00 1320.457747 1320.456651
2016-07-03 19:00:00-07:00 1320.457747 1320.456651
2016-07-03 19:30:00-07:00 1320.457747 1320.456651
2016-07-03 20:00:00-07:00 1320.457747 1320.456651
2016-07-03 20:30:00-07:00 1320.457747 1320.456651
2016-07-03 21:00:00-07:00 1320.457747 1320.456651
2016-07-03 21:30:00-07:00 1320.457747 1320.456651
2016-07-03 22:00:00-07:00 1320.457747 1320.456651
2016-07-03 22:30:00-07:00 1320.457747 1320.456651
2016-07-03 23:00:00-07:00 1320.457747 1320.456651
2016-07-03 23:30:00-07:00 1320.457747 1320.456651
Describe the bug Both of the functions
clearsky.lookup_linke_turbidity
andirradiance.get_extra_radiation
are supposed to be tz-aware in accordance with the user guide, but take the tz-aware inputs as tz-naive in their implementations and return incorrect results. Another functions using these two functions may also have been affected by the errors; i.e.Location.get_clearsky
,clearsky.ineichen
, orclearsky.simplified_solis
.To Reproduce Refer to my comprehensive analysis on the .ipynb file simulated in the same way as the user guide. It seems too big for the browser to render because it shows me only this message: An error occurred. In a similar situation, please download the file or clone my repository to see the content.
Expected behavior There should be no items searched on the file by the keyword
rows
, each of which means errors subject to timezones.Screenshots For instance, on the user guide, it prints:
climatological linke_turbidity = 3.329496820286459
. However, after converting the local timezone to utc, it prints on my analysis above another value:climatological linke_turbidity = 3.3247187176482633
.Versions:
pvlib.__version__
: v0.10.5pandas.__version__
:2.2.2
3.11.9
Additional context Almost the same as #237