pvlib / pvanalytics

Quality control, filtering, feature labeling, and other tools for working with data from photovoltaic energy systems.
https://pvanalytics.readthedocs.io
MIT License

Inferring timezone from irradiance time-series #183

Open AdamRJensen opened 1 year ago

AdamRJensen commented 1 year ago

pvanalytics has some algorithms for inferring the orientation of a PV plant. To use these algorithms, it is crucial that the time series is localized to the correct time zone. However, the time zone of such a time series is not always known.

Is anyone aware of algorithms that can infer timezone from irradiance or PV power generation time series?

Perhaps the pvanalytics daytime masking function can be of use for the first step.

AdamRJensen commented 1 year ago

A crude approach could be to simply shift the solar elevation time series by 30-minute increments and see which shift results in the lowest mean "nighttime" irradiance:

import pandas as pd
import pvlib
import numpy as np

data, meta = pvlib.iotools.get_bsrn(
    station='CAB',
    username='redacted',
    password='redacted',
    start=pd.Timestamp(2020, 1, 1),
    end=pd.Timestamp(2020, 12, 31))

# Remove timezone information
data = data.tz_convert(None)
# Calculate solar position
solpos = pvlib.solarposition.get_solarposition(data.index, meta['latitude'], meta['longitude'])

data['nighttime_solarposition'] = solpos['apparent_elevation'] <= 0

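# Candidate shifts in 30-minute steps over one day
df_shift = pd.Series(index=np.arange(0, 24*60, 30), dtype=float)

for shift in df_shift.index:
    # Shift the night mask by `shift` minutes; timestamps without a shifted value count as daytime
    night_mask = (data['nighttime_solarposition']
                  .shift(periods=shift, freq='min')
                  .reindex(data.index, fill_value=False))
    df_shift[shift] = data.loc[night_mask, 'ghi'].mean()

# idxmin returns the shift label in minutes (argmin would return its position);
# shifts above 12 hours correspond to negative offsets (shift - 24*60)
print(f"UTC time-zone offset: {df_shift.idxmin()} minutes")
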
kperrynrel commented 1 year ago

@AdamRJensen I do have a method in the fleets QA that is partially adapted in the pvanalytics.quality.time module (shifts_ruptures function) that looks for time shifts in the data (including DST and time drift). The pvanalytics adaptation only looks for DST (not drift), so I wanted to fix it up some and add a section on how to calculate the 'event' values (likely in the documentation).

Specifically, the method we're using in the fleets QA uses the day-night masking function to determine sunrise and sunset times for each day in the time series, then calculates the midpoint between sunrise and sunset for each day. That midpoint is then compared to the modeled midday (the halfway point between modeled sunrise and sunset) at that particular location, and the difference is calculated. We build a time series of this midday difference, and then run changepoint detection (or similar) to find step changes in it.

Here's an example graph, which shows DST in the first three years of the time series before it's corrected (this is from a real AC energy time series):

[Figure: 5003_halfway_diff_cs_poa_ac_energy_inv_2020timeshift]

This could be expanded to look at how far off, time-wise, a series is from the modeled irradiance in the assumed timezone, and from that calculate a suspected timezone. I was planning on testing this function on some test data and starting to adapt it for PVRW. Let me know what you think! I don't love the CPD step, as it's somewhat time intensive, so I was going to look into some additional options for determining step changes in this time series.
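
For illustration, here is a minimal sketch of the midday-difference idea, not the fleets QA implementation. It assumes two boolean daytime masks on the same naive index: one derived from the measured data and one modeled in UTC. The function names `daily_midpoint_minutes` and `estimate_offset_minutes` are hypothetical, and so are the choice of the median and the rounding to 30-minute steps. It also assumes each day's daylight period does not wrap past midnight in the index, so very large offsets would need extra handling.

```python
import pandas as pd


def daily_midpoint_minutes(daytime: pd.Series) -> pd.Series:
    """Minutes after midnight halfway between the first and last
    daytime sample of each calendar day (hypothetical helper)."""
    day_idx = daytime.index[daytime.to_numpy(dtype=bool)]
    minutes = pd.Series(day_idx.hour * 60 + day_idx.minute, index=day_idx)
    grouped = minutes.groupby(day_idx.date)
    return (grouped.min() + grouped.max()) / 2


def estimate_offset_minutes(measured_daytime: pd.Series,
                            modeled_daytime: pd.Series,
                            step: int = 30) -> int:
    """Median daily midday difference (measured minus modeled),
    rounded to the nearest `step` minutes (an assumed granularity)."""
    diff = (daily_midpoint_minutes(measured_daytime)
            - daily_midpoint_minutes(modeled_daytime))
    return int(round(float(diff.median()) / step)) * step
```

In practice the measured mask might come from pvanalytics.features.daytime.power_or_irradiance and the modeled mask from a pvlib solar position with apparent_elevation > 0 computed as if the index were UTC. A positive result then suggests the timestamps run ahead of UTC by that many minutes. Taking the median over all days ignores DST and drift, which is what the changepoint step is meant to catch.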