pvlib / pvlib-python

A set of documented functions for simulating the performance of photovoltaic energy systems.
https://pvlib-python.readthedocs.io
BSD 3-Clause "New" or "Revised" License

High-frequency irradiance synthesis functions #788

Open kevinsa5 opened 4 years ago

kevinsa5 commented 4 years ago

High-frequency PV simulations are useful in several contexts, including grid-impact studies and energy-storage simulations. The scarcity of high-frequency irradiance datasets has spurred the development of many methods for synthesizing high-frequency irradiance signals from lower-frequency measurements (e.g. hourly satellite data). A couple of examples:

"Sub-Hour Solar Data for Power System Modeling from Static Spatial Variability Analysis" https://www.nrel.gov/docs/fy13osti/56204.pdf

"A stochastic downscaling approach for generating high-frequency solar irradiance scenarios" http://amath.colorado.edu/faculty/kleiberw/papers/Zhang2018.pdf

These models often do not include a software implementation and are complex enough to present a significant barrier to entry for the reader. In that regard, they are similar to the decomposition/transposition irradiance functions included in pvlib. Implementing such a model in pvlib would increase its accessibility to the general public and increase pvlib's utility.

Would pvlib's authors be interested in including such a model in pvlib?

mikofski commented 4 years ago

Would Matthew Lave's wavelet variability model (wvm) also be applicable?

mikofski commented 4 years ago

See also, w/ Will Hobbs: Simulating High-Frequency Generation Profiles for Large Solar PV Portfolios, IEEE 45th PVSC & 7th WCPEC, 10.1109/PVSC.2018.8547850

mikofski commented 4 years ago

And with Matthew Reno: Creation and Value of Synthetic High-Frequency Solar Simulations for Distribution System QSTS Simulations

wholmgren commented 4 years ago

Yes I support adding this kind of model to pvlib.

kevinsa5 commented 4 years ago

@mikofski Good point that spatial variability is relevant as well. I'm less familiar with the spatial correlation methods, but have the impression that wavelet models and cloud field models tend to be popular. I'm interested mostly in time variability so that's what I'll focus on here.

I've seen three approaches that I'd classify as "simple" and relatively easy to add to pvlib: Markov chain-based generators, lookup tables, and distribution sampling. The Markov chain and lookup-table methods require a high-frequency timeseries input dataset to compute the Markov transition matrices (MTMs) or lookup tables. Other methods (in particular the distribution-sampling methods) are "already trained" in that they are parameterized by e.g. location and scale parameters (with suggested default values) and do not necessarily require a high-frequency irradiance training dataset. Given that pvlib has methods for retrieving high-frequency irradiance datasets (from e.g. SURFRAD), I'd say that requiring a high-frequency training dataset isn't a show-stopper for including those functions in pvlib. Thoughts?
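To make the MTM idea concrete, here's a minimal sketch of a training/sampling pair. The function names, the equal-width binning of kt into states, and the bin-center mapping back to kt values are all illustrative choices for this example, not taken from any of the papers above:

```python
import numpy as np

def fit_mtm(kt, n_states=10):
    """Estimate a Markov transition matrix (MTM) from a clear-sky-index
    series by binning kt into equal-width states and counting transitions."""
    states = np.clip((np.asarray(kt) * n_states).astype(int), 0, n_states - 1)
    mtm = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        mtm[a, b] += 1
    # normalize rows to probabilities; never-visited states get a uniform row
    row_sums = mtm.sum(axis=1, keepdims=True)
    return np.where(row_sums > 0, mtm / np.where(row_sums > 0, row_sums, 1),
                    1.0 / n_states)

def sample_chain(mtm, n_steps, start_state=0, seed=None):
    """Generate a synthetic kt series by walking the transition matrix."""
    rng = np.random.default_rng(seed)
    states = [start_state]
    for _ in range(n_steps - 1):
        states.append(rng.choice(len(mtm), p=mtm[states[-1]]))
    # map each state index back to its bin-center kt value
    return (np.array(states) + 0.5) / len(mtm)
```

A real training function would also want to exclude nighttime samples and possibly condition the matrix on solar zenith or season, which is where the design decisions start to matter.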

Also, any suggestions for how to write tests for these sorts of functions? Their stochastic nature does not lend itself to the usual direct numeric comparisons. The papers tend to characterize the generated signals as a whole, e.g. by comparing their distributions against an expected curve with the two-sample Kolmogorov-Smirnov test, or by comparing autocorrelation against expected values.
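For illustration, such whole-signal tests could look something like the following sketch. The helper names, thresholds, and beta-distributed placeholder data are invented for the example; in a real test the "synthetic" series would come from the generator under test, and seeding the random generator keeps the test reproducible:

```python
import numpy as np
from scipy import stats

def check_distribution(observed_kt, synthetic_kt, max_ks_stat=0.1):
    """Two-sample Kolmogorov-Smirnov comparison of synthetic vs. measured
    clear-sky-index distributions; passes if the KS statistic is small."""
    statistic, _ = stats.ks_2samp(observed_kt, synthetic_kt)
    return statistic <= max_ks_stat

def check_autocorrelation(observed_kt, synthetic_kt, lag=1, tol=0.15):
    """Compare lag-k autocorrelation of synthetic vs. observed series."""
    def acf(x, k):
        return np.corrcoef(x[:-k], x[k:])[0, 1]
    return abs(acf(observed_kt, lag) - acf(synthetic_kt, lag)) <= tol

# seeding the generator makes an otherwise-stochastic test reproducible
rng = np.random.default_rng(42)
observed = rng.beta(5.0, 2.0, size=2000)   # placeholder for measured kt
synthetic = rng.beta(5.0, 2.0, size=2000)  # placeholder for generator output
```

The thresholds are loose by design: the goal is a regression test that catches gross statistical changes, not a hypothesis test with rigorous significance levels.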

Here are some additional references:

An N-state Markov-chain mixture distribution model of the clear-sky index https://www.sciencedirect.com/science/article/pii/S0038092X18307205#bb0110

And a spatial extension of the above: A spatiotemporal Markov-chain mixture distribution model of the clear-sky index https://www.sciencedirect.com/science/article/pii/S0038092X18312611#bb0150

A simple and efficient procedure for increasing the temporal resolution of global horizontal solar irradiance series https://www.sciencedirect.com/science/article/pii/S0960148115302044#bib23

Improved Synthesis of Global Irradiance with One-Minute Resolution for PV System Simulations https://www.hindawi.com/journals/ijp/2014/808509/

Stochastic downscaling algorithm to generate high-resolution time-series for improved PV yield simulations https://suntrace.de/fileadmin/user_upload/Duscha_C._A.__Buehler_S.A.__Lezaca_J._Bohny_C.__Meyer_R._Stochastic_Downscaling_Algorithm_to_generate_high_resolution_time-series_for_improved_PV_yield_simulations_2016_PVSEC_2016_.pdf

cwhanse commented 4 years ago

My colleague Matt Lave spent some time looking at various downscaling algorithms/datasets to create input for electrical distribution system simulations. His conclusion is that statistical downscaling methods tend to underestimate variability when compared to high-frequency measurements. The root cause is likely that the algorithms mix too rapidly (or have too short a memory), because otherwise they become computationally impractical. The methods that process windows of satellite data (the HRIA, for example) also suffered from this tendency.

I'm not against creating a library of such algorithms, but perhaps it would be better to set up a separate project than to build a few of the algorithms into pvlib. Downscaling irradiance is very much a topic of research with no convergence on a few "good" answers.

wholmgren commented 4 years ago

I'm not against creating a library of such algorithms, but perhaps it would be better to set up a separate project than to build a few of the algorithms into pvlib. Downscaling irradiance is very much a topic of research with no convergence on a few "good" answers.

In general I don't see why these algorithms are any less appropriate than the myriad transposition or airmass models, nor are they too domain-specific. Reference implementations could be beneficial for further algorithm development.

What I don't want to see:

  1. trivial wrappers around functions from packages like sklearn or statsmodels that don't add significant value to the PV modeler.
  2. implementations of statistical models that are already in other packages

I skimmed a few of those references and came away thinking that most of the potential pvlib functions would fall into 1 or 2. @kevinsa5 can you get more specific about what you think belongs in the pvlib modules and what might be better addressed through, say, documentation examples?

kevinsa5 commented 4 years ago

I see Cliff's point that the current state of this research area is fairly disorganized and likely to evolve in the future. I'd argue that, while these algorithms are flawed, an accessible implementation of one or more of them would still be useful in many contexts. As a point of comparison, bifacial modeling is in a similar situation.

What I'm envisioning is some number of low-level downscaling functions and a high-level function that would fit into pvlib's irradiance modeling chain, e.g. fetch GHI -> downscale -> decomposition -> transposition. Some model implementations might also require training functions to generate MTMs, lookup tables, or other model dependencies.

@wholmgren I'm surprised that you got the impression that some of them could be implemented as trivial wrappers around the pydata stack. For instance, the Fernández-Peruchena/Gastón 2016 method described in "A simple and efficient procedure for increasing the temporal resolution of global horizontal solar irradiance series" goes something like:

# create library of high-res kt signals from a high-res measured ghi signal
def generate_library(ghi_meas, ghi_et):
    # calculate clearness index kt = ghi_meas / ghi_et
    # partition by day
    # normalize the time axis of each day's kt signal
    # return the normalized kt library
    ...

# use the library to generate a high-res signal from a low-res signal
def synthesize_variability(ghi, library):
    # for each day:
    #     apply each kt library day to that day's ghi_et
    #     choose the kt * ghi_et day that most closely recreates the
    #     measured ghi signal at native resolution
    ...

If the models were as simple as ghi_out = ghi_in * np.random.beta(1, 10, len(ghi_in)) then I'd say a quick documentation example would be fine, but it seems like these are complex enough that a full function would be the better choice.
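For concreteness, here is a toy, runnable version of that outline. It assumes the inputs are regular numpy arrays that reshape cleanly into days, skips the paper's time-axis normalization, and matches candidates on daily means rather than at the native low resolution, so it is a simplification of the published procedure, not an implementation of it:

```python
import numpy as np

def generate_library(ghi_meas_highres, ghi_et_highres, samples_per_day):
    """Build a library of daily high-resolution clearness-index (kt)
    profiles from measured GHI, one row per day."""
    # clearness index, guarding against division by zero at night
    kt = np.divide(ghi_meas_highres, ghi_et_highres,
                   out=np.zeros_like(ghi_meas_highres),
                   where=ghi_et_highres > 0)
    return kt.reshape(-1, samples_per_day)  # partition by day

def synthesize_variability(ghi_daily_means, ghi_et_highres_days, library):
    """For each low-resolution day, scale every library kt profile by that
    day's high-resolution extraterrestrial GHI and keep the candidate whose
    daily mean best matches the measured value."""
    out = []
    for mean_ghi, et_day in zip(ghi_daily_means, ghi_et_highres_days):
        candidates = library * et_day            # (n_library_days, samples)
        errors = np.abs(candidates.mean(axis=1) - mean_ghi)
        out.append(candidates[np.argmin(errors)])  # best-matching day
    return np.concatenate(out)
```

Even in this stripped-down form, the day partitioning, library bookkeeping, and candidate selection seem like more than a documentation example should carry, which is the point I'm trying to make.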

mikofski commented 4 years ago

@kevinsa5 sorry to diverge, but just as valuable to me as high-frequency subhourly data would be synthesizing hourly TMY data from monthly values, which is more or less the same problem. But the problem I found is in getting a distribution of kt relevant to the site of interest. If I had that, then why would I be downscaling monthly data? You mentioned earlier that there are some models that

are "already trained" in that they are parameterized by e.g. location and scale parameters (with suggested default values) and do not necessarily require a high-frequency irradiance training dataset.

If pvlib has access to these parameters and could synthesize TMY data for any site globally, that would be awesome!

jranalli commented 4 years ago

I needed to port the discrete point-cloud case of the Matlab WVM model to Python and have a working implementation for that case. This essentially reduces variability rather than increasing it, and it's also sort of an auxiliary package to the Matlab library, but I saw the mention of the model above.

Would completing this port for the other cases in the Matlab library (a relatively easy task) be a desirable contribution for this use case?

cwhanse commented 4 years ago

@jar339 I support porting the WVM to python. Question is where.

@wholmgren Put it in irradiance.py? That module is nearly 3000 lines and could already use a refactor. Open to ideas.

wholmgren commented 4 years ago

I support porting the WVM to python. Question is where.

I agree, but I don't have an opinion on where to put it at this time. Maybe irradiance.py would be ok if some of the other stuff was moved elsewhere. Maybe put it in its own module.

I'm surprised that you got the impression that some of them could be implemented as trivial wrappers around the pydata stack.

@kevinsa5 you can attribute that to my lack of familiarity with the models and my lack of time to review the papers. Thanks for the concrete example. That helps a lot.

jranalli commented 4 years ago

I support porting the WVM to python. Question is where.

I agree, but I don't have an opinion on where to put it at this time. Maybe irradiance.py would be ok if some of the other stuff was moved elsewhere. Maybe put it in its own module.

For what it's worth, there's very little in the model that would be similar to the irradiance module. It's mostly computing a distribution representing the plant, and then the wavelet transform. Personally, I think it makes the most sense in a framework for general variability/frequency analysis, or else spatial analysis. If one of those is likely to get future contributions, it might be reasonable.
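For reference, the wavelet step can be sketched as successive differences of moving averages at dyadic timescales (the top-hat-wavelet idea the WVM builds on). This is an illustrative decomposition, not Lave's reference implementation, and it omits the variability-reduction step that uses the plant-footprint distribution:

```python
import numpy as np

def wavelet_modes(kt, max_level):
    """Split a clear-sky-index series into fluctuation modes at dyadic
    timescales via differences of centered moving averages."""
    def moving_avg(x, window):
        return np.convolve(x, np.ones(window) / window, mode='same')

    modes = []
    prev = np.asarray(kt, dtype=float)
    for level in range(1, max_level + 1):
        smooth = moving_avg(kt, 2 ** level)  # average over 2**level samples
        modes.append(prev - smooth)          # fluctuations at this timescale
        prev = smooth
    modes.append(prev)                       # residual low-frequency component
    return np.array(modes)                   # modes sum back to the input
```

Since none of this touches decomposition or transposition, I agree it sits awkwardly in irradiance.py.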

Should I spin this off into a separate issue, since it might be different from (and more compartmentalized than) the broader downscaling discussion?

cwhanse commented 4 years ago

Should I spin this off into a separate issue, since it might be different from (and more compartmentalized than) the broader downscaling discussion?

Yes. Let's start a new module with this submission; scaling.py comes to mind, but I'm not enamored of it. Scope will be functions that operate on irradiance, and perhaps other variables, to transform their temporal or spatial characteristics.