pvlib.iam.marion_integrate uses too much memory for vector inputs

kandersolar commented 2 years ago

pvlib.iam.marion_integrate (which is mostly relevant as a helper for pvlib.iam.marion_diffuse) needs quite a bit of memory when passed vector inputs. An input of length 1000 allocates around 2GB of memory on my machine, so naively passing in a standard 8760 would use roughly 17-18 GB. Unfortunately I was very much focused on fixed tilt simulations when I wrote pvlib's implementation and never tried it out on large vector inputs, so this problem went unnoticed until @spaneja pointed it out to me.

I think any vectorized implementation of this algorithm is going to be rather memory-heavy, so I'm skeptical that achieving even a factor of 10 reduction in memory usage is possible here without completely changing the approach (and likely shifting the burden from memory to CPU). However, here are two low-hanging fruits worth considering:

The current implementation has a handful of large 2-D arrays local to the function that only get released when the function returns. Some of them are only used near the beginning of the function but still take up memory for the entire function duration. Using the del statement to instruct python that those arrays are no longer needed allows python to reclaim that memory immediately and recycle it for subsequent allocations. This is probably a simplification of what actually happens, but it seems consistent with the below observations.
np.float32 cuts memory usage in half compared with np.float64 and (probably) doesn't meaningfully change the result. It's not like surface_tilt has more than a few sig figs anyway.

Here is a rough memory and timing comparison (using memory_profiler, very handy). pvlib is the current implementation; the two del variants use a strategic sprinkling of del but are otherwise not much different from pvlib. This is for an input of length 1000. The traces here are memory usage sampled at short intervals across a single function invocation; for example the blue pvlib trace shows that the function call took 1.4 seconds to complete and had a peak memory usage slightly higher than 2GB.

So using a few dels cuts peak memory usage roughly in half. Dropping down to np.float32 cuts it roughly in half again (and gives a nontrivial speedup too). It's possible that further improvements can be had with other tricks (e.g. using the out parameter that some numpy functions provide) but I've not yet explored them.

My main question: are we open to using these two strategies in pvlib? Despite being built into python itself, del still seems unpythonic to me for some reason. Switching away from float64 is objectionable to the extent that it's the standard in scientific computing and is therefore baked into the models by assumption. I think I'm cautiously open to both of the above approaches, iff they are accompanied by good explanatory comments and switching to float32 can be reasonably shown to not introduce a meaningful difference in output.

Remark: even ignoring this memory bloat, I tend to think that applying marion_integrate directly to an 8760 is a bit strange. In simulations with time series surface_tilts, a better approach IMHO is to calculate the IAM values only for np.linspace(0, 90, 1) or similar and use pvlib.iam.interp to generate the 8760 IAM series. If nothing else, we might suggest that in the docs.

cwhanse commented 2 years ago

I see no reason not to use del. Perhaps your aversion is a sense that needing to use del is a clue that the algorithm has room for improvement.

What about a third option - complement the existing iam.marion_diffuse with a pre-calculated lookup table and interpolation? Should only be about 8k of memory (tilt in 5 degree increments from 0 to 180, AOI in 5 degree increments from 0 to 90, 8 bytes per value).

I don't think pvlib.iam.interp applies directly as a substitute (its innards may be resuable). Marion's model integrates over a portion of a hemisphere and returns the average IAM, and implicitly assumes equal irradiance arriving from each integrating element (* see below). So the result can be multiplied by the diffuse sky or horizon irradiance for models which assume these quantities are isotropic in their respective domains (90 to 89.5 degrees for horizon diffuse). The output of pvlib.iam.interp is specific to one AOI, i.e., irradiance arriving from a single direction.

(*) I think the calculation may be wrong when applied to ground-reflected irradiance, since an integrating element is defined by a solid angle, not by an equal area portion of the viewed ground. I think that a proper view factor is needed in the integrand. The sky diffuse calculation doesn't have this problem because it is assumed that the irradiance source is the hemisphere rather than the ground plane.

mikofski commented 2 years ago

I see no reason not to use del. Perhaps your aversion is a sense that needing to use del is a clue that the algorithm has room for improvement.

What an interesting conundrum. I too feel del is unpythonic and have had folks question my stodginess, but what Cliff says rings true. What I crave is elegance. The right solution would balance both CPU and memory.

your proposal to suggest that folks use linspace and interpolate makes sense to me because there will be so many AOI that are nearly identical that it's a super fantastically redundant waste of time to recalculate every instance instead of just the most common ones
caching or memoizing somehow is a sneaky way to cheat memory and CPU so that expensive calculations are not repeated needlessly - but TBH, I have little experience how to do this effectively. I think some of the scipy optimize functions memoize by creating a secret class to hold on to past calculations.
I think as a short term hack del is fine. A more pythonic method might be to do the temporary calcs in a subfunction which I feel like 99% confident should release any temporary memory that isn't needed for the return, or that wasn't already used by the caller.
This last idea, of a temporary function that manages temporary memory also lends itself well to parallelization either using Python-MP or some external task manager like dask. This might also help alleviate the memory (or possibly CPU) issues by breaking up the input into smaller chunks?
if all else fails we can write a C, Cython, or Pythran extension or use Numba?

Sorry, these are just some random ideas, because this problem just seems so fun!

kandersolar commented 2 years ago

What about a third option - complement the existing iam.marion_diffuse with a pre-calculated lookup table and interpolation?

Not wholly opposed, but the cool part of marion_diffuse is that it works for any IAM function. Pre-calculating restricts you to some finite set of IAM functions. Probably the majority of people stick with the few models we provide (with parameter left as default) so maybe that's not a big loss.

I don't think pvlib.iam.interp applies directly as a substitute (its innards may be resuable)

Good point, I was ignoring the parameter names and treating it like a general interpolator: interp(aoi=surface_tilt, theta_ref=linspace_0to90, iam_ref=iam_0to90, normalize=False). Not ideal.

I think the calculation may be wrong when applied to ground-reflected irradiance, since an integrating element is defined by a solid angle

Does Equation 8 in the reference not define a proper solid angle integration element? Maybe I need to take another look but I thought that was done correctly for all three regions, including the ground plane.

A more pythonic method might be to do the temporary calcs in a subfunction

Yes this would do the same memory reclamation as the del approach. I guess the question is then whether del is uglier than a handful of non-reusable two line helper functions :)

wholmgren commented 2 years ago

Not opposed to del but would the numpy out kwarg help?

mikofski commented 2 years ago

Not opposed to del but would the numpy out kwarg help?

I think it's worth trying. I'd like to see it's memory profile in the chart with the other options.

adriesse commented 2 years ago

I don't oppose anything above. Other options:

set a smaller value for num
implement the polynomial approximations given in Marion's paper
just use 0.95 for sky and 0.95 * iam(90-tilt) for horizon and ground since the impact on Gpoa is pretty small anyway

kdebrab commented 2 years ago

What about a third option - complement the existing iam.marion_diffuse with a pre-calculated lookup table and interpolation?

Not wholly opposed, but the cool part of marion_diffuse is that it works for any IAM function. Pre-calculating restricts you to some finite set of IAM functions. Probably the majority of people stick with the few models we provide (with parameter left as default) so maybe that's not a big loss.

I think you can still use a (dynamically calculated) lookup table with custom IAM functions. Something like:

if isinstance(surface_tilt, pd.Series):
    # use a look-up table to avoid memory error
    lut_surface_tilts = np.linspace(0, 90, 181)
    idxs = np.searchsorted(lut_surface_tilts, surface_tilt)
    iam_diffuse_sky = pvlib.iam.marion_integrate(iam_function, lut_surface_tilts, "sky")[idxs]
else:
    # surface_tilt is just a number, no need for a look-up table
    iam_diffuse_sky = pvlib.iam.marion_integrate(iam_function, surface_tilt, "sky")

adriesse commented 2 years ago

Hi Karel, long time no see!

Some ideas:

interpolation on the lut values?
check the number of values rather than the class of the input?

kdebrab commented 2 years ago

Hi Anton,

Nice to interact with you!

I do agree that there are better options than checking for a Series. I just wanted to post the general idea.

However, I don't think interpolation is worth the effort. The LUT values are already close enough. Unless one cares about the 4th digit after the comma...

adriesse commented 2 years ago

Hi Karel, for me it's not about accuracy but I have a phobia of discontinuities! Your general idea is good.

pvlib / pvlib-python

pvlib.iam.marion_integrate uses too much memory for vector inputs #1402