NREL mismatch loss addendum

echedey-ls commented 3 months ago

Addendum to PR #2046 from a comparison with the authors implementation made publicly at https://github.com/NREL/bifacial_radiance/blob/144e6d00c2ddb391cf0904191d7c347be4436b1f/bifacial_radiance/mismatch.py#L166.

[x] Addresses comment https://github.com/pvlib/pvlib-python/pull/2046#issuecomment-2200703792
[x] I am familiar with the contributing guidelines
~[ ] Tests added~
~[ ] Updates entries in docs/sphinx/source/reference for API changes.~
[x] Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
[x] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
[x] Pull request is nearly complete and ready for detailed review.
[x] Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

@cdeline asked for a comparison with an implementation of his model (Fit 3), and it has helped find one error and one issue:

Fixed error: During my investigations on the Internet, I confused Mean Absolute Difference with Mean Absolute Deviation, so that helper function in the example has been fixed.
Pending issue: The polynomial model in the authors implementation is evaluated over the unitless RMAD (Relative Mean Absolute Difference), whereas in the paper we can find the units of equation (12) being in percentage: $$M[\%]_{Fit3} = 0.054 \Delta [\%]+ 0.068 \Delta [\%] ^2$$

Code in implementation:
```
mad = mad_fn(datac) /100  # (percentage)
mad2 = mad**2

fit3 = 0.054*mad + 0.068*mad2
```
First line converts mad_fn output from percentage to unitless ratio.

In the pvlib implementation, this has been understood as if the input to the polynomial is the RMAD in percentage, so we tuned the coefficients to match the units. Docs, see Parameters and Notes sections.

Down below is the code I used to compare the implementations.

Comparison code

Note this does not comply with the 2D broadcasting rules and datatypes IO in the authors implementation. ```python # %% from pvlib.bifacial import power_mismatch_deline import numpy as np # %% # bifacial_radiance implementation def mismatch_fit3(data): ''' Electrical mismatch calculation following Progress in PV paper Estimating and parameterizing mismatch power loss in bifacial photovoltaic systems Chris Deline, Silvana Ayala Pelaez,Sara MacAlpine,Carlos Olalla https://doi.org/10.1002/pip.3259 Parameters ---------- data : np.ndarray, pd.Series, pd.DataFrame Gtotal irradiance measurements. Each column is the irradiance for a module at a specific time. Returns ------- fit3 : Float or pd.Series Returns mismatch values for each module Equation: 1/(n^2*Gavg)*Sum Sum (abs(G_i - G_j)) ## Note: starting with Pandas 1.0.0 this function will not work on Series objects. ''' import numpy as np import pandas as pd if type(data) == np.ndarray: data = pd.DataFrame(data) datac = data[~np.isnan(data)] mad = mad_fn(datac) /100 # (percentage) print(f"{mad=}") mad2 = mad**2 fit3 = 0.054*mad + 0.068*mad2 if fit3.__len__() == 1: if isinstance(fit3, pd.Series): fit3 = float(fit3.iloc[0]) else: fit3 = float(fit3) return fit3 def mad_fn(data, axis='index'): ''' Mean average deviation calculation for mismatch purposes. Parameters ---------- data : np.ndarray or pd.Series or pd.DataFrame Gtotal irradiance measurements. If data is a pandas.DataFrame, one MAD/Average is returned for each index, based on values across columns. axis : {0 or 'index', 1 or 'columns'}, default 'index' Calculate mean average deviation across rows (default) or columns for 2D data * 0, or 'index' : MAD calculated across rows. * 1, or 'columns' : MAD calculated across columns. Returns ------- scalar or pd.Series: return MAD / Average [%]. Scalar for a 1D array, Series for 2D. Equation: 1/(n^2*Gavg)*Sum Sum (abs(G_i - G_j)) * 100[%] ''' import numpy as np import pandas as pd def _mad_1D(data): #1D calculation of MAD return (np.abs(np.subtract.outer(data,data)).sum()/float(data.__len__())**2 / np.mean(data))*100 if type(axis) == str: try: axis = {"index": 0, "rows": 0, 'columns':1}[axis] except KeyError: raise Exception('Incorrect index string in mad_fn. options: index, rows, columns.') ndim = data.ndim if ndim == 2 and axis==0: data = data.T # Pandas returns a notimplemented error if this is a DataFrame. if (type(data) == pd.Series): data = data.to_numpy() if type(data) == pd.DataFrame: temp = data.apply(pd.Series.to_numpy, axis=1) return(temp.apply(_mad_1D)) elif ndim ==2: #2D array return [_mad_1D(i) for i in data] else: return _mad_1D(data) # %% # pvlib rmad function in example def rmad(data): """ Relative Mean Absolute Difference. Output is [Unitless]. Eq. (4) of [1]_. """ mean = np.mean(data) mad = np.mean(np.absolute(np.subtract.outer(data, data))) return mad / mean # %% # test input data, one dimensional G_global_dim1 = np.array([1059, 976, 967, 986, 1034, 1128]) # %% # test input data, two dimensional G_global_dim2 = np.repeat([1059, 976, 967, 986, 1034, 1128], 12).reshape( 6, 12 ) # %% # test implementations for input_G_global in [G_global_dim1, G_global_dim2]: dim = input_G_global.ndim print(f"Testing for input data with {dim=}") # output from bifacial_radiance implementation result_bifacial_radiance = mismatch_fit3(input_G_global) # output from pvlib implementation rmad_G_global = rmad(input_G_global) result_pvlib = power_mismatch_deline( rmad=rmad_G_global, coefficients=(0, 0.054, 0.068) ) print(f"{result_bifacial_radiance=}\n{result_pvlib=}\n-------------------") ```

echedey-ls commented 3 months ago

@cdeline I suspect we should change the coefficients and the documentation to reflect that both the equation in the paper and in pvlib use the unitless RMAD, but before I proceed I'd like to have your confirmation.

EDIT: I think it would also be helpful to specify whether mismatch_fit3 output is percentage or unitless.

cdeline commented 3 months ago

Hi @echedey-ls. Your implementation appears to be correct and matches the method as described in the paper. I think that your use of Fit2 is the correct one based on the text of the paper, and the extra /100 in the bifacial_radiance mismatch_fit3 function is an error that we'll have to correct. I like your unitless approach for everything better.

echedey-ls commented 3 months ago

Thanks @cdeline for the insight. This PR is then ready for review.

cdeline commented 3 months ago

I think that we need to get a PR into bifacial_radiance to fix this bug and switch to the Fit2 which has more broad applicability, vs Fit3. Here's your code, modified to show convergent behavior. However, there is a problem if you try to compare against the Fit3 coefficients by passing them in to your power_mismatch_deline() function.

# %%
from pvlib.bifacial import power_mismatch_deline
import numpy as np

# %%
# bifacial_radiance implementation
def mismatch_fit2(data):
    '''
    Electrical mismatch calculation following Progress in PV paper
    Estimating and parameterizing mismatch power loss in bifacial photovoltaic systems
    Chris Deline, Silvana Ayala Pelaez,Sara MacAlpine,Carlos Olalla
    https://doi.org/10.1002/pip.3259 

    Parameters
    ----------
    data : np.ndarray, pd.Series, pd.DataFrame
        Gtotal irradiance measurements. Each column is the irradiance for a module
        at a specific time. 

    Returns
    -------
    fit3 :  Float or pd.Series
        Returns mismatch values for each module as a [%]

    Equation: 1/(n^2*Gavg)*Sum Sum (abs(G_i - G_j))
    ## Note: starting with Pandas 1.0.0 this function will not work on Series objects.
    '''
    import numpy as np
    import pandas as pd

    if type(data) == np.ndarray:
        data = pd.DataFrame(data)

    datac = data[~np.isnan(data)]
    mad = mad_fn(datac)   # (unitless)
    print(f"{mad=}")
    mad2 = mad**2

    fit2 = 0.142*mad + 0.032*mad2

    if fit2.__len__() == 1:
        if isinstance(fit2, pd.Series):
            fit2 = float(fit2.iloc[0])
        else:
            fit2 = float(fit2)

    return fit2

def mad_fn(data, axis='index'):
    '''
    Mean average deviation calculation for mismatch purposes.

    Parameters
    ----------
    data : np.ndarray or pd.Series or pd.DataFrame
        Gtotal irradiance measurements. If data is a pandas.DataFrame, one 
        MAD/Average is returned for each index, based on values across columns.

    axis : {0 or 'index', 1 or 'columns'}, default 'index'   
        Calculate mean average deviation across rows (default) or columns for 2D data

        * 0, or 'index' : MAD calculated across rows.
        * 1, or 'columns' : MAD calculated across columns.
    Returns
    -------
    scalar or pd.Series:   return MAD / Average [%]. Scalar for a 1D array, Series for 2D.

    Equation: 1/(n^2*Gavg)*Sum Sum (abs(G_i - G_j)) * 100[%]

    '''
    import numpy as np
    import pandas as pd
    def _mad_1D(data):  #1D calculation of MAD
        return (np.abs(np.subtract.outer(data,data)).sum()/float(data.__len__())**2 / np.mean(data))*100
    if type(axis) == str:
        try:
            axis = {"index": 0, "rows": 0, 'columns':1}[axis]
        except KeyError:
            raise Exception('Incorrect index string in mad_fn. options: index, rows, columns.')

    ndim = data.ndim
    if ndim == 2 and axis==0:
        data = data.T
    # Pandas returns a notimplemented error if this is a DataFrame.
    if (type(data) == pd.Series):
        data = data.to_numpy()

    if type(data) == pd.DataFrame:
        temp = data.apply(pd.Series.to_numpy, axis=1)
        return(temp.apply(_mad_1D))
    elif ndim ==2: #2D array
        return [_mad_1D(i) for i in data]
    else:
        return _mad_1D(data)

# %%
# pvlib rmad function in example
def rmad(data):
    """
    Relative Mean Absolute Difference. Output is [Unitless]. Eq. (4) of [1]_.
    """
    mean = np.mean(data)
    mad = np.mean(np.absolute(np.subtract.outer(data, data)))
    return mad / mean

# %%
# test input data, one dimensional
G_global_dim1 = np.array([1059, 976, 967, 986, 1034, 1128])

# %%
# test input data, two dimensional
G_global_dim2 = np.repeat([1059, 976, 967, 986, 1034, 1128], 12).reshape(
    6, 12
)

# %%
# test implementations
for input_G_global in [G_global_dim1, G_global_dim2]:
    dim = input_G_global.ndim
    print(f"Testing for input data with {dim=}")
    # output from bifacial_radiance implementation
    result_bifacial_radiance = mismatch_fit2(input_G_global)
    # output from pvlib implementation
    rmad_G_global = rmad(input_G_global)
    result_pvlib = power_mismatch_deline(
        rmad=rmad_G_global
    )
    print(f"{result_bifacial_radiance=}\n{result_pvlib=}\n-------------------")

cdeline commented 3 months ago

@echedey-ls - I have a question on the 'coefficients' input into power_mismatch_deline - I do get good agreement with our Fit3 coefficients if I use a *100 in the second coefficient:

    result3_pvlib = power_mismatch_deline(
        rmad=rmad_G_global, coefficients=(0, .054, .068*100)
    )

I guess that's the desired behavior since it includes a conversion from [%] to [unitless] on the input and output...

echedey-ls commented 3 months ago

I think that we need to get a PR into bifacial_radiance

Gotcha, I'd like to do that.

I do get good agreement with our Fit3 coefficients if I use a *100 in the second coefficient

Yeah, that's the desired behaviour. My original comparison isn't exactly right now that you confirm what issues the bifacial_radiance implementation has:

The model coefficients are for % input
However, the input is unitless

The output is supposed to be in % per your comparison code for fit 2 docstring:

Returns
-------
fit3 :  Float or pd.Series
    Returns mismatch values for each module as a [%]

So a fair comparison would be what is currently documented in pvlib:

Previous issues addressed
Second-order coeff in pvlib.bifacial.power_mismatch_deline multiplied by 100

Code below

Details

```python # %% from pvlib.bifacial import power_mismatch_deline import numpy as np # %% # bifacial_radiance implementation def mismatch_fit3(data): ''' Electrical mismatch calculation following Progress in PV paper Estimating and parameterizing mismatch power loss in bifacial photovoltaic systems Chris Deline, Silvana Ayala Pelaez,Sara MacAlpine,Carlos Olalla https://doi.org/10.1002/pip.3259 Parameters ---------- data : np.ndarray, pd.Series, pd.DataFrame Gtotal irradiance measurements. Each column is the irradiance for a module at a specific time. Returns ------- fit3 : Float or pd.Series Returns mismatch values for each module Equation: 1/(n^2*Gavg)*Sum Sum (abs(G_i - G_j)) ## Note: starting with Pandas 1.0.0 this function will not work on Series objects. ''' import numpy as np import pandas as pd if type(data) == np.ndarray: data = pd.DataFrame(data) datac = data[~np.isnan(data)] mad = mad_fn(datac) # (percentage) print(f"{mad=}") mad2 = mad**2 fit3 = 0.054*mad + 0.068*mad2 if fit3.__len__() == 1: if isinstance(fit3, pd.Series): fit3 = float(fit3.iloc[0]) else: fit3 = float(fit3) return fit3 def mad_fn(data, axis='index'): ''' Mean average deviation calculation for mismatch purposes. Parameters ---------- data : np.ndarray or pd.Series or pd.DataFrame Gtotal irradiance measurements. If data is a pandas.DataFrame, one MAD/Average is returned for each index, based on values across columns. axis : {0 or 'index', 1 or 'columns'}, default 'index' Calculate mean average deviation across rows (default) or columns for 2D data * 0, or 'index' : MAD calculated across rows. * 1, or 'columns' : MAD calculated across columns. Returns ------- scalar or pd.Series: return MAD / Average [%]. Scalar for a 1D array, Series for 2D. Equation: 1/(n^2*Gavg)*Sum Sum (abs(G_i - G_j)) * 100[%] ''' import numpy as np import pandas as pd def _mad_1D(data): #1D calculation of MAD return (np.abs(np.subtract.outer(data,data)).sum()/float(data.__len__())**2 / np.mean(data))*100 if type(axis) == str: try: axis = {"index": 0, "rows": 0, 'columns':1}[axis] except KeyError: raise Exception('Incorrect index string in mad_fn. options: index, rows, columns.') ndim = data.ndim if ndim == 2 and axis==0: data = data.T # Pandas returns a notimplemented error if this is a DataFrame. if (type(data) == pd.Series): data = data.to_numpy() if type(data) == pd.DataFrame: temp = data.apply(pd.Series.to_numpy, axis=1) return(temp.apply(_mad_1D)) elif ndim ==2: #2D array return [_mad_1D(i) for i in data] else: return _mad_1D(data) # %% # pvlib rmad function in example def rmad(data): """ Relative Mean Absolute Difference. Output is [Unitless]. Eq. (4) of [1]_. """ mean = np.mean(data) mad = np.mean(np.absolute(np.subtract.outer(data, data))) return mad / mean # %% # test input data, one dimensional G_global_dim1 = np.array([1059, 976, 967, 986, 1034, 1128]) # %% # test input data, two dimensional G_global_dim2 = np.repeat([1059, 976, 967, 986, 1034, 1128], 12).reshape( 6, 12 ) # %% # test implementations for input_G_global in [G_global_dim1, G_global_dim2]: dim = input_G_global.ndim print(f"Testing for input data with {dim=}") # output from bifacial_radiance implementation result_bifacial_radiance = mismatch_fit3(input_G_global) # output from pvlib implementation rmad_G_global = rmad(input_G_global) result_pvlib = power_mismatch_deline( rmad=rmad_G_global, coefficients=(0, 0.054, 0.068*100) ) print(f"{result_bifacial_radiance=}\n{result_pvlib=}\n-------------------") ```

And now bifacial_radiance output is x100 the one in pvlib, which is expected since the former works with percentages and the latter with a [0,1] ratios:

Testing for input data with dim=1
mad=0    5.9729
dtype: float64
result_bifacial_radiance=2.748472705106455
result_pvlib=0.02748472705106455
-------------------
Testing for input data with dim=2
mad=0     5.9729
1     5.9729
2     5.9729
3     5.9729
4     5.9729
5     5.9729
6     5.9729
7     5.9729
8     5.9729
9     5.9729
10    5.9729
11    5.9729
dtype: float64
result_bifacial_radiance=0     2.748473
1     2.748473
2     2.748473
3     2.748473
4     2.748473
...
11    2.748473
dtype: float64
result_pvlib=0.02748472705106455
-------------------

echedey-ls commented 3 months ago

@kandersolar @cwhanse this PR is ready for review😄 👀

pvlib / pvlib-python

NREL mismatch loss addendum #2147