pvlib / pvlib-python

A set of documented functions for simulating the performance of photovoltaic energy systems.
https://pvlib-python.readthedocs.io
BSD 3-Clause "New" or "Revised" License

spectral_factor_firstsolar changes user input data #2086

Closed: RDaxini closed this issue 3 months ago

RDaxini commented 4 months ago

Is your feature request related to a problem? Please describe. pvlib.spectrum.spectral_factor_firstsolar contains data screens for precipitable_water and airmass_absolute that change input data. Max/min values can be set for precipitable_water by the user but not airmass_absolute.

I see two issues:

  1. Changing data in any circumstance is a serious question, but in this case specifically I do not think it is pvlib's job to change user data.
  2. User control and the function's flexibility are impeded by the absence of parameters to control the min/max airmass_absolute.

Describe the solution you'd like I can think of several potential solutions but would like community feedback on what should be implemented. Based on the feedback, I can create a PR with proposed changes. Current ideas:

  1. Do nothing.
  2. Remove all data screens for airmass_absolute and precipitable_water.
  3. Retain the data screens, but instead of changing user data, only warn the user that the input data exceeds the thresholds, where the thresholds can be either:
    • Defined by the user, or
    • Defined by the developer
  4. Add new arguments (max_airmass_absolute and min_airmass_absolute) to pvlib.spectrum.spectral_factor_firstsolar so that the user can define min/max values for airmass_absolute in the same way they can define max_precipitable_water and min_precipitable_water, and their input data will be changed accordingly.
  5. Something else?

Additional context I understand this issue touches on the question of pvlib's role as a toolbox and the extent to which it should hold the user's hand through the application of its functions. I discussed the subject in this informal blog post, using the same spectral factor function as an example. At the very least, I don't think we should be changing user data without letting the user control this change. Going a step further, I don't think this function should change user data at all. Its job is to calculate the spectral mismatch factor for a given set of airmass_absolute and precipitable_water values. I do see room for warnings to advise the user.
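Option 4 above could be sketched roughly as follows. This is a minimal illustration, not pvlib's actual code; the helper name `screen_airmass` and the default limits (0.58 and 10) are assumptions mirroring the existing `min_precipitable_water`/`max_precipitable_water` arguments:

```python
import warnings
import numpy as np

def screen_airmass(airmass_absolute, min_airmass_absolute=0.58,
                   max_airmass_absolute=10):
    # Hypothetical helper: clip out-of-range air mass values to the
    # limits and warn, mirroring how the pw screens already behave.
    # Works on a copy, so the caller's data is never mutated.
    ama = np.asarray(airmass_absolute, dtype=float).copy()
    low = ama < min_airmass_absolute
    high = ama > max_airmass_absolute
    if low.any() or high.any():
        warnings.warn(
            'airmass_absolute values outside '
            f'[{min_airmass_absolute}, {max_airmass_absolute}] '
            'were replaced with the limit values')
    ama[low] = min_airmass_absolute
    ama[high] = max_airmass_absolute
    return ama
```

Exposing the limits as keyword arguments would let users widen, narrow, or effectively disable the screen, just as they already can for precipitable water.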

wholmgren commented 4 months ago

that change input data.

Can you clarify what you mean by this? To my knowledge, pvlib is not changing any user data. The test below demonstrates that pvlib does not change the input pw data. Rather, it applies a minimum value before passing the data on to the polynomial (aside: I think the choice of "replaced" in the warning message is not ideal).

In [8]: s = pd.Series([0, 1])

In [9]: pvlib.spectrum.mismatch.spectral_factor_firstsolar(s, 1, module_type='cdte')
/Users/holmgren/git_repos/pvlib-python/pvlib/spectrum/mismatch.py:359: UserWarning: Exceptionally low pw values replaced with 0.1 cm to prevent model divergence
  warn('Exceptionally low pw values replaced with '
Out[9]: array([0.93459226, 0.9905102 ])

In [10]: s
Out[10]:
0    0
1    1
dtype: int64

Can you please provide an example of pvlib changing user data? Or are we only talking about pvlib restricting the valid input data to the polynomial? Note that First Solar contributed the pvlib implementation and considers this restriction to be part of the model.
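The distinction drawn here (clipping a copy before the polynomial vs. mutating the caller's data) can be checked directly. A minimal sketch of the non-mutating pattern, using plain numpy/pandas rather than pvlib internals:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1], dtype=float)

# np.maximum returns a new object; the caller's Series is untouched.
pw = np.maximum(s, 0.1)

assert list(pw) == [0.1, 1.0]   # clipped copy used for the calculation
assert list(s) == [0.0, 1.0]    # original user data unchanged
```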

I'm not opposed to adding similar limits parameters for airmass.

mikofski commented 4 months ago

@RDaxini correlations like the firstsolar spectral function frequently have narrow bounds within which they are valid. The documentation states the range over which the correlations are valid. I think a faithful implementation should enforce these limits and not allow extrapolation.

RDaxini commented 4 months ago

Thanks both for the comments. I'm lumping my responses together since there's some overlap.

@mikofski For sure there are published limits, but the limits in the publication and docs do not match the pvlib implementation: for example, airmass between 0 and 10 in pvlib, but 0 and 5 in the paper/docs. Is there another document that justifies the implemented range? And if limits are to be enforced, I'm not convinced that calculating a mismatch value from a value the user did not enter is the best enforcement. @wholmgren mentioned that First Solar considers the limits part of the model, which I understand, but do they advise on how inputs outside the prescribed range should be handled? At least pw inputs exceeding max_precipitable_water default to np.nan --- this seems like a better approach if the consensus is that the computation should not take place on values outside the specified range.

@wholmgren Ah, I did not mean the variable is redefined. I meant the input is changed for the calculation of mismatch, ie mismatch is calculated for an input that differs from the user input. In the case of pw, the user is warned and has the option to change the limits. In the case of airmass, the user is not warned and is unable to change the limits.

Eg1: Warning, M calculated from default pw 0.1 rather than user input 0:

In [28]: s = pd.Series([0, 0.1])

In [29]: M = pv.spectrum.mismatch.spectral_factor_firstsolar(s, 1, module_type='cdte')
C:\Users\ezxrd3\Documents\GitHub\pvlib-python\pvlib\spectrum\mismatch.py:369: UserWarning: Exceptionally low pw values replaced with 0.1 cm to prevent model divergence
  warn('Exceptionally low pw values replaced with '

In [30]: M
Out[30]: array([0.93459226, 0.93459226])

Eg2: No warning, M calculated from default ama 10 rather than user input 11, default cannot be changed:

In [34]: a = pd.Series([11, 10])

In [35]: M = pv.spectrum.mismatch.spectral_factor_firstsolar(0.5, a, module_type='cdte')

In [36]: M
Out[36]: 
0    0.778779
1    0.778779

So, just my view: for sure the docs should explain the data ranges over which the model was developed, but I still lean towards letting the user do their own data filtering. I am not entirely opposed to retaining data screens in the function, but in that case I think the user should have control over the air mass limits (as with pw already) and be warned if they are triggered. What do you think?
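User-side filtering, as suggested above, could look like this. A sketch only; the helper name `mask_out_of_range` and the bounds are illustrative, with out-of-range entries set to NaN so the model simply propagates NaN instead of computing on altered inputs:

```python
import numpy as np
import pandas as pd

def mask_out_of_range(values, lower, upper):
    # Replace out-of-range entries with NaN before calling the model,
    # so no computation happens on values the user did not enter.
    v = pd.Series(values, dtype=float)
    return v.where((v >= lower) & (v <= upper))

# Illustrative bounds only; 0.58 and 10 are assumed, not pvlib's documented API.
ama = mask_out_of_range([0.5, 1.5, 11], lower=0.58, upper=10)
```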

AdamRJensen commented 4 months ago

Given the discrepancy between using a max airmass of 5 vs 10, I am in support of adding a max_airmass_absolute parameter.

First, please go through the discussion on GH of the initial addition of the function to see whether there are arguments for why the max airmass is 10 and not 5 as per the paper.

If you cannot find anything on this, then I suggest asking the authors. If it turns out we have no good reason for using 10 instead of 5, then I suggest a deprecation cycle in which we change the default value of max_airmass_absolute from 10 to 5. For an example, see lines 351-357 of the changes in #2094.
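A deprecation cycle for a changed default can be sketched like this. This is only the generic warn-then-switch pattern, not pvlib's actual deprecation machinery; the signature and sentinel are hypothetical:

```python
import warnings

def spectral_factor(airmass_absolute, max_airmass_absolute=None):
    # Hypothetical signature: None signals "the user did not choose a value",
    # so we can warn about the upcoming default change.
    if max_airmass_absolute is None:
        warnings.warn(
            'The default max_airmass_absolute will change from 10 to 5 '
            'in a future release; set it explicitly to silence this warning.',
            FutureWarning)
        max_airmass_absolute = 10  # current default, kept during the cycle
    # Stand-in for the real calculation: just apply the limit.
    return min(airmass_absolute, max_airmass_absolute)
```

After the cycle, the sentinel branch is removed and the default becomes 5 outright.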

cwhanse commented 4 months ago

please go through the discussion on GH of the initial addition of the function

#208

Looks to me that AM<10 is a choice that the author (M. Lee) made when implementing the model in code. Same limit is present in the Matlab function.

AdamRJensen commented 4 months ago

@RDaxini Could you paste the quote from the paper where a limit of 5 is described?

RDaxini commented 4 months ago

@RDaxini Could you paste the quote from the paper where a limit of 5 is described?

Image 1 (above): pw and am filter requested by @AdamRJensen (extracted from Section II of the manuscript)


Image 2 (below): some other filters/assumptions in the paper. Not sure if relevant, but when talking about ensuring the implementation in pvlib is true to the original model, it got me thinking about what other limits there are. How do we decide which limits are implemented in pvlib? This is a general question from me as a new contributor; I want to understand the thinking behind this a bit more.

AdamRJensen commented 4 months ago

@RDaxini I read it as simulations were done for 1 to 5 AM, and NOT as an explicit statement that the model is valid for 1 to 5 AM. Therefore, I think it's probably ok to keep the current upper limit of 10. However, I still think there's benefit to adding the parameter to the function.

cwhanse commented 4 months ago

The regression was performed on the simulations with AM<5. In M. Lee's judgement, its extrapolation to 5<AM<10 was considered reasonable. I don't know if the extrapolation is specifically described in the paper.

The second excerpt reads (to me) like a description of how they cleaned up measurements to either fit or verify the model.

RDaxini commented 4 months ago

@AdamRJensen agreed, it does not state a specific range of model validity; nowhere in the paper are there any specific statements about model validity within any particular range (as far as I can see). @cwhanse unless I missed it, the extrapolation is not described in the paper (nor is the upper pw limit extrapolation). So I was not sure how these upper limits were decided, or how they are linked to the paper so as to be justified as part of the implementation.

Based on this discussion, I think for now I will create a simple PR to add the parameter into the function and revise the docs accordingly.

What are your thoughts on the handling of inputs exceeding the limits --- return np.nan as is currently the case for exceeding max_precipitable_water, or stick with setting the user input to equal the specified parameter value, as is currently the case with min_precipitable_water?
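The two behaviours being compared can be shown side by side with plain numpy. The limit values 0.1 and 8.0 follow the ones mentioned in this thread; this is an illustration of the pattern, not pvlib's code:

```python
import numpy as np

pw = np.array([0.05, 0.5, 9.0])
min_pw, max_pw = 0.1, 8.0

# Option A: clip to the limit (current min_precipitable_water behaviour).
clipped = np.where(pw < min_pw, min_pw, pw)

# Option B: return NaN for out-of-range inputs
# (current max_precipitable_water behaviour).
nanned = np.where(pw > max_pw, np.nan, clipped)
```

Clipping always yields a number but silently computes on a value the user did not enter; NaN propagates visibly through downstream calculations.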

AdamRJensen commented 4 months ago

I vote

stick with setting the user input to equal the specified parameter value, as is currently the case with min_precipitable_water?

RDaxini commented 4 months ago

Created PR #2100

adriesse commented 4 months ago

Good and important observations and discussion. Recognizing and managing the limits of model validity and validation are high on my priority/wish list.

The clipping of pwat at 0.1 (not 8.0) and airmass at 10 was also done in a spreadsheet FS once distributed, so it was presumably also done during model validation. We don't know whether any data points were affected by pwat clipping in the validation data sets, so we don't know whether this behaviour is truly validated. I would consider the clipping/extrapolation to be part of the model.