Closed RDaxini closed 3 months ago
that change input data.
Can you clarify what you mean by this? To my knowledge, pvlib is not changing any user data. The test below demonstrates that pvlib does not change the input pw
data. Rather, it applies a minimum value before passing the data on to the polynomial (aside: I think the choice of "replaced" in the warning message is not ideal).
In [8]: s = pd.Series([0, 1])
In [9]: pvlib.spectrum.mismatch.spectral_factor_firstsolar(s, 1, module_type='cdte')
/Users/holmgren/git_repos/pvlib-python/pvlib/spectrum/mismatch.py:359: UserWarning: Exceptionally low pw values replaced with 0.1 cm to prevent model divergence
warn('Exceptionally low pw values replaced with '
Out[9]: array([0.93459226, 0.9905102 ])
In [10]: s
Out[10]:
0 0
1 1
dtype: int64
Can you please provide an example of pvlib changing user data? Or are we only talking about pvlib restricting the valid input data to the polynomial? Note that First Solar contributed the pvlib implementation and considers this restriction to be part of the model.
I'm not opposed to adding similar limits parameters for airmass.
@RDaxini correlations like firstsolar spectral function frequently have narrow bounds in which they are valid. The documentation states over what range the correlations are valid. I think a faithful implementation should enforce these limits and not allow extrapolation
Thanks both for the comments. I'm lumping my responses together there's some overlap.
@mikofski For sure there are published limits, but the limits in the publication and docs do not match the pvlib implementation. For example, airmass between 0 and 10 in pvlib but 0 and 5 in the paper/docs. Is there another document that justifies the implemented range? Then if any limits/restrictions are to be enforced, I'm not convinced calculating a mismatch value based on a value the user did not enter is the best enforcement. @wholmgren mentioned First Solar considers the limits part of the model, which I understand, but do they advise on how inputs outside of the prescribed range should be handled? At least with pw inputs exceeding max_precipitable_water
, these default to np.nan
--- this seems like a better approach if the consensus is that the computation should not take place with anything other than values in the specified range.
@wholmgren Ah, I did not mean the variable is redefined. I meant the input is changed for the calculation of mismatch, ie mismatch is calculated for an input that differs from the user input. In the case of pw
, the user is warned and has the option to change the limits. In the case of airmass, the user is not warned and is unable to change the limits.
Eg1: Warning, M
calculated from default pw 0.1
rather than user input 0
:
In [28]: s = pd.Series([0, 0.1])
In [29]: M = pv.spectrum.mismatch.spectral_factor_firstsolar(s, 1, module_type='cdte')
C:\Users\ezxrd3\Documents\GitHub\pvlib-python\pvlib\spectrum\mismatch.py:369: UserWarning: Exceptionally low pw values replaced with 0.1 cm to prevent model divergence
warn('Exceptionally low pw values replaced with '
In [30]: M
Out[30]: array([0.93459226, 0.93459226])
Eg2: No warning, M
calculated from default ama 10
rather than user input 11
, default cannot be changed:
In [34]: a = pd.Series([11, 10])
In [35]: M = pv.spectrum.mismatch.spectral_factor_firstsolar(0.5, a, module_type='cdte')
In [36]: M
Out[36]:
0 0.778779
1 0.778779
So, just my view, for sure the docs should explain the data ranges over which the model was developed, but currently I still lean more towards letting the user execute their own data filtering. I am not entirely opposed to retaining data screens in the function though, but then in this case I think the user should have control over the air mass limits (as with pw already) and be warned if they are triggered. What do you think?
Given the discrepancy between using a max airmass of 5 vs 10, I am in support of adding a max_airmass_absolute
parameter.
First, please go through the discussion on GH of the initial addition of the function to see whether there is arguments for why the max airmass is 10 and not 5 as per the paper.
If you cannot find anything on this, then I suggest asking the authors. If it turns out we have not good reason for using 10 and not 5, then I suggest a deprecating cycle where we will change the default value max_airmass_absolute
from 10 to 5. For an example on this, then see lies 351-357 of the changes in #2094.
please go through the discussion on GH of the initial addition of the function
Looks to me that AM<10 is a choice that the author (M. Lee) made when implementing the model in code. Same limit is present in the Matlab function.
@RDaxini Could you paste the quote from the paper where a limit of 5 is described?
@RDaxini Could you paste the quote from the paper where a limit of 5 is described?
Image 1 (above): pw and am filter requested @AdamRJensen (extracted from Section II of the manuscript)
Image 2 (below): just some other filters/assumptions in the paper, not sure if relevant but when talking about ensuring the implementation in pvlib is true to the original model, it got me thinking about what other limits there are. How do we decide which limits are implemented in pvlib? This is just a general question from me as a new contributor, I want to understand the thinking behind this a bit more
@RDaxini I read it as simulations were done for 1 to 5 AM, and NOT as an explicit statement that the model is valid for 1 to 5 AM. Therefore, I think it's probably ok to keep the current upper limit of 10. However, I still think there's benefit to adding the parameter to the function.
The regression was performed on the simulations with AM<5. In M. Lee's judgement, its extrapolation to 5<AM<10 was considered reasonable. I don't know if the extrapolation is specifically described in the paper.
The second excerpt reads (to me) like a description of how they cleaned up measurements to either fit or verify the model.
@AdamRJensen agreed, it does not state that it is a specific range of model validity, nowhere in the paper are there any specific statements pertaining to model validity within any particular range (as far as I see...). @cwhanse unless I missed it, the extrapolation is not described in the paper (neither is the upper pw limit extrapolation) So, I was not sure how these upper limits were decided or linked to the paper to be justified as part of the implementation.
Based on this discussion, I think for now I will create a simple PR to add the parameter into the function and revise the docs accordingly.
What are your thoughts on the handling of inputs exceeding the limits --- return np.nan
as is currently the case for exceeding max_precipitable_water
, or stick with setting the user input to equal the specified parameter value, as is currently the case with min_precipitable_water
?
I vote
stick with setting the user input to equal the specified parameter value, as is currently the case with min_precipitable_water?
Created PR #2100
Good and important observations and discussion. Recognizing and managing the limits of model validity and validation are high on my priority/wish list.
The clipping of pwat at 0.1 (not 8.0) and airmass at 10 was also done in a spreadsheet FS once distributed, so it was presumably also done during model validation. We don't know whether any data points were affected by pwat clipping in the validation data sets, so we don't know whether this behaviour is truly validated. I would consider the clipping/extrapolation to be part of the model.
Is your feature request related to a problem? Please describe.
pvlib.spectrum.spectral_factor_firstsolar
contains data screens forprecipitable_water
andairmass_absolute
that change input data. Max/min values can be set forprecipitable_water
by the user but notairmass_absolute
.I see two issues:
airmass_absolute
.Describe the solution you'd like I can think of several potential solutions but would like community feedback on what should be implemented. Based on the feedback, I can create a PR with proposed changes. Current ideas:
airmass_absolute
andprecipitable_water
max_airmass_absolute
andmin_airmass_absolute
) topvlib.spectrum.spectral_factor_firstsolar
so that the user can define min/max values forairmass_absolute
in the same way they can definemax_precipitable_water
andmin_precipitable_water
, and their input data will be changed accordingly.Additional context I understand this issue touches on a question about the role of pvlib as a toolbox and to what extent it should hold the user's hand through the application of its functions. I touched on the subject in this informal blog post where I used the same spectral factor function as an example. At the very least, I don't think we should be changing user data without letting the user control this change. Going a step further, I don't think this function should include within it an action to change user data. It's job is to calculate the spectral mismatch factor given a set of
airmass_absolute
andprecipitable_water
values. I see room for including warnings to advise the user.