pvlib / pvlib-python

A set of documented functions for simulating the performance of photovoltaic energy systems.
https://pvlib-python.readthedocs.io
BSD 3-Clause "New" or "Revised" License
1.16k stars 980 forks source link

consistent variable names and capitalization #37

Closed wholmgren closed 9 years ago

wholmgren commented 9 years ago

Let's try to agree on the variable names and capitalization rules for all terms in the library. PEP8 is a good place to start, but it may be reasonable to break those rules for some terms.

I don't expect that we can come to agreement on everything in this issue, but let's at least write down the terms and rules for which we do agree and change the code to reflect whatever consensus emerges.

A wiki would probably be a good place to put the results.

In no particular order, here are some problems that I'm aware of:

Fixing this without breaking things requires good tests.

uvchik commented 9 years ago

I would use existing conventions like PEP8 because this is easier for new contributors. For variable names I like the convention table of pylint: http://pylint-messages.wikidot.com/messages:c0103

Pylint is a program to check your code. It marks all lines where you broke the rules. You can integrate pylint in many IDEs or editors (http://docs.pylint.org/ide-integration) or use it as a standalone tool to check your modules.

Breaking the rules makes it more difficult for contributors from other projects.

But it's your project, if the convention is clear I will keep to the code.

wholmgren commented 9 years ago

I'll put a few of my preferences here, going from perhaps least to most controversial.

Everything that is not an acronym or a complicated variable in an equation should be lower case. This is just good PEP8, and most of the code actually follows this already. Of the list above, the only change would be that In_Plane_SkyDiffuse becomes in_plane_skydiffuse or similar.

Airmass should just be called airmass since it's one word and it's not that long.

Up to about 10 characters, I prefer longer names to shorter names. So, I prefer to spell out words like latitude and longitude.

I like diffuse_ground over e.g. diff_gr, but I like surf_az over surface_azimuth. I can't justify this.

I would make most (all?) acronyms lowercase, including dni, ghi, aoi, tz, etc.

Use of terms like V_mp should minimized, but it's probably unavoidable for complicated variables in complicated equations and I'm not sure what the appropriate rules should be.

We may also consider that rules for parameters and returned values could be different than rules for the code. Abbreviations and acronyms can make code more readable.

@uvchik It's very much a community project, a few of us just have push privileges.

bmu commented 9 years ago

It would be great, if we could agree on naming conventions! And we could have something like a table of definitions in the wiki or documentation. In a first step these definitions coul be used by developers.

As an idea, they could possibly also be used by users in the future, if we would switch to dataframe inputs for some functions or a different kind of api. This would enable us to include som e "magic" functions, e.g the signature of the ominous globalinplane function could look like this:

def globalinplane(df, surface_tilt, surface_azimuth, diffuse_model='perez', 
                  decomposition_model=None):
    """Determine GPOA from either GHI and DHI or from GHI only

    Parameters
    ---------------
    df : pandas.DataFrame
          A DataFrame containing all necessary columns acoording to naming conventions
          (maybe surface_tilt and surface_azimuth could also be contained in the df, 
           e.g. for tracking systems.) 
    decomposition_model : None or str
          The model to use if only GHI is given in the DataFrame
    ....
    Returns
    -------
    The input DataFrame plus columns `direct tilted`, 'diffuse tilted`, `GPOA` ...
    """"

This function could look which columns are in the DataFrame and compute all necessary columns (e.g. if time is given as local time, compute utc or true solar time).

From my experience this is usefull for beginners because there are quite a lot of simulation steps required to calculate GPOA from GHI (maybe convert times, decomposition in direct and diffuse, diffuse model, ground reflection, direct tilted, ...?). And there are other options, e.g. claculate expected energy yield from only a system description, Location and a DataFrame containing GHI and ambient temperature.

This is just an idea, not sure about the difficulties. There may be some complexity when implementing something like this (more on the definitions, not necessarily from a programmers point of view) and maybe this is difficult to explain to users.

bmu commented 9 years ago

For pylint we could include a pylintrc in the repository (maybe also for other code checkers like pep8, pyflakes ... or decide which one we want to use).

wholmgren commented 9 years ago

Yes, I would really like to see some new high level functions that utilize the API and standardizing names will make this much easier. I'll suggest a new issue(s) for scoping out those functions, though.

I'm ok with adding a pylintrc or similar file, but I want to be pragmatic about this. I think that using good PEP8 practices for API-relevant things is a lot more important than removing extra spaces from the end of lines or putting spacings between every operation or after item in list (which can often reduce readability, in my opinion). I think that the important PEP8 rules are too often drowned out by many silly violations. Maybe you all have had better experiences and can show me the way.

uvchik commented 9 years ago
  1. I agree with @wholmgren, that it is easier to read the code if the variables are spelled out. Abbreviations are okay for often used phrases like diffuse but we should avoid them. Standard acronyms are okay for me. Maybe we could create a table with the most important variable names, so that we and all new developers get an overview of how the naming works. Example: Variables-and-style-rules
  2. I really like the rules, even though my own rules would be different. But everybody has his own preferences and discussions about these rules can be very long. I my experience it is very useful to follow PEP8, because you get used to that style. After a while it was a lot easier for me to read foreign PEP8-code then non-PEP8-code. The less exceptions, the less discussions, the less confusion.
  3. I definitely like the idea of @bmu to write a magic globalinplane function, but could you create a new issue (feature) for that.
wholmgren commented 9 years ago

The table is a good start. Can you keep going with your specific suggestions for the rest of the parameters discussed above? I'll add mine after that. I think we all agree that we should follow pep8 but that still leaves room to differ on some parameters.

robwandrews commented 9 years ago

Hi Folks, I agree with the conversation above, and think that for the majority of the time we should be sticking to PEP8.

However, another topic came into this thread, which is to look at implementing a dataframe input method for the functions. This is a topic that came up earlier in the project, that we decided to steer away from, though it could be revisited. I've opened up a new issue to talk about this #39

uvchik commented 9 years ago

I think the variables should be lower case but I'm not sure with the keys of the DataFrame. They could be camel case. It might make things even clearer in the documentation but I'm not sure. I updated the wiki. What do you think? Please add or comment.

dacoex commented 9 years ago

a question: Alternatively to the wiki the variables could be added to the documentation or dedicated module where variables will be defined as a dictionary or table...

Offeres the possibility to use it in the documentation or even the code.

wholmgren commented 9 years ago

@uvchik The wiki looks nice, and I'll add some comments there. I would say that we should try to be consistent between function parameters and the DataFrame keys. I think most people would wonder why the function outputs are formatted differently than the inputs, but I could be wrong about that. Trying to stay consistent also makes it possible to use the **DataFrame technique discussed in #39.

@dacoex I don't fully understand your comment, but we can add a table of variable definitions to the sphinx documentation. Let's wait to come to agreement on most of them before doing this.

wholmgren commented 9 years ago

Here's a PVPMC page that's worth reviewing, although I do not suggest that we follow these definitions for our code.

uvchik commented 9 years ago

I removed the 'key' column. So variable names and key names will be the same.

We have to decide whether we want the 'e_*' or not.

I totally agree that the result of this discussion should be added to the documentation.

@wholmgren This is a basic decision. Do we want variable names that are close the symbols used in equations or do we want longer "talking" variable names. I prefer the longer names.

wholmgren commented 9 years ago

I also prefer the longer names. On Wed, Mar 25, 2015 at 3:12 AM ukrien notifications@github.com wrote:

I removed the 'key' column. So variable names and key names will be the same.

We have to decide whether we want the 'e_*' or not.

I totally agree that the result of this discussion should be added to the documentation.

@wholmgren https://github.com/wholmgren This is a basic decision. Do we want variable names that are close the symbols used in equations or do we want longer "talking" variable names. I prefer the longer names.

— Reply to this email directly or view it on GitHub https://github.com/pvlib/pvlib-python/issues/37#issuecomment-85969916.

jforbess commented 9 years ago

I really dislike e_* form of irradiance names. e doesn't mean irradiance to me, it means energy generated with volts and amps.

I'm a lazy typist, but there are so many common notations for irradiance, i think it is probably better to write them out. Me, I prefer ghi, dhi, dni, poa_global, poa_beam, poa_diffuse but could see spelling out the abbreviations if the consensus went that way.

On Wed, Mar 25, 2015 at 6:45 AM, Will Holmgren notifications@github.com wrote:

I also prefer the longer names. On Wed, Mar 25, 2015 at 3:12 AM ukrien notifications@github.com wrote:

I removed the 'key' column. So variable names and key names will be the same.

We have to decide whether we want the 'e_*' or not.

I totally agree that the result of this discussion should be added to the documentation.

@wholmgren https://github.com/wholmgren This is a basic decision. Do we want variable names that are close the symbols used in equations or do we want longer "talking" variable names. I prefer the longer names.

— Reply to this email directly or view it on GitHub https://github.com/pvlib/pvlib-python/issues/37#issuecomment-85969916.

— Reply to this email directly or view it on GitHub https://github.com/pvlib/pvlib-python/issues/37#issuecomment-86031950.

wholmgren commented 9 years ago

I like poa*. Feel free to edit/add to the wiki.

uvchik commented 9 years ago

I would add a "rad" if an angle is in radiant, but I wouldn't add a "deg" for degree.

wholmgren commented 9 years ago

@uvchik that seems reasonable. Do we have any pvlib functions that use radians for inputs and outputs?

wholmgren commented 9 years ago

The updated table looks good. I like the longer dni_extra, the item-modifier pattern used in airmass_[relative|absolute] (most other variables are now listed this way), and the solar_zenith since we use "solar" more often than "sun".

wholmgren commented 9 years ago

If you're only seeing this via email, now is a good time to look at @uvchik's updated table and let us know what you think. @uvchik and I have dominated the discussion here, but I don't want all of pvlib-python's naming conventions to be determined by our preferences.

bmu commented 9 years ago

I agree with your conventions, especially to use longer "talking" names. Here are my thoughts on some of the the bold ones (I have no preference for the other bold ones) in the wiki page and some other comments:

tz or timezone

I am not sure, if this is a good name in general, as we can have time stamps given as UTC, really related to a time zone, local time (without DST), mean solar time, true solar time, ...? This is not meant as a philosophical comment, but is an everyday problem from my experience, so we should be able to handle all of these formats (at least in the future, for now maybe it is ok to define, that all times should be given as UTC). Maybe time_reference (it should be decided by a native speaker ;-)?

dni_et

I like extra, should be used as index for all extraterrestrial irradiances. not sure if "direct" is a good name for extraterrestrial irradiance. However it is important to separate "normal", "horizontal" or "poa" extraterrestrial for some models (this was often a source of error for me, because I didn't cached which one was meant in the models).

global, direct and diffuse in general

Maybe it is better to have a more general rule, e.g to to use g (global), d (diffuse) and b (direct or beam) and than an index. Indices could be normal, hor (horizontal), tilted and poa (which is a special case, if it is not clear / not important if it is tilted or normal, e.g. for single axis tracking). However, I am not sure on this rule, because the names would be different from the usual ones like DNI (would be b_normal) , GHI (would be g_hor), ... (but it would be more systematic).

uvchik commented 9 years ago

I prefer to use the common names like dni and ghi and beside that use systematic names like poa_diffuse, poa_global. But i could live with both variants.

I don't know how to bring about a decision.

wholmgren commented 9 years ago

tz and timezone: There was a lot of discussion in #47 concerning how pvlib should handle time zones and/or time references. I believe that the conclusion is that it's ok to rely heavily on the pandas functionality and thus keep using the idea of timezones instead of inventing a new concept such as a time_reference (at least for now). So, we are back to deciding on tz vs timezone and, unless there are objections, we are going to go with tz since that's what pandas uses.

I appreciate @bmu's desire to make the ghi, dni, and so on more systematic, but I agree with @uvchik's preference for common names.

I moderately support @jforbess's proposal for v_mpp, i_oc etc. since they're so commonly used and often so long otherwise. Perhaps non-native english speakers have other abbreviations and would prefer longer, more explicit names?

bmu commented 9 years ago

English abbreviations are common even for non-native speakers (at least in Germany), I think. For the radiation names, the only really common names are GHI and DNI (or ghi and dni). v_mpp, p_mpp and so on, sounds good to me.

wholmgren commented 9 years ago

Made some small updates to the wiki. We'll go with @jforbess's abbreviated names.

wholmgren commented 9 years ago

Added more parameters. I'm open to last-minute appeals for cell/module parameters: Voc vs. voc vs. v_oc or Rs vs. rs vs. r_s etc. The only thing I don't like is V_oc since that requires using the shift key twice.

@jforbess you previously had a strong opinion about e_* for irradiance names. What about for effective irradiance?

jforbess commented 9 years ago

I prefer poa_beam_eff, rather than eff_poa_beam, if that's the question. I haven't thought hard about how it flows through modeling, but I would want access to poa adjusted for shading, soiling, IAM, and spectrum each individually for model development, don't you think? Not sure if you consider shading and soiling as appropriate to adjust poa in this way, because I think it makes some sense physically, but not completely.

PVsyst supplies this through a ratio of Shd_Loss_Factor or similar, rather than providing POA_adj for each factor. Though I guess they also do provide lost kWh due to most of those factors.

On Mon, Jun 22, 2015 at 11:25 AM, Will Holmgren notifications@github.com wrote:

Added more parameters. I'm open to last-minute appeals for cell/module parameters: Voc vs. voc vs. v_oc or Rs vs. rs vs. r_s etc. The only thing I don't like is V_oc since that requires using the shift key twice.

@jforbess https://github.com/jforbess you previously had a strong opinion about e_* for irradiance names. What about for effective irradiance?

— Reply to this email directly or view it on GitHub https://github.com/pvlib/pvlib-python/issues/37#issuecomment-114170926.

wholmgren commented 9 years ago

Sorry, I was thinking of it in the context of the SAPM's effective irradiance parameter.

bmu commented 9 years ago

I think this issue can be closed. The last one for 0.2 so far.

wholmgren commented 9 years ago

I'll close it with the expectation that it will be reopened or a similar issue will be created for undiscovered inconsistencies.

dacoex commented 8 years ago

Just a quick question:

for the ration poa_global to ghi do you prefer: _Transposition factor (PVSyst Glossary )_ or poa_gain

cwhanse commented 8 years ago

Transposition factor

From: DaCoEx [mailto:notifications@github.com] Sent: Tuesday, January 05, 2016 6:52 AM To: pvlib/pvlib-python Subject: [EXTERNAL] Re: [pvlib-python] consistent variable names and capitalization (#37)

Just a quick question:

for the ration poa_global to ghi do you prefer: Transposition factor (PVSyst Glossary )http://files.pvsyst.com/help/transposition_factor.htm or poa_gain

— Reply to this email directly or view it on GitHubhttps://github.com/pvlib/pvlib-python/issues/37#issuecomment-169006234.

wholmgren commented 8 years ago

I also prefer transposition factor.

dacoex commented 8 years ago

Interesting. I see too much risk of mixing its abbreviation (TF) with thin-film.

Shall we add it to the list, then?

wholmgren commented 8 years ago

We try to avoid abbreviations in pvlib, so I don't think it's a problem. There is a function in irradiance.py that would need updating. It currently uses poa_ratio.

dacoex commented 8 years ago

ok, but it's a very long column name either ' transposition factor'

wholmgren commented 8 years ago

We have a lot of long variable/column names. Only the most obvious quantities are abbreviated. Most (all?) of the people that originally participated in this discussion supported longer names over abbreviations.

dacoex commented 8 years ago

OK, so assign it to me and I will add it when finalising https://github.com/pvlib/pvlib-python/pull/103