pvlib / pvlib-python

A set of documented functions for simulating the performance of photovoltaic energy systems.
https://pvlib-python.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Design pvlib.io #261

Closed cwhanse closed 6 years ago

cwhanse commented 7 years ago

I'm assembling code to create pvlib.io.

What functionality is desired? Specifically, from which formats do we want to read, and to which formats do we want to write?

I'm going to focus first on reading a few data formats. I plan to incorporate the current pvlib.tmy which reads tmy2 and tmy3 files, returning a tuple (data, metadata) where data is a pandas DataFrame and metadata is a dict. Any comments on this design? Personally I like it.

I have code to read surfrad files so I'll include that.

adriesse commented 7 years ago

I often put metadata in a pandas Series rather than a dict because this gives me order. I also like the dot-style access, e.g. "meta.latitude".
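
For illustration, a minimal sketch of the Series idea (field names and values are hypothetical):

import pandas as pd

# an ordered metadata container with dot-style access
meta = pd.Series([39.74, -105.18, -7.0],
                 index=['latitude', 'longitude', 'tz'])
print(meta.latitude)  # 39.74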

wholmgren commented 7 years ago

As @bmu once mentioned, naming our new module io is probably asking for trouble because it conflicts with the builtin python io module. I'm open to dataio, datareader, inputoutput, or other suggestions.

So far, we've only discussed creating a new module. @mikofski already complained in #235 that our modules are too long, and adding a bunch of new code to a unified io module might make it worse. I wonder if we might be better off in the long run if we instead create a subpackage comprised of one module per data type and pull the functions into the package-level namespace. This is similar to the way that the pandas io package is structured. So, we'd have something like

pvlib/
  dataio/
    __init__.py
    surfrad.py
    tmy.py
    ...

### pvlib/dataio/__init__.py ###
from pvlib.dataio.surfrad import read_surfrad
from pvlib.dataio.tmy import read_tmy2, read_tmy3

### usage ###
import pvlib
data, metadata = pvlib.dataio.read_surfrad(surfrad_filename)
data2, metadata2 = pvlib.dataio.read_tmy2(tmy2_filename)

This subpackage structure might make for an easy-to-use API and a set of easy-to-read, easy-to-maintain modules.

Probably a good idea to stick with the (data, metadata) pattern for new reader functions.

I cannot think of a downside to returning metadata as a Series, but my instinct is to leave it as a dict (possibly an OrderedDict) and let users convert it to a Series if they want to. I seem to remember that the pandas developers recommend against using Series as a dictionary, though I can't find the reference and could be making that up. All that being said, I'm not opposed to the change.
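
A minimal sketch of the dict-first approach, leaving conversion to the user (keys and values hypothetical):

from collections import OrderedDict

import pandas as pd

# reader returns an ordered dict; users who want Series/dot access convert it
metadata = OrderedDict([('latitude', 32.2), ('longitude', -110.9), ('tz', -7)])
meta_series = pd.Series(metadata)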

cwhanse commented 7 years ago

It’s a plan, although I prefer iotools to dataio.

Function pattern to be the same as pandas, i.e., read_<format> and to_<format> as needed.

adriesse commented 7 years ago

I defer to people with more experience on the code organization, but I do suspect there will be a few potentially shared elements. What I could imagine--and would really like to have--is a kind of enhanced read_csv that can read extra information in the header and/or footer in addition to the regular columnar data. Many file formats could build on that just by setting parameters and translating column or key names. An option could be to return the header and/or footer as a text block to be parsed later.
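
Something like this sketch, with the function name and header convention assumed for illustration:

import pandas as pd

def read_csv_with_header(filename, header_lines, **kwargs):
    # return the header as an unparsed text block plus the columnar data;
    # format-specific readers could build on this by translating names
    with open(filename) as f:
        header = ''.join(f.readline() for _ in range(header_lines))
    data = pd.read_csv(filename, skiprows=header_lines, **kwargs)
    return header, data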

dacoex commented 7 years ago

@adriesse do you have an example file & code?

How would you implement with the file irradiation-0e2a19f2-abe7-11e5-a880-5254002dbd9b.csv

In most cases, the input will require adapting the date parser to the specific data format. That usually gets you going. Or would the scope also include quality checks or reformatting of the data? I am still not clear what the output is desired to be.

  1. A dataframe as read in?
  2. One complying as closely as possible with Variables and Symbols?

In the latter case, what do we do with additional variables, e.g. in spectral datasets or in BSRN files?

mikofski commented 7 years ago

Hi all, sorry I haven't chimed in. I already have tmy2 and tmy3 readers. I'll send them when I get in.

AFA the dataio module and namespace question.

  1. I also like dataio.
  2. Will, I like your idea of sub packages, I think it's the easiest way to decrease the omnipotence of the god modules. Just change irradiance.py to a folder called irradiance and put an __init__.py in it. Then import the symbols that were originally imported directly from irradiance.py but are now in modules like irradiance/spa.py
  3. Anton, clashes in names should be rare in Python because it uses namespaces. E.g. pvlib.data.io can't be confused with the built-in io because of the long name that precedes it.

wholmgren commented 7 years ago

Mark, I am confused about your tmy readers. Do they do the same thing as pvlib's existing tmy readers?

Let's take things one step at a time and have Cliff just make us a dataio/iotools subpackage for the IO capabilities. I have been thinking about larger reorganizations, though; I will eventually comment more in #235.

If necessary, the subpackage could have a core.py or its own tools.py module for code that is reused throughout the package.

I want to keep the bar for contributing to pvlib as low as possible, so it's fine with me if different modules do different amounts of processing to their respective data formats. Some might be 1000 line monsters that do QA and return data that is in "pvlib conventions," while others might not do much more than call pd.read_csv with a couple of arguments that are unique to that data format.
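
At the minimal end of that spectrum, a reader might be no more than this sketch (format name, separator, and metadata are assumptions):

import pandas as pd

def read_someformat(filename):
    # little more than read_csv with format-specific arguments
    data = pd.read_csv(filename, skiprows=2, sep=';',
                       index_col=0, parse_dates=True)
    metadata = {'source': 'someformat'}
    return data, metadata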

mikofski commented 7 years ago

RE: TMY

@wholmgren , sorry I misunderstood the intention of the proposed dataio, and also I wasn't aware of the readtmy2 and readtmy3 methods in tmy.py. So please disregard (2) in my comment above.

RE: reorganization

I agree with you that the bar for contributions should be low. Your ideas for core.py and tools.py sound good, although if I understand correctly (?) that approach may break some code if it shifts things from, say, pvlib.irradiance to pvlib.core.irradiance, so that's something to consider. IMHO changing the top-level modules to subpackages would preserve the current API but allow more flexibility. See my comment on #235.

RE: dataio

Cliff, some ideas for formats that could be read are:

dacoex commented 7 years ago

@cwhanse / @mikofski many items from the last comments are already in https://github.com/pvlib/pvlib-python/issues/29. Are they complementary? Shall the other be closed?

I wonder if there are any observations regarding the comments in https://github.com/pvlib/pvlib-python/issues/261#issuecomment-260628285

Otherwise, who will provide a prototype, or will each of us open a PR with the data set reader they have implemented?

dacoex commented 7 years ago

@cwhanse please have a look at the PR

The data is here:

  cd MY-GITHUB-CLONES
  git clone https://github.com/dacoex/pvlib_data

dacoex commented 7 years ago

@cwhanse & @adriesse my refactored PR is up:

@wholmgren seems to lean towards keeping the iotools simple, as a customised wrapper around pandas.read_csv

Shall iotools/util also provide functions to:

So what shall be the outputs?

  1. a dataframe with the raw data?
  2. a dataframe with renamed columns to match pvlib convention?
  3. a metadata dictionary?
  4. a location with timezone, which usually involves retrieving this info either from geonames or using an additional package.
  5. a dataframe with renamed columns to match pvlib convention & localised index?

I personally prefer to have the library do as much as possible:

  • read data
  • reformat data
  • prepare a Location
  • localise data

I am looking forward to your feedback & will then modify my PRs according to what seems to be consensus.

jforbess commented 7 years ago

I understand why you would like to handle the metadata with this library.

I wonder if the problem is that not all file types include all metadata? For example the PVsyst output file doesn't include any timezone or lat/long information, because it just references a .SIT file that includes that information. Not sure how to handle this other than to have the user define a Location separately to use.


dacoex commented 7 years ago

I wonder if the problem is that not all file types include all metadata?

Yes. But most scientific providers at least include information on the time reference, i.e. UTC. So with this and the coordinates, we could derive the timezone and construct the location for the typical input meteo files.

The timezone is derived either by a web query to geonames or by local libraries.

For example the PVsyst output file doesn't include any timezone or lat/long information, because it just references a .SIT file that includes that information. Not sure how to handle this other than to have the user define a Location separately to use.

Correct. Actually, in the case of PVSyst, I would assume that a location exists because PVSyst hourly output usually does not include GHI. I would assume that a pvlib user uses this data to compare the result of both modelling environments.

So my current proposal for addressing this in iotools:

  1. Standard output of a pvlib.iotools.FORMAT reader would be a tuple with
    • dataframe with raw data (for comparison & debugging)
    • dataframe with renamed columns to match pvlib convention
    • metadata dictionary as suggested by @wholmgren in iotools: reader for maccrad #279
  2. pvlib.iotools.util to include some tools
    • optional tool to retrieve the timezone
    • optional tool to localise dataframe
    • further functions, e.g. checker if the radiation starts before sunrise to inform that there is a timezone issue

So depending on the dataset, the user could employ the functions in util for more convenience. The capabilities of each format/reader could be documented in the docstring.
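
A sketch of that reader contract, with the column mapping and metadata keys invented for illustration:

import pandas as pd

# hypothetical mapping from file columns to pvlib variable names
COLUMN_MAP = {'GlobHorIrr': 'ghi', 'DirNormIrr': 'dni', 'DiffHorIrr': 'dhi'}

def read_format(filename):
    raw = pd.read_csv(filename)            # raw data for comparison/debugging
    data = raw.rename(columns=COLUMN_MAP)  # pvlib-convention column names
    metadata = {'filename': filename}      # per-file metadata
    return raw, data, metadata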

jforbess commented 7 years ago

Yes, having tools available for the user to apply sounds like the right approach.


mikofski commented 7 years ago

try tzwhere to determine timezone from coordinates:

from tzwhere import tzwhere
from datetime import datetime
import pytz

# timezone lookup, force nearest tz for coords outside of polygons
WHERETZ = tzwhere.tzwhere(shapely=True, forceTZ=True)
# daylight savings time (DST) in northern hemisphere starts in March and ends
# in November and the opposite in southern hemisphere
JAN1 = datetime(2016, 1, 1)  # date with standard time in northern hemisphere
JUN1 = datetime(2016, 6, 1)  # date with standard time in southern hemisphere

# notes and links on Python datetime tzinfo dst:
# http://stackoverflow.com/questions/17173298/is-a-specific-timezone-using-dst-right-now
# http://stackoverflow.com/questions/19774709/use-python-to-find-out-if-a-timezone-currently-in-daylight-savings-time

# methods for determining DST vs. STD
# tz.localize(JAN1).timetuple().tm_isdst -> boolean, True if DST, False if STD
# tz.localize(JAN1).dst() -> timedelta, 0 if STD
# tz.dst(JAN1) -> timedelta, 0 if STD

def tz_latlon(lat, lon):
    """
    Timezone from latitude and longitude.

    :param lat: latitude [deg]
    :type lat: float
    :param lon: longitude [deg]
    :type lon: float
    :return: UTC offset [hours] and timezone info
    :rtype: tuple of (float, tzinfo or None)
    """

    # get name of time zone using tzwhere, force to nearest tz
    tz_name = WHERETZ.tzNameAt(lat, lon, forceTZ=True)

    # check if coordinates are over international waters
    if not tz_name or tz_name in ('uninhabited', 'unknown'):
        # coordinates over international waters only depend on longitude
        return lon // 15.0, None
    else:
        tz_info = pytz.timezone(tz_name)  # get tzinfo

    # get the daylight savings time timedelta
    tz_date = JAN1  # standard time in northern hemisphere
    if tz_info.dst(tz_date):
        # if DST timedelta is not zero, then it must be southern hemisphere
        tz_date = JUN1  # a standard time in southern hemisphere
    tz_str = tz_info.localize(tz_date).strftime('%z')  # output timezone from ISO
    # convert ISO timezone string, e.g. "+0530", to float hours; the minutes
    # must carry the sign of the hours for partial timezones to come out right
    sign = -1.0 if tz_str.startswith('-') else 1.0
    return float(tz_str[:3]) + sign * float(tz_str[3:]) / 60.0, tz_info

if __name__ == "__main__":
    # test tz_latlon at San Francisco (GMT-8.0)
    gmt, tz_info = tz_latlon(37.7, -122.4)
    assert gmt == -8.0
    assert tz_info.zone == 'America/Los_Angeles'
    assert tz_info.utcoffset(JAN1).total_seconds()/3600.0 == -8.0
    assert tz_info.utcoffset(JUN1).total_seconds()/3600.0 == -7.0
    # New Delhi, India (GMT+5.5)
    gmt, tz_info = tz_latlon(28.6, 77.1)
    assert gmt == 5.5
    assert tz_info.zone == 'Asia/Kolkata'
    assert tz_info.utcoffset(JAN1).total_seconds()/3600.0 == 5.5
    assert tz_info.utcoffset(JUN1).total_seconds()/3600.0 == 5.5
    # also works with integers (EG: Santa Cruz, CA)
    gmt, tz_info = tz_latlon(37, -122)
    assert gmt == -8.0
    assert tz_info.zone == 'America/Los_Angeles'

You can call this as shown in the script:

gmt, tz_info = tz_latlon(37, -122)
# gmt: -8.0
# tz_info: <DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>

Also see: Difference between timezones America/Los_Angeles and US/Pacific and PST8PDT?

Warning Please do not use any of the timezones listed here in "other timezones" (besides UTC), they only exist for backward compatible reasons, and may expose erroneous behavior.

Note: using Shapely adds some overhead and loading is slightly slower, but it is more accurate, especially near shorelines.

mikofski commented 7 years ago

For the reverse lookup, geopy with ESRI ArcGIS works great. Registration for an ESRI developer or ArcGIS public account is free.

Mapquest also has free developer accounts and can be used with geopy.

There are also several other geocoding services like Google and others.
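
A minimal geopy sketch (the place name is arbitrary; the public ArcGIS endpoint works without credentials but is rate-limited):

from geopy.geocoders import ArcGIS

geolocator = ArcGIS()
location = geolocator.geocode('Golden, Colorado')
print(location.latitude, location.longitude)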

dacoex commented 7 years ago

@jforbess so I conclude you agree to the structure described above in issuecomment-265110051?

@mikofski Thanks for the code. I used a simpler library: iotools: reader for maccrad by dacoex · Pull Request #279 · pvlib/pvlib-python

But @wholmgren did not like the addition of an external dependency: issuecomment-264486119

This is why I propose above to make it optional.

Well, if there are no further ideas or suggestions, I will revise the PR according to the discussion.

jforbess commented 7 years ago

@dacoex, yes. And I see that @wholmgren didn't like an iotools/util function to localize the tz, instead recommending the pandas function. I understand that, but part of me thinks that having a function in the API will help users be consistent in their usage. This may be a philosophical argument?


mikofski commented 7 years ago

@dacoex I like timezonefinder; please use the PyPI reference, not GitHub.

I disagree with @wholmgren in #274; IMO it's fine to have dependencies, as long as they are mature, well documented, and widely used, which timezonefinder is: it's on PyPI, v1.5.7 has over 2000 downloads, it's based on tzwhere (the previous best choice), and it's been recently updated with a steady record of releases, etc.

IMHO using open source is one of the reasons to use Python. PyPI is one of the reasons Python is so powerful. Adding dependencies instead of rolling your own makes your code stronger.

Perhaps you can selectively let users use (or not) the import by putting it in a try: except: block.

Although, if possible, perhaps it's good to make the dependency pluggable, so you could switch it later. For example:

def get_latlon(lat, lon, method=None, **kw):
    """Look up the timezone for a coordinate with a pluggable backend."""
    if method is None:
        try:
            from timezonefinder import TimezoneFinder
        except ImportError:
            # crude fallback: nominal offset in hours from longitude
            return lon // 15.0
        else:
            return TimezoneFinder().timezone_at(lat=lat, lng=lon, **kw)
    else:
        return method(lat, lon, **kw)

dacoex commented 7 years ago

Thanks to all for your feedback. I will propose an improved version and also add the SolarGIS format. Just give me some time, because I just got loaded with a bit of work.

adriesse commented 7 years ago

Here is another thought:

There are actually two kinds of metadata in your maccrad file: per-file metadata and per-column metadata (in this case, descriptions and units). I have seen this in some other file formats as well, and it could be something worth reading and returning separately.
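
One way to return them separately, sketched with invented keys and layout:

import pandas as pd

def read_format(filename):
    file_meta = {'latitude': 48.1, 'longitude': 11.6}   # per-file metadata
    column_meta = pd.DataFrame(
        {'description': ['global horizontal irradiance'],
         'units': ['W/m^2']},
        index=['ghi'])                                  # per-column metadata
    data = pd.read_csv(filename, skiprows=10)           # assumed header length
    return data, file_meta, column_meta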

wholmgren commented 7 years ago

So what shall be the outputs?

a dataframe with the raw data?

I don't know what you mean by raw data in this context. The columns are unchanged?

a dataframe with renamed columns to match pvlib convention?

Probably, though it would be great if this were an option with a rename=True default. Not sure if we should also change the existing tmy readers to be more consistent.

a metadata dictionary?

readtmy2 and readtmy3 return a tuple of metadata, data. Seems reasonable to me. Has anyone been unhappy with this in the past or wished that those readers did more?

a location...

I strongly oppose this. I think it is essential that pvlib's class layer lives above pvlib's functional layer, though I will take all the blame if the distinction or the reasons for it are unclear. Returning a Location object would ruin the distinction. As discussed elsewhere, you could instead add a Location.from_maccrad(metadata) method that does the job while retaining a clean separation among modules and between the class/functional layers. See Location.from_tmy for inspiration. Location.from_maccrad(file) could even be possible.
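
For instance, a sketch of that idea written as a plain function (the metadata keys are assumptions):

from pvlib.location import Location

def location_from_maccrad(metadata):
    # what a Location.from_maccrad classmethod might do internally
    return Location(metadata['latitude'], metadata['longitude'],
                    tz=metadata.get('tz', 'UTC'),
                    altitude=metadata.get('altitude', 0),
                    name=metadata.get('name'))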

a dataframe with renamed columns to match pvlib convention & localised index?

I think that an IO reader should return a dataframe that is localized to the format of the timestamps of the file and/or the timezone-specific metadata of the file. That is: if the timestamps say MST or -0700, then the dataframe should be localized accordingly; or, if the timestamps are ambiguous but the metadata says e.g. tz=MST, then the dataframe should be localized. pvlib should not guess at the localization of the data.

Fine with me if you want to add a Location.guess_tz or similar method to Location. Within that method, you can import and run the optional library of your choosing or make a request to the google api.

pvlib.iotools.util to include some tools optional tool to retrieve the timezone

I can see an argument for Location.guess_tz() with no arguments since it would look up its own self.latitude and self.longitude. Otherwise, I think it's better for users to make their own call to a library that gets the tz with one simple line of code.
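
For example, a sketch of what such a method might wrap (the library choice and method name are assumptions, not pvlib API):

from timezonefinder import TimezoneFinder

def guess_tz(latitude, longitude):
    # what a hypothetical Location.guess_tz() might do internally
    return TimezoneFinder().timezone_at(lat=latitude, lng=longitude)

print(guess_tz(32.2, -110.9))  # 'America/Phoenix'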

optional tool to localise dataframe

I'm confused about the arguments surrounding creating our own localize/convert functions. pandas has a tz_localize method and a tz_convert method. They do exactly what their names suggest. I don't know how we can improve upon that.
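
A minimal illustration with made-up data:

import pandas as pd

index = pd.DatetimeIndex(['2016-06-01 12:00', '2016-06-01 13:00'])
data = pd.DataFrame({'ghi': [800.0, 850.0]}, index=index)
data = data.tz_localize('Etc/GMT+7')  # attach the file's stated UTC-7 offset
data_utc = data.tz_convert('UTC')     # convert for comparison across sources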

further functions, e.g. checker if the radiation starts before sunrise to inform that there is a timezone issue

Sounds complicated to do reliably and probably unnecessary.

dacoex commented 7 years ago

See Location.from_tmy for inspiration. Location.from_maccrad(file) could even be possible.

OK.

I don't know what you mean by raw data in this context. The columns are unchanged?

Yes, just the result of read_csv.

I can see an argument for Location.guess_tz() with no arguments since it would look up its own self.latitude and self.longitude.

Where would that be placed? If put in location.py, it would add more overhead to that module. If kept in iotools/util, it could be an optional module, i.e. not loaded by default by iotools/api.

I still prefer to include a few shortcut functions in iotools/util. At least those which make sense.

Otherwise, I think it's better for users to make their own call to a library that gets the tz with one simple line of code.

Even the reader can be done by the user. But isn't this here also to make life simpler? If we are guessing the tz for most of the files, this could be generalised in a common function to reduce copy-and-paste code.

I would prefer to let the software do whatever can be done automatically. The result is to be verified anyway.

pandas has a tz_localize method and a tz_convert method. They do exactly what their names suggest. I don't know how we can improve upon that.

Well, I took mine from the pandas docs, but maybe they changed that API again?

@adriesse I suggest adding the column metadata to the docstring as in https://github.com/pvlib/pvlib-python/blob/io/pvlib/iotools/tmy.py#L73, also because it is not used for further calculations.

Summarising, the next revision will be structured more closely after the existing tmy readers. This was totally overlooked in my initial code. Sorry if that spurred an unnecessary discussion, but it seems that it helped us reach a common understanding about this module.

As a lesson learned, we could add a specification and some instructions on how to add a new reader to the docs.

jforbess commented 7 years ago

@wholmgren wrote:

further functions, e.g. checker if the radiation starts before sunrise to inform that there is a timezone issue

Sounds complicated to do reliably and probably unnecessary.

This may be unnecessary for standard files, but I have been wanting it for all of the data that I get from SCADA systems. Just found a system that didn't apply Daylight Savings on the actual day, but a week early. But only in the spring. And only the first two years. Otherwise, it had the right alignment with Daylight Savings.

I spent a lot of time wrapping my head around the right way to handle timezones because of daylight savings and the fact that my data sometimes comes with timestamps from a timezone that it is not located in. (A client pulls data in US/Eastern for a plant that is in California. The SCADA thinks it is doing the right thing, maybe, because it is relative to where the data is being queried, but it is not the right thing at all.)

But I admit, this shouldn't be an issue for any standard file that gets a standard reader. But it is critical if anyone tries to generalize iotools for a somewhat standard csv.
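
That kind of check could be sketched like this (function name and threshold are invented; it flags rather than fixes):

import pvlib

def flag_predawn_irradiance(ghi, latitude, longitude, threshold=10.0):
    # flag GHI above threshold (W/m^2) while the sun is below the horizon,
    # which often points to a timezone or DST misalignment
    solpos = pvlib.solarposition.get_solarposition(ghi.index, latitude,
                                                   longitude)
    return (solpos['apparent_elevation'] < 0) & (ghi > threshold)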


dacoex commented 7 years ago

Yes, @jforbess is right. This also applies to most datalogger files...

squoilin commented 6 years ago

This issue seems a bit outdated, but anyway, I have just written a short function to read EPW weather files from EnergyPlus. Note that the output is not harmonized with the output of the TMY functions. Feel free to re-use it or integrate it into the library:

import pandas as pd
import pytz

def readepw(filename=None):
    '''
    Reads an EPW file into a pandas dataframe.

    Function tested with EnergyPlus weather data files: 
    https://energyplus.net/weather

    Parameters
    ----------
    filename : None or string
        If None, attempts to use a Tkinter file browser. A string can be
        a relative file path, absolute file path, or url.

    Returns
    -------
    Tuple of the form (data, metadata).

    data : DataFrame
        A pandas dataframe with the columns described in the table
        below. 

    metadata : dict
        The site metadata available in the file.

    Notes
    -----

    The returned structures have the following fields.

    =======================================================================
    Data field                       
    =======================================================================
    Datetime data
    Dry bulb temperature in Celsius at indicated time
    Dew point temperature in Celsius at indicated time
    Relative humidity in percent at indicated time
    Atmospheric station pressure in Pa at indicated time
    Extraterrestrial horizontal radiation in Wh/m2
    Extraterrestrial direct normal radiation in Wh/m2
    Horizontal infrared radiation intensity in Wh/m2
    Global horizontal radiation in Wh/m2
    Direct normal radiation in Wh/m2
    Diffuse horizontal radiation in Wh/m2
    Averaged global horizontal illuminance in lux during minutes preceding the indicated time
    Direct normal illuminance in lux during minutes preceding the indicated time
    Diffuse horizontal illuminance in lux  during minutes preceding the indicated time
    Zenith luminance in Cd/m2 during minutes preceding the indicated time
    Wind direction at indicated time. N=0, E=90, S=180, W=270
    Wind speed in m/s at indicated time
    Total sky cover at indicated time
    Opaque sky cover at indicated time
    Visibility in km at indicated time
    Ceiling height in m
    Present weather observation
    Present weather codes
    Precipitable water in mm
    Aerosol optical depth
    Snow depth in cm
    Days since last snowfall
    Albedo
    Liquid precipitation depth in mm at indicated time
    Liquid precipitation quantity
    =======================================================================

    ===============   ======  ===================
    key               format  description
    ===============   ======  ===================
    altitude          Float   site elevation
    latitude          Float   site latitude
    longitude         Float   site longitude
    Name              String  site name
    State             String  state
    TZ                Float   UTC offset
    USAF              Int     USAF identifier
    ===============   ======  ===================

    S. Quoilin, October 2017
    '''

    def _interactive_load():
        # Tkinter module and dialog names differ between Python 2 and 3
        try:
            import Tkinter as tkinter
            from tkFileDialog import askopenfilename
        except ImportError:
            import tkinter
            from tkinter.filedialog import askopenfilename
        tkinter.Tk().withdraw()  # start interactive file input
        return askopenfilename()

    if filename is None:
        try:
            filename = _interactive_load()
        except Exception:
            raise Exception('Interactive load failed. Tkinter not supported '
                            'on this system. Try installing X-Quartz and '
                            'reloading')

    head = ['dummy0', 'Name', 'dummy1', 'State', 'dummy2', 'USAF', 'latitude', 'longitude', 'TZ', 'altitude']

    # read in file metadata from the first line, closing the file afterwards
    with open(filename, 'r') as csvdata:
        temp = dict(zip(head, csvdata.readline().rstrip('\n').split(",")))

    # convert metadata strings to numeric types
    meta = {}
    meta['Name'] = temp['Name']
    meta['State'] = temp['State']
    meta['altitude'] = float(temp['altitude'])
    meta['latitude'] = float(temp['latitude'])
    meta['longitude'] = float(temp['longitude'])
    meta['TZ'] = float(temp['TZ'])
    meta['USAF'] = int(temp['USAF'])

    headers = ["year","month","day","hour","min","Dry bulb temperature in C","Dew point temperature in C","Relative humidity in percent","Atmospheric pressure in Pa","Extraterrestrial horizontal radiation in Wh/m2","Extraterrestrial direct normal radiation in Wh/m2","Horizontal infrared radiation intensity in Wh/m2","Global horizontal radiation in Wh/m2","Direct normal radiation in Wh/m2","Diffuse horizontal radiation in Wh/m2","Averaged global horizontal illuminance in lux during minutes preceding the indicated time","Direct normal illuminance in lux during minutes preceding the indicated time","Diffuse horizontal illuminance in lux  during minutes preceding the indicated time","Zenith luminance in Cd/m2 during minutes preceding the indicated time","Wind direction. N=0, E=90, S=180, W=270","Wind speed in m/s","Total sky cover","Opaque sky cover","Visibility in km","Ceiling height in m","Present weather observation","Present weather codes","Precipitable water in mm","Aerosol optical depth","Snow depth in cm","Days since last snowfall","Albedo","Liquid precipitation depth in mm","Liquid precipitation quantity"]
    Data = pd.read_csv(filename, skiprows=8, header=None)
    del Data[5]  # drop the "data source and uncertainty flags" column
    Data.columns = headers
    # EPW hours run 1-24; shift to 0-23 so pandas can assemble the timestamps
    Data['hour'] = Data['hour'] - 1
    Data.index = pd.to_datetime(Data[["year", "month", "day", "hour"]])

    # EPW 'TZ' metadata is the UTC offset in hours; FixedOffset takes minutes
    Data = Data.tz_localize(pytz.FixedOffset(int(meta['TZ'] * 60)))

    return Data, meta

cdeline commented 6 years ago

@squoilin - Sylvain, this EPW reader function is quite useful. I'm incorporating it into my development of bifacialvf and bifacial_radiance at github.com/nrel/bifacialvf and github.com/nrel/bifacial_radiance. If/when this gets pulled into the pvlib distribution, I'll switch over to the 'official' version. Thanks!