pvlib / pvlib-python

A set of documented functions for simulating the performance of photovoltaic energy systems.
https://pvlib-python.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Time zones / Time references #47

Closed: bmu closed this issue 8 years ago

bmu commented 9 years ago

I think we should make sure that time zones and other time references are used correctly. Time stamps can be given as UTC, as local time tied to a real time zone (with DST), as local time without DST, as mean solar time, as true solar time, ...? This is an everyday problem in my experience, so we should be able to handle all of these formats, at least in the future. For now it could be OK to define that all times should be given as UTC and leave all time conversion issues to the user.

pandas has had support for Julian dates since 0.14 (this is the pull request), so I was thinking about creating a pull request to implement to_solar_time in pandas. However, this is not easy to implement and I am not sure whether they would accept it. But this way all time conversion would be in pandas and we could implement things based on pandas.

wholmgren commented 9 years ago

I'm all for being helpful with timezones. This was my motivation for adding a tz attribute to the Location class. The solarposition module (with some functions in tools) has an attempt at using tz knowledge from either the DatetimeIndex or the Location object. Do these fit into your scope or do you want to do something different? I did think that it would be nice if the Location constructor could use a web service to determine an IANA timezone from a lat/lon. (Or maybe even a lat/lon/tz from a city/state/country search, but that's another topic.)

Guaranteeing that time zones are handled correctly is hard unless you're very restrictive about it. My inclination is to leave the vast majority of the work to pandas. All localized times in pandas are stored as utc internally, so we can and should take advantage of that (as the description above does).

We could require that all times be localized, or we could throw warnings whenever we detect that people are using non-localized times.

Concerning solar time... I just rely on the zenith and azimuth calculations for a given lat/lon. Is there an application for solar time other than for the actual calculation of those quantities of interest? I don't know how pandas would reasonably incorporate something that requires a specific lat/lon. Maybe we're talking about two different quantities? I think of solar time as the local time shifted to make the minimum zenith occur at 12:00 noon.

bmu commented 9 years ago

I mean true or apparent solar time (http://en.wikipedia.org/wiki/Solar_time), so I think we are talking about the same quantities. It is often useful to plot some quantities against this time, e.g. to detect the azimuth angle of a system or to detect misalignments between irradiance sensor and modules, ... So I think it would be useful.

I would also like to rely on pandas for time conversion, as it is the best Python implementation I am aware of. However, it may be difficult to detect the exact format that a user is passing to a function. So perhaps we should extend the tz keyword (or whatever its name should be) to accept not only real time zone names, but also something like utc+8 for a time format without daylight saving time, tst for true solar time, mst for mean solar time ...

As a default I think utc would be better than a localized time.

jforbess commented 9 years ago

Agreed on using pandas and default UTC, and I think one critical component of having a successful time implementation for others will be a tutorial covering various conversions and manipulations.

I am still trying to wrap my head around how true solar time and mean solar time interact with data collected from a system, but I will get there.

wholmgren commented 9 years ago

That sounds like a good use case. solarposition.ephemeris does return solar time, although I did not test this output in my recent PR. Pyephem has a sidereal_time method, so we can add solar time to solarposition.pyephem also. Hopefully we can extract this from the other SPA functions as well.

However it may be difficult to detect the exact format that a user is assigning to a function.

I'm not sure what you mean by this. My thought is that you can always do one of these:

  1. Use the timezone specified on a DatetimeIndex or a Location. (Possibly raise an error if they do not agree.)
  2. Assume UTC. (Possibly raise a warning that we're assuming UTC.)
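The two options above could be sketched roughly as follows. This is only an illustration of the policy being discussed, not pvlib's actual code: the helper name `resolve_to_utc` and its `location_tz` argument are hypothetical, and comparing time zones by name is a simplifying assumption.

```python
import warnings

import pandas as pd


def resolve_to_utc(times, location_tz=None):
    """Hypothetical helper (not a pvlib function): return `times` in UTC."""
    if times.tz is not None:
        # Option 1: the index carries a time zone. Optionally check that it
        # agrees with the zone attached to the location (compared by name).
        if location_tz is not None and str(times.tz) != str(location_tz):
            raise ValueError(
                f"index tz {times.tz} does not match location tz {location_tz}"
            )
        return times.tz_convert("UTC")
    if location_tz is not None:
        # Option 1, naive index: interpret the times in the location's zone.
        return times.tz_localize(location_tz).tz_convert("UTC")
    # Option 2: nothing is known, so assume UTC and say so.
    warnings.warn("tz-naive times and no location tz given; assuming UTC")
    return times.tz_localize("UTC")


naive = pd.DatetimeIndex(["2015-04-01 01:00"])
print(resolve_to_utc(naive, "US/Mountain"))
```

Whether a mismatch should be an error or only a warning is exactly the open question in this thread; the sketch picks the stricter choice for option 1.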

I don't think there can be any ambiguity because the DatetimeIndex and Location constructors raise errors if the user inputs a timezone that pytz doesn't know about. pytz.FixedOffset timezones can do something like your utc+8 suggestion:

pd.DatetimeIndex(['2015-1-1T00']).tz_localize(pytz.FixedOffset(120))

<class 'pandas.tseries.index.DatetimeIndex'>
[2015-01-01 00:00:00+02:00]
Length: 1, Freq: None, Timezone: pytz.FixedOffset(120)

We could consider making a rule that all timezones must always be specified (even if it's UTC).

tst for true solar time, mst for mean solar time

Interesting idea; a correct and consistent implementation sounds hard. (mst is Mountain Standard Time in the US.)

In any case, I will improve the Location docstring and make a tutorial. We could also make a page on the rtd docs just for timezones. I will also change the Location default tz to be UTC.

bmu commented 9 years ago

I think we have misunderstood each other a little so far; my English may be the reason ;-)

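My use cases so far include:

  • irradiance data provided by a national weather service using true solar time as timestamp
  • data from our own monitoring system using local time (without daylight saving)
  • data from utilities most often localized (including daylight saving)
  • data from meteonorm, local time (I think)
  • data from GeoModel, UTC
  • "SatelLight" data, local time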

So I need

  1. to convert any of these to a defined standard, using tools provided by pvlib, or
  2. a keyword that tells our functions what kind of time stamp is used, so that the conversion to this standard is performed within the function (the standard should be UTC in any case, I think)

Hope this is understandable ;-)

@jforbess Data collected from a system is usually stored with a local time stamp, in my experience. However, sometimes I do not know the azimuth of the system, or I suspect that something is wrong with the alignment of the irradiance sensor. If you plot your measurements, let's say power and irradiance, against local time, you won't see a clear peak at noon (if the system is oriented towards the equator), but a scatter due to the equation of time (about ±15 minutes). If you plot them against true solar time, you get a clear peak for both quantities at noon. If not, either something is wrong with the alignment of the irradiance sensor or the system is not oriented towards the equator.
This is only one use case; there are other examples where true solar time is useful.

jforbess commented 9 years ago

Yes, those cases are generally what I would assume, though in my experience irradiance data from a national weather service is also in local time, not true solar time. I am not sure I ever see anything in true solar time except outputs from very specialized models, like the example you cited, and in that example, I might compare daily peaks to daily model peaks in local time, which would look for alignment at 12:35 or whenever the peak is expected in local time.

From my past experience, using UTC internally usually ends up resulting in the fewest conversions, since there will always be some. :)

On Mon, Apr 6, 2015 at 10:45 AM, bmu notifications@github.com wrote:

My use cases include up to now:

  • irradiance data provided by a national weather service using true solar time as timestamp
  • data from our own monitoring system using local time (without daylight saving)
  • data from utilities most often localized (including daylight saving)
  • data from meteonorm, local time (I think)
  • data from GeoModel, UTC
  • "SatelLight" data, local time

wholmgren commented 9 years ago

These use cases are helpful and cover a pretty broad range. I think that properly used IANA timezones (the pytz/pandas convention) can cover all except for solar time. For example, here in the US we have US/Mountain (DST aware) and MST (no-DST):

pd.DatetimeIndex(['2015-4-1T00']).tz_localize('US/Mountain')

<class 'pandas.tseries.index.DatetimeIndex'>
[2015-04-01 00:00:00-06:00]
Length: 1, Freq: None, Timezone: US/Mountain

pd.DatetimeIndex(['2015-4-1T00']).tz_localize('MST')

<class 'pandas.tseries.index.DatetimeIndex'>
[2015-04-01 00:00:00-07:00]
Length: 1, Freq: None, Timezone: MST

Note the difference in the UTC offset between the two.

So long as I specify times consistent with the timezone, pandas will use the same integer to represent them:

# US/Mountain
pd.DatetimeIndex(['2015-4-1T01']).tz_localize('US/Mountain').astype(int)

array([1427871600000000000])

# MST
pd.DatetimeIndex(['2015-4-1T00']).tz_localize('MST').astype(int)

array([1427871600000000000])

# UTC
pd.DatetimeIndex(['2015-4-1T07']).tz_localize('UTC').astype(int)

array([1427871600000000000])

# No specification
pd.DatetimeIndex(['2015-4-1T07']).astype(int)

array([1427871600000000000])

So I think we already have all of the tools we need, and the Location objects and the solarposition module already make use of these tools.
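For readers on a recent pandas, the same comparison can be reproduced with a sketch like the one below. It assumes a pandas version where `astype("int64")` on a `DatetimeIndex` returns the underlying epoch integers (the `.astype(int)` call above is the older spelling), and it compares the integers with each other rather than against a fixed value, since the stored resolution can differ between pandas versions.

```python
import pandas as pd

# The same instant written in four conventions, as in the examples above.
stamps = {
    "US/Mountain": pd.DatetimeIndex(["2015-04-01 01:00"]).tz_localize("US/Mountain"),
    "MST": pd.DatetimeIndex(["2015-04-01 00:00"]).tz_localize("MST"),
    "UTC": pd.DatetimeIndex(["2015-04-01 07:00"]).tz_localize("UTC"),
    "naive": pd.DatetimeIndex(["2015-04-01 07:00"]),
}

# Pull out the integer that pandas stores for each index.
epochs = {name: int(idx.astype("int64")[0]) for name, idx in stamps.items()}

for name, value in epochs.items():
    print(f"{name:12s} {value}")

# All four conventions map to one and the same stored integer.
print("identical:", len(set(epochs.values())) == 1)
```

The tz-naive entry lands on the same integer as the UTC one, which is why treating naive times as UTC is the path of least resistance.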

On solar time... maybe I see part of the remaining problem now. We've been talking about a way to go from a UTC-based timestamp to solar time (e.g. solarposition.ephemeris), but you want to be able to do the reverse. That seems harder. Could be doable with interpolation. Maybe @alorenzo175 knows how to do it exactly.

bmu commented 9 years ago

OK, I was aware of the pandas time conversion / localization, but not of the non-DST zones. I agree that in this case most of the tools are available. For the conversion from true solar time to UTC one needs the time difference between mean solar time and UTC (which depends on longitude) and the equation of time, which should also be included in the solar position algorithms.
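As a rough illustration of that relation: true solar time ≈ UTC + 4 min × longitude (degrees east) + equation of time. The sketch below uses a common three-term approximation of the equation of time, not the SPA or ephemeris values that pvlib's solar position code would provide, and the function names are hypothetical. It is only accurate to within a minute or so.

```python
import math
from datetime import datetime, timedelta, timezone


def equation_of_time_minutes(day_of_year):
    """Approximate equation of time in minutes (simple three-term fit)."""
    b = math.radians(360.0 / 365.0 * (day_of_year - 81))
    return 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)


def utc_to_true_solar_time(utc_time, longitude_deg_east):
    """Hypothetical helper: tz-aware time -> naive true solar time."""
    if utc_time.tzinfo is None:
        raise ValueError("utc_time must be tz-aware")
    utc_naive = utc_time.astimezone(timezone.utc).replace(tzinfo=None)
    doy = utc_naive.timetuple().tm_yday
    # 4 minutes per degree of longitude, plus the equation of time.
    offset = 4.0 * longitude_deg_east + equation_of_time_minutes(doy)
    return utc_naive + timedelta(minutes=offset)


def true_solar_time_to_utc(solar_time, longitude_deg_east):
    """Reverse direction; uses the solar-time date for the day of year,
    which is a small extra approximation near midnight."""
    doy = solar_time.timetuple().tm_yday
    offset = 4.0 * longitude_deg_east + equation_of_time_minutes(doy)
    return (solar_time - timedelta(minutes=offset)).replace(tzinfo=timezone.utc)


utc = datetime(2015, 4, 6, 12, 0, tzinfo=timezone.utc)
solar = utc_to_true_solar_time(utc, 10.0)  # a site at 10 degrees east
print(solar, true_solar_time_to_utc(solar, 10.0))
```

The true solar time is returned tz-naive on purpose, because it is not a time zone in the pytz/IANA sense: the offset depends on both longitude and date.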

What about the handling of tz-naive timestamps? Do I currently have to localize them beforehand?

bmu commented 9 years ago

@jforbess: just ask the German weather service ;-)

On the other comment: you won't have a peak at 12:35 if you plot one year of data against local time. The peak shifts from 12:20 to 12:50 due to the equation of time, and if you want to detect small sensor misalignments, this is a problem.

bmu commented 9 years ago

There are other use cases (see e.g. this paper, section 5.1.).

However, I'm not sure if it should be the first priority, as you seem to see no need for true solar time, and maybe most of the users also do not need it.

wholmgren commented 9 years ago

The pandas integer representation of a tz-naive timestamp is the same as that for a UTC timestamp (see above). Is that what you meant?

You've convinced me that including solar time in the solarposition outputs is high priority, but the converter from solar time to UTC would be lower on my priority list.

jforbess commented 9 years ago

Yes, I'm not saying that true solar time isn't a key concept, I am just saying that it's a lower priority to be able to convert between it and UTC.

Though I suspect that the large German operations performance community would disagree, given @bmu's insight regarding their dataset.

bmu commented 9 years ago

Ok, I'm fine with a lower priority for the reverse calculation (not sure about "the large German operations performance community", though)

bmu commented 9 years ago

@wholmgren: I had a look at the code (especially tools.localize_to_utc) and now it is clearer what happens. I should have done this before, sorry.

But I think something in the documentation on this topic would be useful, as not all users will be aware of the time zone handling.

jforbess commented 9 years ago

Just a reminder to myself and others as to the huge install base outside my experience, and you know, the largest installed base in the world.

bmu commented 9 years ago

After some sleep, I see my initial confusion: from my point of view, the time zone is related to the data, not to the location. It may be good to have information on the (legal) time zone of a location, but this is not necessarily an indicator of the time stamp convention used in, e.g., measurement files (and I don't know whether this is clear to users in general). That's why I was asking for a time zone keyword, e.g. for the solar position functions. Maybe this keyword could default to None or infer or ... (in which case the time index must be tz-aware). If the time index is tz-naive, the user can provide the time reference. Maybe we could also use the time zone of the Location class, but then it should be made clearer that this is the time zone used for the time stamps.

wholmgren commented 9 years ago

My thought is to go in the other direction. Make the solar position methods only accept tz localized times, raise an error with naive times, and ignore the location object's tz attribute. This makes it so that there is only one way to do it and the user must be very explicit about it. If you have data, then it was recorded with a specific time convention (UTC, Europe/Berlin, US/Arizona) and I think the very first thing you should do when you import that data is to use tz_localize. As soon as you tz_localize, pandas starts treating it as UTC under the hood, you can convert it to whatever viewing representation you want, and there can be no mistakes.
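A minimal sketch of that stricter workflow, assuming the data was recorded in Europe/Berlin local time. The function name `require_localized` is made up for illustration and is not part of pvlib.

```python
import pandas as pd


def require_localized(times):
    """Hypothetical check: reject tz-naive input, return the times in UTC."""
    if times.tz is None:
        raise ValueError("times must be tz-localized, e.g. via tz_localize()")
    return times.tz_convert("UTC")


# Step 1: localize right after import, using the convention of the data file.
raw = pd.DatetimeIndex(["2015-07-01 12:00", "2015-07-01 13:00"])
berlin = raw.tz_localize("Europe/Berlin")

# Step 2: hand the localized index to the calculation.
utc = require_localized(berlin)
print(utc)

# Step 3: convert to any other zone purely for viewing.
print(utc.tz_convert("US/Arizona"))
```

Passing `raw` directly to `require_localized` raises a `ValueError`, which is the "only one way to do it" behaviour described above.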

bmu commented 9 years ago

I think this could be right, but it is not how I used to do it. Time zone conversion was complicated before pandas, but it should be manageable for a user today.

jforbess commented 9 years ago

I just discovered that pandas doesn't seem to behave well in one situation: I have data from PVsyst (no DST) that I am trying to load into pvlib to compare to a pvlib model. I am trying to align the timestamps, but pandas doesn't seem capable of localizing timestamps without applying DST. (I'd like to just specify UTC-5, instead of 'US/Eastern'.) I asked a question about this on stackexchange because it seems to be more of a pandas issue than pvlib issue, but perhaps people here have solved this problem already? If not, it's something that we can't rely on pandas for, though perhaps support is coming in 0.17.

wholmgren commented 9 years ago

I think you want to localize with 'EST' or pytz.FixedOffset(-5*60).

I suppose I should finish writing the pvlib timezone tutorial.

jforbess commented 9 years ago

I checked 'EST', and it still applied the UTC-4 to the periods between March and November, but the pytz.FixedOffset worked great. I had something in mind like that, but couldn't quite figure out how pytz and tz_localize interacted. Thanks!
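For reference, the fixed-offset approach can be sketched like this. The example uses the standard library's `datetime.timezone` instead of `pytz.FixedOffset(-5*60)` to avoid the extra import; as far as I know pandas accepts either, but treat that substitution as an assumption. The sample timestamps are made up.

```python
from datetime import timedelta, timezone

import pandas as pd

# Two naive timestamps, one in winter and one in summer (e.g. PVsyst output).
naive = pd.DatetimeIndex(["2015-01-15 12:00", "2015-07-15 12:00"])

# DST-aware zone: the summer timestamp is interpreted as UTC-4.
with_dst = naive.tz_localize("US/Eastern")

# Fixed offset: both timestamps are interpreted as UTC-5, with no DST shift.
fixed = naive.tz_localize(timezone(timedelta(hours=-5)))

print(with_dst.tz_convert("UTC"))
print(fixed.tz_convert("UTC"))
```

The winter rows agree, while the summer rows differ by one hour, which is the misalignment to look for when comparing no-DST model output against DST-aware measurements.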

wholmgren commented 9 years ago

@bmu @jforbess I reread this thread and I'm not sure what, if anything, is needed to close this issue. Can we resolve it with better documentation or do we need to change the code?

jforbess commented 9 years ago

I think there were two potential actions:

1) Require timestamped data to be localized. I don't know that we agreed whether this was a good requirement or not. I view it as moderately positive, because it will force all handling of timezones to be clearer. But a potential drawback is when I somehow find myself using a Central timezone for data in an Eastern location because of how the data was captured originally. (Probably because it wasn't clear whether the data was captured as interval beginning or interval ending, to use PVsyst's nomenclature.)

2) Provide a mapping to true solar time. I believe this is a desirable feature, but not likely to be part of the 0.2 release.

Finally, I do think that better documentation can cover item 1 if others don't prefer requiring localization.

bmu commented 9 years ago

I agree with @jforbess on both points. But I think we can shift both of them to a later release.

wholmgren commented 9 years ago

Ok, thanks for the clarification. I am going to mark this issue as 0.3 and think about it again after this release is out.

wholmgren commented 8 years ago

6 months later... the solarposition.py module in PR #93 now requires that timestamped data be localized or it will be assumed to be UTC time. See the diff.

wholmgren commented 8 years ago

I think that the new code and the new documentation can close this issue. Here's the new proposed documentation:

http://wholmgren-pvlib-python-new.readthedocs.org/en/contributing/timetimezones.html

Please provide comments in #135, if you have them.

Assuming no objections, I will close this when #135 is merged. A new issue could be created for adding true solar time to more functions, if desired.

bmu commented 8 years ago

Just an idea: We could open a milestone for the Santa Clara pvlib sprint in May and move such issues to this milestone. I will be there, too.

dacoex commented 8 years ago

We could open a milestone for the Santa Clara pvlib sprint in May and move such issues to this milestone. I will be there, too.

Regarding the Santa Clara pvlib sprint: I received an invitation. Thanks for that! But I will not be able to attend. If there is a possibility to participate in discussions and sprints remotely, I would be happy to do so.

wholmgren commented 8 years ago

@bmu a Santa Clara milestone is a good idea and it's great to hear that you'll be able to attend. Did you RSVP to Josh Stein? I did not see your name on a recent list of attendees.

@dacoex and others: We don't yet know exactly what the agenda will be, but I expect that there will be a way to contribute remotely. We'll still be using GitHub, of course.

I'll make a milestone and a new issue for further discussion.