openeemeter / eemeter

An open source python package for implementing and developing standard methods for calculating normalized metered energy consumption and avoided energy use.
http://eemeter.openee.io/
Apache License 2.0
211 stars 67 forks source link

Change datetime type for samples read_meter_data_from_csv #371

Closed limejoeg closed 4 years ago

limejoeg commented 4 years ago

When loading the default sample data, .tz_localize('UTC') requires a different datetime type than is being passed. The parser is unable to standardize dates and is unable to convert 'start' to a datetime index.

I fixed this bug by explicitly converting the datetime column.

SOLUTION: def meter_data_from_csv( filepath_or_buffer, tz=None, start_col="start", value_col="value", gzipped=False, freq=None, **kwargs ): """ Load meter data from a CSV file. Default format:: start,value 2017-01-01T00:00:00+00:00,0.31 2017-01-02T00:00:00+00:00,0.4 2017-01-03T00:00:00+00:00,0.58 Parameters

filepath_or_buffer : :any:`str` or file-handle
    File path or object.
tz : :any:`str`, optional
    E.g., ``'UTC'`` or ``'US/Pacific'``
start_col : :any:`str`, optional, default ``'start'``
    Date period start column.
value_col : :any:`str`, optional, default ``'value'``
    Value column, can be in any unit.
gzipped : :any:`bool`, optional
    Whether file is gzipped.
freq : :any:`str`, optional
    If given, apply frequency to data using :any:`pandas.DataFrame.resample`.
**kwargs
    Extra keyword arguments to pass to :any:`pandas.read_csv`, such as
    ``sep='|'``.
"""

read_csv_kwargs = {
    "usecols": [start_col, value_col],
    "dtype": {value_col: np.float64},
    "parse_dates": [start_col],
    "index_col": start_col,
}

if gzipped:
    read_csv_kwargs.update({"compression": "gzip"})

# allow passing extra kwargs
read_csv_kwargs.update(kwargs)

df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs)
**df.index = pd.to_datetime(df.index, utc=True)**
if tz is not None:
    df = df.tz_convert(tz)

if freq == "hourly":
    df = df.resample("H").sum()
elif freq == "daily":
    df = df.resample("D").sum()

return df

ERROR:

TypeError Traceback (most recent call last)

in 1 #Daily Billing for Caltrack 2 meter_data, temperature_data, sample_metadata = ( ----> 3 eemeter.load_sample("il-electricity-cdd-hdd-daily") 4 ) 5 ~/anaconda3/envs/eenv/lib/python3.7/site-packages/eemeter/samples/load.py in load_sample(sample) 80 meter_data_filename = metadata["meter_data_filename"] 81 with resource_stream("eemeter.samples", meter_data_filename) as f: ---> 82 meter_data = meter_data_from_csv(f, gzipped=True, freq=freq) 83 84 temperature_filename = metadata["temperature_filename"] ~/anaconda3/envs/eenv/lib/python3.7/site-packages/eemeter/io.py in meter_data_from_csv(filepath_or_buffer, tz, start_col, value_col, gzipped, freq, **kwargs) 81 read_csv_kwargs.update(kwargs) 82 ---> 83 df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC") 84 if tz is not None: 85 df = df.tz_convert(tz) ~/anaconda3/envs/eenv/lib/python3.7/site-packages/pandas/core/generic.py in tz_localize(self, tz, axis, level, copy, ambiguous, nonexistent) 9865 if level not in (None, 0, ax.name): 9866 raise ValueError("The level {0} is not valid".format(level)) -> 9867 ax = _tz_localize(ax, tz, ambiguous, nonexistent) 9868 9869 result = self._constructor(self._data, copy=copy) ~/anaconda3/envs/eenv/lib/python3.7/site-packages/pandas/core/generic.py in _tz_localize(ax, tz, ambiguous, nonexistent) 9848 ax_name = self._get_axis_name(axis) 9849 raise TypeError( -> 9850 "%s is not a valid DatetimeIndex or " "PeriodIndex" % ax_name 9851 ) 9852 else: TypeError: index is not a valid DatetimeIndex or PeriodIndex
philngo commented 4 years ago

@limejoeg Thank you for your bug fix. A couple of things that would help me include this in the next version:

  1. Would you consider making a fork, putting your changes into a branch, and creating a pull request (how to)? This will help us run automated tests on your branch so that we can see if your change is backwards compatible.

  2. Would you share a sample of data (or at least the datetime format) you are reading in so that case can be added to the unit tests?

limejoeg commented 4 years ago

Hey Phil,

If I have time I will try to do that. It’s really just a one line fix that is happening bc of Pandas updates (I think). I’m getting the error when I go through the tutorial. The error happens for any of the sample files. I think the index has to be explicitly converted to datetime before tz_convert can be used. Calling:

meter_data, temperature_data, sample_metadata = ( eemeter.load_sample("il-electricity-cdd-hdd-daily") )

was what threw the error.

And here is the fix if you’d like to quickly put it in.

Line 83 of io.py

df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC")

add in

df.index = pd.to_datetime(df.index, utc=True)

Hope this helps. At some point I anticipate contributing more, but I will not have time to do anything in the near future. Please let me know if you have any questions, or think that my assumptions are off.

Joe Glass Data Science Lime Energy 510.919.8200

From: Phil Ngo notifications@github.com Reply-To: openeemeter/eemeter reply@reply.github.com Date: Tuesday, October 1, 2019 at 8:47 AM To: openeemeter/eemeter eemeter@noreply.github.com Cc: Joseph Glass-Katz Joe.Glass@Lime-Energy.com, Mention mention@noreply.github.com Subject: Re: [openeemeter/eemeter] Change datetime type for samples read_meter_data_from_csv (#371)

@limejoeghttps://github.com/limejoeg Thank you for your bug fix. A couple of things that would help me include this in the next version:

  1. Would you consider making a fork, putting your changes into a branch, and creating a pull requesthttps://github.com/openeemeter/eemeter/compare (how tohttps://help.github.com/en/articles/creating-a-pull-request)? This will help us run automated tests on your branch so that we can see if your change is backwards compatible.
  2. Would you share a sample of data (or at least the datetime format) you are reading in so that case can be added to the unit testshttps://github.com/openeemeter/eemeter/blob/cac2d50566ab32cbbbe697bbe905d0940ec7baa9/tests/test_io.py#L91?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/openeemeter/eemeter/issues/371?email_source=notifications&email_token=ANDTICRGVR3DEYSP3KGEJN3QMNWFXA5CNFSM4I4C3R52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEABXZ5Y#issuecomment-537099511, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ANDTICT3ZJ6K2F7IG262DH3QMNWFXANCNFSM4I4C3R5Q.

limejoeg commented 4 years ago

This bug is for the temperature_data_from_csv as well.

From: Joseph Glass-Katz Joe.Glass@Lime-Energy.com Date: Tuesday, October 1, 2019 at 11:26 AM To: openeemeter/eemeter reply@reply.github.com Subject: Re: [openeemeter/eemeter] Change datetime type for samples read_meter_data_from_csv (#371)

Hey Phil,

If I have time I will try to do that. It’s really just a one line fix that is happening bc of Pandas updates (I think). I’m getting the error when I go through the tutorial. The error happens for any of the sample files. I think the index has to be explicitly converted to datetime before tz_convert can be used. Calling:

meter_data, temperature_data, sample_metadata = ( eemeter.load_sample("il-electricity-cdd-hdd-daily") )

was what threw the error.

And here is the fix if you’d like to quickly put it in.

Line 83 of io.py

df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC")

add in

df.index = pd.to_datetime(df.index, utc=True)

Hope this helps. At some point I anticipate contributing more, but I will not have time to do anything in the near future. Please let me know if you have any questions, or think that my assumptions are off.

Joe Glass Data Science Lime Energy 510.919.8200

From: Phil Ngo notifications@github.com Reply-To: openeemeter/eemeter reply@reply.github.com Date: Tuesday, October 1, 2019 at 8:47 AM To: openeemeter/eemeter eemeter@noreply.github.com Cc: Joseph Glass-Katz Joe.Glass@Lime-Energy.com, Mention mention@noreply.github.com Subject: Re: [openeemeter/eemeter] Change datetime type for samples read_meter_data_from_csv (#371)

@limejoeghttps://github.com/limejoeg Thank you for your bug fix. A couple of things that would help me include this in the next version:

  1. Would you consider making a fork, putting your changes into a branch, and creating a pull requesthttps://github.com/openeemeter/eemeter/compare (how tohttps://help.github.com/en/articles/creating-a-pull-request)? This will help us run automated tests on your branch so that we can see if your change is backwards compatible.
  2. Would you share a sample of data (or at least the datetime format) you are reading in so that case can be added to the unit testshttps://github.com/openeemeter/eemeter/blob/cac2d50566ab32cbbbe697bbe905d0940ec7baa9/tests/test_io.py#L91?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/openeemeter/eemeter/issues/371?email_source=notifications&email_token=ANDTICRGVR3DEYSP3KGEJN3QMNWFXA5CNFSM4I4C3R52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEABXZ5Y#issuecomment-537099511, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ANDTICT3ZJ6K2F7IG262DH3QMNWFXANCNFSM4I4C3R5Q.

philngo commented 4 years ago

@limejoeg Ah, I see. Yeah, this will help us support the newest pandas versions (0.24.x), we've been stuck on 0.23.x for a while. If I get to it first I'll make a PR and tag you for review.

sichen1234 commented 4 years ago

FYI we were just trying eemeter today and got a similar bug using 2.7.5 from pip in a python3 venv with pandas 0.25.1 or 0.24.2:

Python 3.7.2 (default, Feb 20 2019, 14:37:21)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
>>> import eemeter
>>> eemeter.samples()
['il-electricity-cdd-hdd-billing_bimonthly', 'il-electricity-cdd-hdd-billing_monthly', 'il-electricity-cdd-hdd-daily', 'il-electricity-cdd-hdd-hourly', 'il-electricity-cdd-only-billing_bimonthly', 'il-electricity-cdd-only-billing_monthly', 'il-electricity-cdd-only-daily', 'il-electricity-cdd-only-hourly', 'il-gas-hdd-only-billing_bimonthly', 'il-gas-hdd-only-billing_monthly', 'il-gas-hdd-only-daily', 'il-gas-hdd-only-hourly', 'il-gas-intercept-only-billing_bimonthly', 'il-gas-intercept-only-billing_monthly', 'il-gas-intercept-only-daily', 'il-gas-intercept-only-hourly']
>>> meter_data, temperature_data, metadata = eemeter.load_sample('il-electricity-cdd-hdd-daily')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "venv/lib/python3.7/site-packages/eemeter/samples/load.py", line 82, in load_sample
    meter_data = meter_data_from_csv(f, gzipped=True, freq=freq)
  File "venv/lib/python3.7/site-packages/eemeter/io.py", line 83, in meter_data_from_csv
    df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC")
  File "venv/lib/python3.7/site-packages/pandas/core/generic.py", line 9867, in tz_localize
    ax = _tz_localize(ax, tz, ambiguous, nonexistent)
  File "venv/lib/python3.7/site-packages/pandas/core/generic.py", line 9850, in _tz_localize
    "%s is not a valid DatetimeIndex or " "PeriodIndex" % ax_name
TypeError: index is not a valid DatetimeIndex or PeriodIndex

It only works when downgrading to pandas 0.23.4.

Replacing in io.py line 83 and line 149, the two instances of

    df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize(tz)

with

    df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs)
    df.index = pd.to_datetime(df.index, utc=True)

Did fix the exception.

philngo commented 4 years ago

@limejoeg @opentaps I went ahead and implemented the fix in #372. I needed to make a few modifications to retain compatibility with pandas<0.24.0, but it looks like we should be good now with both the recent and older pandas versions. I've released the fix in eemeter=2.7.6.

I hope that both of you will consider helping our developer community by filling out our first-time issue/PR contributor survey.

limejoeg commented 4 years ago

Thank you, and thanks for maintaining this project!

From: Phil Ngo notifications@github.com Reply-To: openeemeter/eemeter reply@reply.github.com Date: Wednesday, October 2, 2019 at 1:00 PM To: openeemeter/eemeter eemeter@noreply.github.com Cc: Joseph Glass-Katz Joe.Glass@Lime-Energy.com, Mention mention@noreply.github.com Subject: Re: [openeemeter/eemeter] Change datetime type for samples read_meter_data_from_csv (#371)

@limejoeghttps://github.com/limejoeg @opentapshttps://github.com/opentaps I went ahead and implemented the fix in #372https://github.com/openeemeter/eemeter/pull/372. I needed to make a few modifications to retain compatibility with pandas<0.24.0, but it looks like we should be good now with both the recent and older pandas versions. I've released the fix in eemeter=2.7.6.

I hope that both of you will consider helping our developer community by filling out our first-time issue/PR contributor surveyhttps://forms.gle/egrBUVm81g7GVTk28.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/openeemeter/eemeter/issues/371?email_source=notifications&email_token=ANDTICQ3L5MFOCEESPEBBM3QMT4TVA5CNFSM4I4C3R52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAF7QEY#issuecomment-537655315, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ANDTICW3V7X3DBBJTLVOZF3QMT4TVANCNFSM4I4C3R5Q.

sichen1234 commented 4 years ago

Yes, thanks for fixing it!

Si Chen Open Source Strategies, Inc.

opentaps and open source business models at the VOLTTRON conference: https://youtu.be/2jnyIOBHrkU http://bit.ly/2lUGmKm

https://youtu.be/3bBNcC3jnlc

On Wed, Oct 2, 2019 at 12:59 PM Phil Ngo notifications@github.com wrote:

@limejoeg https://github.com/limejoeg @opentaps https://github.com/opentaps I went ahead and implemented the fix in #372 https://github.com/openeemeter/eemeter/pull/372. I needed to make a few modifications to retain compatibility with pandas<0.24.0, but it looks like we should be good now with both the recent and older pandas versions. I've released the fix in eemeter=2.7.6.

I hope that both of you will consider helping our developer community by filling out our first-time issue/PR contributor survey https://forms.gle/egrBUVm81g7GVTk28.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openeemeter/eemeter/issues/371?email_source=notifications&email_token=AANAS4JAYQ6Z2MVAN4XMC2LQMT4TVA5CNFSM4I4C3R52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAF7QEY#issuecomment-537655315, or mute the thread https://github.com/notifications/unsubscribe-auth/AANAS4IZALMNXZSZSRH6NJLQMT4TVANCNFSM4I4C3R5Q .