pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DataFrame groupby partially drops timezone info (to_csv, in notebook) #7622

Closed. Poquaruse closed this issue 10 years ago

Poquaruse commented 10 years ago

Hi all,

I've encountered a problem with DataFrames, groupby and timezones.

import pandas as pd
import numpy as np

dt_rng = pd.date_range(start='2014-01-01 00:00', periods=1000, freq='1s', tz='Europe/Berlin')
df = pd.DataFrame({'a': np.random.randn(1000), 'b': np.random.randn(1000)}, index=dt_rng)
df['b'] = df['b'].round()
df.to_csv()

--> Timezones are shown in the csv output, for example 2014-01-01 00:00:00+01:00

Now with resampling:

dt_rng = pd.date_range(start='2014-01-01 00:00', periods=1000, freq='1s', tz='Europe/Berlin')
df = pd.DataFrame({'a': np.random.randn(1000), 'b': np.random.randn(1000)}, index=dt_rng)
df['b'] = df['b'].round()
df.groupby(df['b']).resample('1min').to_csv()

--> The timestamps are written as, for example, 2013-12-31 23:01:00: no timezone info, not even a UTC offset.

However:

dt_rng = pd.date_range(start='2014-01-01 00:00', periods=1000, freq='1s', tz='Europe/Berlin')
df = pd.DataFrame({'a': np.random.randn(1000), 'b': np.random.randn(1000)}, index=dt_rng)
df['b'] = df['b'].round()
df.groupby(df['b']).resample('1min').index.levels[1]

shows: Timezone: Europe/Berlin

So the info seems to be there, but it is not exported, even though it was exported before, without resampling...

Any ideas?

Thanks and best regards
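A note on the 2013-12-31 23:01:00 value in the resampled CSV output: it is consistent with the Berlin timestamp being written as its UTC wall-clock time with the offset dropped. This is an inference from the numbers, not something confirmed in the thread. A minimal sketch using only the standard library, assuming a fixed +01:00 offset (CET, which Europe/Berlin observes in January); the variable names are illustrative:

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed +01:00 offset stands in for Europe/Berlin in January (CET).
cet = timezone(timedelta(hours=1), "CET")

# A timezone-aware timestamp one minute after local midnight.
local = datetime(2014, 1, 1, 0, 1, tzinfo=cet)
print(local.isoformat(sep=" "))      # 2014-01-01 00:01:00+01:00

# Convert to UTC, then strip the tzinfo: the offset disappears from the text.
naive_utc = local.astimezone(timezone.utc).replace(tzinfo=None)
print(naive_utc.isoformat(sep=" "))  # 2013-12-31 23:01:00
```

If that reading is right, the exported values were not wrong instants, only stripped of the information needed to interpret them.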

jreback commented 10 years ago

pls post pd.show_versions()

Poquaruse commented 10 years ago

Sorry, I forgot. Here it is:

INSTALLED VERSIONS

commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 8
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: None
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.1
pytz: 2014.3
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

jreback commented 10 years ago

this works in master, lots of bugs related to tz preservation are fixed for 0.14.1 (releasing soon)
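For anyone re-checking this on a current install, here is a rough sketch of the same test. It is not the code from the report: recent pandas requires an explicit aggregation after resample(), so mean() is used on the assumption that it mirrors the old default, column 'a' is selected explicitly, and the random data is replaced with deterministic values. The names tz_in_index and csv_has_offset are illustrative only.

```python
import numpy as np
import pandas as pd

# Same index as in the report; deterministic values instead of random ones.
dt_rng = pd.date_range(start="2014-01-01 00:00", periods=1000, freq="1s", tz="Europe/Berlin")
df = pd.DataFrame({"a": np.arange(1000.0), "b": np.arange(1000) % 3}, index=dt_rng)

# Assumption: mean() stands in for the implicit aggregation of old resample().
resampled = df.groupby("b")["a"].resample("1min").mean()

# The datetime level of the resulting MultiIndex should still carry the timezone.
tz_in_index = str(resampled.index.levels[1].tz)

# With no path given, to_csv() returns the CSV text, so the offset can be searched for.
csv_has_offset = "+01:00" in resampled.to_csv()

print(tz_in_index)
print(csv_has_offset)
```

If the timezone is preserved end to end, the first line printed names Europe/Berlin and the second is True.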

Poquaruse commented 10 years ago

Thanks for the heads up! :-)