pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.66k stars 17.91k forks

BUG: `tz_localize` drops `freq` from `DatetimeIndex` #36575

Open giuliobeseghi opened 4 years ago

giuliobeseghi commented 4 years ago

Code Sample, a copy-pastable example

import pandas as pd

s = pd.Series(
    [1, 2, 3, 4, 5], index=pd.date_range("2020", periods=5, freq="D")
)
print(s.index.freq)  # <Day>
print(s.tz_localize("europe/london").index.freq)  # None

Problem description

The index should retain the freq attribute despite localization.

Output of pd.show_versions()

It raises:

ImportError: Can't determine version for hypothesis

giuliobeseghi commented 4 years ago

Originally posted in https://github.com/pandas-dev/pandas/issues/33677

giuliobeseghi commented 4 years ago

It actually works if tz is "utc"

print(s.tz_localize("europe/London").index.freq)  # None

  • [x] I have checked that this issue has not already been reported.
  • [ ] I have confirmed this bug exists on the latest version of pandas.
  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
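The UTC observation can be checked side by side with a non-UTC zone. This is a minimal sketch, assuming a reasonably recent pandas; `freq_after_localize` is a hypothetical helper name, and the zone is spelled `Europe/London` because lowercase names may not be accepted by every time zone backend:

```python
import pandas as pd


def freq_after_localize(index, zones):
    """Map each zone to the freq string left after localizing (hypothetical helper)."""
    # freqstr is None when the index carries no freq.
    report = {"naive": index.freqstr}
    for zone in zones:
        report[zone] = index.tz_localize(zone).freqstr
    return report


naive_index = pd.date_range("2020", periods=5, freq="D")
report = freq_after_localize(naive_index, ["UTC", "Europe/London"])
for label, freq in report.items():
    print(f"{label}: {freq}")
```

The result for the non-UTC zone depends on the pandas version, so it is printed rather than asserted here.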

attack68 commented 4 years ago

Extended the example to show the impact is solely to Index.

s = pd.Series(
    index=pd.date_range("2020", periods=5, freq="D"),
    data=pd.date_range("2020", periods=5, freq="D"),
    name="date"
)
print(s.index.freq)  
print(s.tz_localize("europe/london").index.freq)  
print(s.dt.tz_localize("europe/london").dt.freq)

On version 1.0.4:

<Day>
<Day>
D

On version 1.1.2:

<Day>
None
D

attack68 commented 4 years ago

Also link to here: https://github.com/pandas-dev/pandas/issues/33940#issuecomment-685774768

giuliobeseghi commented 4 years ago

Well, that's because the dt accessor always returns a frequency when it can be inferred:

import pandas as pd

date_range = pd.date_range("2020", periods=5, freq="D")
date_range.freq = None  # remove frequency from index

s = pd.Series(data=date_range, index=None)
print(s.dt.freq)  # 'D' - dt infers frequency
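The same inference is available outside the accessor through `pd.infer_freq`. This is a minimal sketch; the indexes are built from explicit strings (made-up values for illustration) so that no freq is attached at construction:

```python
import pandas as pd

# Built from explicit values, so no freq is stored on the index.
stamps = pd.DatetimeIndex(
    ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]
)
stored = stamps.freq
inferred = pd.infer_freq(stamps)

# Irregular spacing leaves nothing to infer.
irregular = pd.DatetimeIndex(
    ["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-09"]
)
inferred_irregular = pd.infer_freq(irregular)

print(stored, inferred, inferred_irregular)
```

The inferred value is a string alias, not an offset object, which is why the dt output above shows D instead of <Day>.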

veenstrajelmer commented 1 month ago

Freq is still being dropped by tz_localize. It is actually maintained if you use tz_localize via dt:

import pandas as pd
s = pd.Series(
    index=pd.date_range("2020", periods=5, freq="D"),
    data=pd.date_range("2020", periods=5, freq="D"),
    name="date"
)
print(s.index.freq)  
print(s.tz_localize("europe/london").index.freq)  
print(s.dt.tz_localize("europe/london").index.freq)

Prints:

<Day>
None
<Day>
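One thing worth checking in this comparison: `Series.dt.tz_localize` localizes the values of the Series, not its index, so the index stays naive and its freq is never touched. A small sketch to verify this, using the canonical spelling `Europe/London` (an assumption about the time zone database available):

```python
import pandas as pd

s = pd.Series(
    index=pd.date_range("2020", periods=5, freq="D"),
    data=pd.date_range("2020", periods=5, freq="D"),
    name="date",
)

via_dt = s.dt.tz_localize("Europe/London")

print(via_dt.dtype)     # dtype of the values after localization
print(via_dt.index.tz)  # time zone of the index
print(via_dt.index.freq == s.index.freq)
```

If the index time zone prints as None, the third line of the output above reflects an untouched index, not a freq that survived localization.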

However, if you work directly with the index, the dt accessor is not available, but you can overwrite freq with the value of the inferred_freq property:

import pandas as pd
s = pd.Series(
    index=pd.date_range("2020", periods=5, freq="D"),
    data=pd.date_range("2020", periods=5, freq="D"),
    name="date"
)
index = s.index
index_aware = index.tz_localize("europe/london")
print(index.freq)  
print(index_aware.freq)  
index_aware.freq = index_aware.inferred_freq
print(index_aware.freq)  

Prints:

<Day>
None
<Day>

No clue how robust this is, so it would be great if the freq could just be passed when applying tz_convert.
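
In the meantime, the workaround can be wrapped so that a non-conforming freq does not raise. This is a hedged sketch: `localize_keep_freq` is a hypothetical helper, not a pandas API, and it reuses the original freq instead of `inferred_freq`. How it behaves across DST transitions has not been verified here:

```python
import pandas as pd


def localize_keep_freq(index, tz):
    """Localize a naive DatetimeIndex and try to re-attach its freq (hypothetical helper)."""
    localized = index.tz_localize(tz)
    if index.freq is not None and localized.freq is None:
        try:
            # The freq setter checks that the values conform to the freq.
            localized.freq = index.freq
        except ValueError:
            # Values do not conform after localization; leave freq unset.
            pass
    return localized


naive = pd.date_range("2020", periods=5, freq="D")
aware = localize_keep_freq(naive, "Europe/London")
print(aware.freq)
```

The setter is applied to the new index returned by tz_localize, so the original naive index is left unchanged.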