Error: Can only use .dt accessor with datetimelike values #2

Open shane-kercheval opened 1 year ago

shane-kercheval commented 1 year ago

Python 3.11; pandas 2.0.3; numpy 1.25.1

When running the first cell in Chapter 8 in Simulated-Data.ipynb, I got the following error in a few places: Can only use .dt accessor with datetimelike values

I was able to get it to work by adding pd.to_datetime in various places. I originally added pd.to_datetime to the date column, but that caused other issues. Not sure if there is a better way.
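If I understand the cause correctly, the error can be reproduced in isolation. Calling `.date` on a `DatetimeIndex` (as the `cohorts` line does) yields plain `datetime.date` objects. pandas stores those as `object` dtype, and the `.dt` accessor rejects object columns. `pd.to_datetime` turns them back into a datetime64 column, which would explain why wrapping things in it helps. The variable names here are my own, just for illustration.

```python
import pandas as pd

# .date turns the DatetimeIndex into a NumPy array of datetime.date objects
cohort_dates = pd.to_datetime(["2021-05-15", "2021-06-04", "2021-06-20"]).date
s = pd.Series(cohort_dates)
print(s.dtype)  # object

# .dt only accepts datetime-like dtypes, so this raises the same error as the notebook
try:
    s.dt.day
except AttributeError as err:
    print(err)  # Can only use .dt accessor with datetimelike values

# pd.to_datetime converts the column back to a datetime64 dtype
fixed = pd.to_datetime(s)
print(fixed.dt.day.tolist())  # [15, 4, 20]
```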

import numpy as np
import pandas as pd

date = pd.date_range("2021-05-01", "2021-07-31", freq="D")
cohorts = pd.to_datetime(["2021-05-15", "2021-06-04", "2021-06-20"]).date
poss_regions = ["S", "N", "W", "E"]

reg_ps = dict(zip(poss_regions,    [.3, .6, .7, .8]))
reg_fe = dict(zip(poss_regions,    [20,  16,  8,  2]))
reg_trend = dict(zip(poss_regions, [0,  0.2,  .4,  .6]))

units = np.array(range(1, 200+1))


unit_reg = np.random.choice(poss_regions, len(units))
exp_trend = np.random.exponential(0.01, len(units))
treated_unit = np.random.binomial(1, np.vectorize(reg_ps.__getitem__)(unit_reg))

# staggered adoption dgp
df = pd.DataFrame(dict(
    date = np.tile(date, len(units)),
    city = np.repeat(units, len(date)),
    region = np.repeat(unit_reg, len(date)),
    treated_unit = np.repeat(treated_unit, len(date)),
    cohort = np.repeat(np.random.choice(cohorts, len(units)), len(date)),
    eff_heter = np.repeat(np.random.exponential(1, size=len(units)), len(date)),
)).assign(
    unit_fe = np.repeat(np.random.normal(0, 2, size=len(units)), len(date)),
    time_fe = np.tile(np.random.normal(size=len(date)), len(units)),
    week_day = np.tile(date.weekday, len(units)),
    w_seas = np.tile(abs(5-date.weekday) % 7, len(units)),
    reg_fe = lambda d: d["region"].map(reg_fe), 
    reg_trend = lambda d: d["region"].map(reg_trend), 
    reg_ps = lambda d: d["region"].map(reg_ps), 
    trend = lambda d: (pd.to_datetime(d["date"]) - pd.to_datetime(d["date"]).min()).dt.days,
    day = lambda d: (pd.to_datetime(d["date"]) - pd.to_datetime(d["date"]).min()).dt.days,
    cohort = lambda d: np.where(d["treated_unit"] == 1, d["cohort"], pd.to_datetime("2100-01-01")),
    treated = lambda d: ((pd.to_datetime(d["date"]) >= d["cohort"]) & (d["treated_unit"] == 1)).astype(int),
    y0 = lambda d: np.round(10 
                            + d["treated_unit"]
                            + d["reg_trend"]*d["trend"]/2
                            + d["unit_fe"] 
                            + 0.4*d["time_fe"] 
                            + 2*d["reg_fe"]
                            + d["w_seas"]/5, 0),
#     y0 = lambda d: np.round(d["y0"] + d.groupby("city")["y0"].shift(1).fillna(0)*0.2, 0)
    y1 = lambda d: d["y0"] + np.minimum(0.2*(np.maximum(0, (pd.to_datetime(d["date"]) - pd.to_datetime(d["cohort"])).dt.days)), 1)*d["eff_heter"]*2,
    tau = lambda d: d["y1"] - d["y0"],
    downloads = lambda d: np.where(d["treated"] == 1, d["y1"], d["y0"]) + np.random.normal(0,.7,len(d)),
#     date = lambda d: pd.to_datetime(d["date"]),
).round({"downloads": 0})

# # # df.head()
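Looking at the dtypes, I suspect only the `cohort` column actually needs converting. `np.tile(date, ...)` already produces a datetime64 column, so the `pd.to_datetime` wrappers in `trend` and `day` are probably redundant. A quick check on a two-city frame (the `check` name is just for illustration):

```python
import numpy as np
import pandas as pd

# Same inputs as the notebook, on a small two-city frame to inspect dtypes
date = pd.date_range("2021-05-01", "2021-07-31", freq="D")
cohorts = pd.to_datetime(["2021-05-15", "2021-06-04", "2021-06-20"]).date

check = pd.DataFrame(dict(
    date=np.tile(date, 2),
    cohort=np.repeat(cohorts[:2], len(date)),
))
print(check.dtypes)  # date is datetime64, cohort is object

# The date column is already datetime64, so .dt works without pd.to_datetime
trend = (check["date"] - check["date"].min()).dt.days
print(trend.max())  # 91
```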

And then in the second cell I had to change

.assign(post=lambda d: (d["date"] >= d["cohort"]).astype(int))

to

.assign(post=lambda d: (pd.to_datetime(d["date"]) >= d["cohort"]).astype(int))
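One alternative that might avoid sprinkling `pd.to_datetime` everywhere is sketched below, on a toy three-city panel and not checked against the rest of the notebook. It drops the `.date` call so `cohorts` stays a `DatetimeIndex`. It also replaces the `np.where(..., pd.to_datetime("2100-01-01"))` step with `Series.where` so the `cohort` column stays datetime64. With both columns datetime-typed, the plain comparisons and `.dt.days` should work as written, including the second cell's `post` line. The fixed one-cohort-per-city assignment and the `days_since` column are my own simplifications, not the notebook's.

```python
import numpy as np
import pandas as pd

# Keep the cohort start dates as a DatetimeIndex (no .date call)
date = pd.date_range("2021-05-01", "2021-07-31", freq="D")
cohorts = pd.to_datetime(["2021-05-15", "2021-06-04", "2021-06-20"])

# Toy panel: one city per cohort, with city 3 standing in for a never-treated unit
# (a simplification; the notebook draws each city's cohort at random)
n_units = len(cohorts)
df = pd.DataFrame(dict(
    date=np.tile(date, n_units),
    city=np.repeat(np.arange(1, n_units + 1), len(date)),
    treated_unit=np.repeat([1, 1, 0], len(date)),
    cohort=np.repeat(cohorts.to_numpy(), len(date)),
)).assign(
    # Series.where fills the far-future date while keeping the column datetime64
    cohort=lambda d: d["cohort"].where(d["treated_unit"] == 1, pd.Timestamp("2100-01-01")),
    treated=lambda d: ((d["date"] >= d["cohort"]) & (d["treated_unit"] == 1)).astype(int),
    # datetime64 minus datetime64 is timedelta64, so .dt.days works directly
    days_since=lambda d: np.maximum(0, (d["date"] - d["cohort"]).dt.days),
    # the second cell's comparison, without pd.to_datetime
    post=lambda d: (d["date"] >= d["cohort"]).astype(int),
)

print(df.dtypes)
```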
shane-kercheval commented 1 year ago

Just a friendly suggestion: it would be handy for readers if the Python version you used were specified in the README and a requirements.txt with pinned package versions were provided in the repo, to help ensure we can run the code without issues.

I'm guessing the code runs in your environment, so I assume the issue is caused by a difference in pandas versions.
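For what it's worth, something like the following prints the interpreter and package versions in a requirements.txt-friendly form, which could seed a pinned file. `environment_pins` is a hypothetical helper, not something in the repo.

```python
import sys

import numpy as np
import pandas as pd


def environment_pins() -> str:
    """Hypothetical helper: report the notebook's key versions in requirements.txt style."""
    return "\n".join([
        # pip requirements files don't pin the interpreter itself, so record it as a comment
        f"# python {sys.version_info.major}.{sys.version_info.minor}",
        f"pandas=={pd.__version__}",
        f"numpy=={np.__version__}",
    ])


print(environment_pins())
```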