rlaker opened this issue 7 months ago.
Thanks for this @rlaker.
I agree with the second code snippet: `date < treatment_time` should be classed as pre, and `date >= treatment_time` should be classed as post.
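A minimal pandas sketch of that convention, assuming the data has a `DatetimeIndex` and `treatment_time` is a `pd.Timestamp`. The helper name `split_pre_post` and the toy data are hypothetical and not part of CausalPy's API. With `<` / `>=`, a row dated exactly at `treatment_time` lands in the post set, and every row ends up in exactly one of the two sets.

```python
import pandas as pd


def split_pre_post(data: pd.DataFrame, treatment_time: pd.Timestamp):
    """Hypothetical helper: split rows at treatment_time.

    Rows strictly before treatment_time are pre; rows at or after it are post.
    """
    pre = data[data.index < treatment_time]
    post = data[data.index >= treatment_time]
    return pre, post


# Toy monthly data (not the real deaths dataset).
dates = pd.date_range("2019-11-01", periods=4, freq="MS")
df = pd.DataFrame({"deaths": [1, 2, 3, 4]}, index=dates)

pre, post = split_pre_post(df, pd.Timestamp("2020-01-01"))
print(pre.index.max())   # 2019-12-01 00:00:00
print(post.index.min())  # 2020-01-01 00:00:00
```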
It could even be worth adding some input validation and including that in the test coverage. Did you want to have a go at this and submit a PR?
Though it could be worth thinking about whether we always want treatment defined with `>=`, or whether there will be some occasions where a user might want `>` instead.
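One way the validation and the `>=` versus `>` choice could fit together, sketched under assumptions: the function `split_pre_post_validated` and its `post_inclusive` flag are hypothetical names, not an existing CausalPy option, and the checks shown (a `DatetimeIndex`, a `pd.Timestamp`, neither set empty) are only examples of what validation might cover.

```python
import pandas as pd


def split_pre_post_validated(data, treatment_time, post_inclusive=True):
    """Hypothetical helper: validate inputs, then split at treatment_time.

    post_inclusive=True:  pre is index <  treatment_time, post is index >= treatment_time
    post_inclusive=False: pre is index <= treatment_time, post is index >  treatment_time
    """
    if not isinstance(data.index, pd.DatetimeIndex):
        raise TypeError("data must have a DatetimeIndex")
    if not isinstance(treatment_time, pd.Timestamp):
        raise TypeError("treatment_time must be a pd.Timestamp")
    # Boolean mask marking the post-treatment rows under the chosen rule.
    if post_inclusive:
        is_post = data.index >= treatment_time
    else:
        is_post = data.index > treatment_time
    pre, post = data[~is_post], data[is_post]
    if pre.empty or post.empty:
        raise ValueError("treatment_time leaves the pre or post set empty")
    return pre, post


dates = pd.date_range("2019-11-01", periods=4, freq="MS")
df = pd.DataFrame({"deaths": [1, 2, 3, 4]}, index=dates)
t = pd.Timestamp("2020-01-01")

# The boundary month moves from post to pre when post_inclusive=False.
pre_ge, post_ge = split_pre_post_validated(df, t)
pre_gt, post_gt = split_pre_post_validated(df, t, post_inclusive=False)
```

Because the mask is computed once and its complement is used for the other set, the two sets can never overlap or drop a row, whichever rule is chosen.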
Happy to have a go!
Perhaps the confusion is that the example dataframe defined its own `pre` column, which the `PrePostFit` class then ignored, using `treatment_time` instead. Should the user create the `pre` column themselves, with the class splitting on it like this?

```python
# Split on the user-supplied `pre` column (assumed to be bool dtype)
self.datapre = data[data["pre"]]
self.datapost = data[~data["pre"]]
```
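If the split were driven by a user-supplied column, it might be worth validating that column first. A minimal sketch, assuming the column is called `pre` by default; `split_on_pre_column` is a hypothetical name, not part of `PrePostFit`.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def split_on_pre_column(data: pd.DataFrame, column: str = "pre"):
    """Hypothetical helper: split using a user-supplied boolean column."""
    if column not in data.columns:
        raise KeyError(f"data has no {column!r} column")
    # Require a real bool dtype so missing or non-boolean labels are rejected
    # instead of silently dropping out of both sets.
    if not is_bool_dtype(data[column]):
        raise TypeError(f"{column!r} must be boolean, got {data[column].dtype}")
    datapre = data[data[column]]
    datapost = data[~data[column]]
    return datapre, datapost


df = pd.DataFrame(
    {"deaths": [1, 2, 3], "pre": [True, True, False]},
    index=pd.date_range("2019-11-01", periods=3, freq="MS"),
)
datapre, datapost = split_on_pre_column(df)
```

The dtype check matters because `data["pre"] == True` and `data["pre"] == False` both evaluate to `False` for a missing value, so such rows would end up in neither set.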
In the COVID excess deaths example, the data is split into pre and post treatment with
However, inspecting `result.datapre` shows that 2020-01-01 is included in the pre set, even though the dataframe labels that date `pre=False`.
If 2020-01-01 should be in the post set, as the dataframe's `pre` column says, then the splitting should be changed to:
If it should be in the pre set, then `deaths_and_temps_england_wales.csv` needs to be updated so that `pre=True` for this date.
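Whichever way this is resolved, a small consistency check could catch the same kind of disagreement between a `pre` column and `treatment_time` in future. A minimal sketch, assuming the `date < treatment_time` rule for pre; `find_pre_label_mismatches` is a hypothetical name and the toy data below only imitates the shape of the real CSV.

```python
import pandas as pd


def find_pre_label_mismatches(data, treatment_time, column="pre"):
    """Hypothetical check: rows whose `pre` label disagrees with the date rule.

    Uses the convention that dates strictly before treatment_time are pre.
    """
    expected_pre = data.index < treatment_time
    return data[data[column].to_numpy() != expected_pre]


dates = pd.date_range("2019-11-01", periods=4, freq="MS")
t = pd.Timestamp("2020-01-01")

# Labels agree with the `<` rule: no mismatches.
df_ok = pd.DataFrame({"pre": [True, True, False, False]}, index=dates)
# 2020-01-01 deliberately mislabelled as pre: flagged as a mismatch.
df_bad = pd.DataFrame({"pre": [True, True, True, False]}, index=dates)

ok = find_pre_label_mismatches(df_ok, t)
bad = find_pre_label_mismatches(df_bad, t)
```

Such a check could run as a test against the bundled example datasets rather than inside the model itself.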