In issue https://github.com/matheusfacure/python-causality-handbook/issues/270 by @david26694, the question comes up whether it's correct to fit the counterfactual on the whole data instead of using only the training period.
I see that this issue is closed, and the main text of the book now argues that "The idea here is that the model must be estimated with the entire data, under the postulated null hypothesis, to avoid huge post intervention residuals.", which was probably added in response to that issue.
Still, using the whole data strikes me as wrong, because the test statistic we use later only looks at the post-intervention period. If the model fits the post-intervention data badly, that produces large residuals in the post-intervention period, but it also inflates the residuals in the pre-intervention period, because the model is trying to accommodate the post-intervention data. So I don't think we can fit the model on the whole data and then use only the post-intervention residuals for the test statistic. We should either fit the model on the pre-intervention data only, or use all residuals for the test statistic. I'm not sure which of the two is the better option.
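To make the concern concrete, here is a minimal sketch on simulated data. It is not the book's code: it uses plain unconstrained OLS as a stand-in for the synthetic control fit, and `T`, `T0`, the weights, and the effect size of 2.0 are all made-up values. The postulated null is "no effect" while the true effect is nonzero, so the post-intervention data really does fit badly.

```python
import numpy as np

rng = np.random.default_rng(0)
T, T0 = 100, 80      # hypothetical: 100 periods, intervention starts at period 80
n_controls = 3       # hypothetical number of control units

# Simulated control units plus an intercept column.
controls = rng.normal(size=(T, n_controls))
X = np.column_stack([np.ones(T), controls])

# Treated unit: linear in the controls, plus noise, plus a true effect after T0.
true_w = np.array([0.5, 0.2, 0.3, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=T)
y[T0:] += 2.0        # true effect, so the null of "no effect" is false


def ols_residuals(X_fit, y_fit, X_all, y_all):
    """Fit OLS on (X_fit, y_fit) and return residuals over all periods."""
    coef, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    return y_all - X_all @ coef


def rmse(r):
    return float(np.sqrt(np.mean(r ** 2)))


# Option A: fit on the whole data under the (false) null of no effect.
res_full = ols_residuals(X, y, X, y)
# Option B: fit on the pre-intervention data only.
res_pre_fit = ols_residuals(X[:T0], y[:T0], X, y)

print("pre-period RMSE,  full fit:", rmse(res_full[:T0]))
print("pre-period RMSE,  pre fit: ", rmse(res_pre_fit[:T0]))
print("post-period RMSE, full fit:", rmse(res_full[T0:]))
print("post-period RMSE, pre fit: ", rmse(res_pre_fit[T0:]))
```

For least squares, the full-data fit can never have smaller pre-period residuals than the pre-only fit, and it can never have larger post-period residuals. So part of the misfit caused by the intervention is moved into the pre-intervention residuals, which a test statistic based only on the post-intervention period then ignores.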