Closed rahulbshrestha closed 4 weeks ago
Thanks for starting this, @rahulbshrestha . Let us know once the PR is ready for review.
I ran some tests to check if the values of att and atc are the same before and after changes made in this PR:
### PREVIOUS IMPLEMENTATION
att = 0
numtreatedunits = treated.shape[0]
treated_outcomes_old = []
control_outcomes_old = []
for i in range(numtreatedunits):
treated_outcome = treated.iloc[i][self._target_estimand.outcome_variable[0]].item()
control_outcome = control.iloc[indices[i]][self._target_estimand.outcome_variable[0]].item()
treated_outcomes_old.append(treated_outcome)
control_outcomes_old.append(control_outcome)
att += treated_outcome - control_outcome
att /= numtreatedunits
print('Checking values of ATT: ')
print('ATT (before): ', att)
outcome_variable = self._target_estimand.outcome_variable[0]
treated_outcomes = treated[outcome_variable]
control_outcomes = list(control.iloc[indices.flatten()][outcome_variable])
att = (treated_outcomes - control_outcomes).mean()
print('ATT (after): ', att)
print('Treated outcomes ', treated_outcomes_old == treated_outcomes)
print('Control outcomes', control_outcomes_old == control_outcomes)
and the results when running on some test data:
Checking values of ATT:
ATT (before): 10.923190922091228
ATT (after): 10.923190922091242
Treated outcomes True
Control outcomes True
Checking values of ATC:
ATC (before): 10.506587873468016
ATC (after): 10.506587873468012
Treated outcomes True
Control outcomes True
Both lists, treated outcomes and control outcomes are the same before and after the changes I made. The ATT and ATC seems to be off by a couple digits after averaging (check last 3 digits in the example above), which is probably a rounding error. Is this a problem @amit-sharma?
Hey @amit-sharma! I think this PR is ready to be merged :)
@all-contributors please add @rahulbshrestha for code.
@amit-sharma
I've put up a pull request to add @rahulbshrestha! :tada:
This PR addresses this issue by introducing vectorized operations instead of the existing for-loops. This should speed up operations for large datasets.
This PR is a work in progress, and the remaining tasks include: