rgiordan / zaminfluence

Tools in R for computing and using Z-estimator approximate influence functions.
Apache License 2.0

Sign and significance is sorting by the wrong quantity #20

Closed rgiordan closed 2 years ago

rgiordan commented 3 years ago

@tinnguyen96 writes:

Based on lines 135–138 of regression_sensitivity_lib.R, to induce a “sign and significance” change in a negative, statistically insignificant estimate, it is sufficient to change the value of the “beta_mzse_est” function (posterior mean minus two posterior standard deviations). Furthermore, the direction of the change should be “pos”, since we need to increase the value of “beta_mzse_est” until it is positive. Conceptually, we need to select the observations with the smallest values of “beta_mzse_grad”.
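To make the reasoning above concrete, here is a minimal sketch in Python (not the package's R code). It assumes the usual first-order convention that each gradient entry is the derivative of the metric with respect to that observation's weight, so dropping an observation (weight 1 to 0) changes the metric by roughly minus its gradient. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def most_increasing_drops(grad, num_drop):
    """Indices of the num_drop observations whose removal is predicted
    to increase the metric the most (the smallest gradient values)."""
    order = sorted(range(len(grad)), key=lambda i: grad[i])
    return order[:num_drop]


def predicted_change(grad, dropped):
    """First-order predicted change in the metric when the observations
    in `dropped` have their weights set from 1 to 0 (assumed convention)."""
    return -sum(grad[i] for i in dropped)


# Hypothetical per-observation gradients of beta_mzse_est.
beta_mzse_grad = [0.25, -0.5, 0.125, -0.75, 0.0]

dropped = most_increasing_drops(beta_mzse_grad, 2)
change = predicted_change(beta_mzse_grad, dropped)
```

Under this convention, the observations with the most negative gradients are the ones whose removal pushes the metric upward, which is why the smallest values of the gradient are the ones to select.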

In the implementation, for this situation, GetRegressionTargetChange() actually selects observations with the smallest values of “beta_pzse_grad”. Based on lines 166 and 145 of regression_sensitivity_lib.R, if influence_dfs is the output of SortAndAccumulate(), then GetRegressionTargetChange() will use influence_dfs$sig$pos to select the observations. However, the observation ranking in influence_dfs$sig$pos is based on increasing beta_pzse_grad rather than increasing beta_mzse_grad. For evidence of this, see line 139 of sorting_lib.R.
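The mismatch described here can be illustrated with a toy example in Python (again, not the package's R code). The two gradient vectors below are made up, and the first-order "change is minus the sum of dropped gradients" convention is an assumption. The point is only that ranking by one metric's gradient can select different observations than ranking by the other's.

```python
def rank_increasing(grad):
    """Observation indices sorted by increasing gradient value."""
    return sorted(range(len(grad)), key=lambda i: grad[i])


# Hypothetical gradients for the two interval endpoints.
beta_mzse_grad = [0.25, -0.5, 0.125, -0.75, 0.0]
beta_pzse_grad = [-0.75, 0.25, -0.5, 0.125, 0.0]

# What the analysis calls for: rank by the beta_mzse gradient.
needed = rank_increasing(beta_mzse_grad)[:2]
# What the reported behaviour amounts to: rank by the beta_pzse gradient.
reported = rank_increasing(beta_pzse_grad)[:2]

# Predicted first-order change in beta_mzse for each selection.
needed_change = -sum(beta_mzse_grad[i] for i in needed)
reported_change = -sum(beta_mzse_grad[i] for i in reported)
```

In this toy data the two rankings share no observations, and the selection based on the wrong gradient is predicted to move beta_mzse in the opposite direction from the one required.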

rgiordan commented 3 years ago

I think the problem comes down to organizing the influence scores by "sign" and "significance." In hindsight, this was quite a bad design decision. My first instinct is to re-design so that the influence scores list contains named entries for each sorted metric, which, by default, would be "beta", "beta_mzse", and "beta_pzse". We can then get rid of the "pos" and "neg" elements and just use the first or last entries of the list instead.
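A rough sketch of the proposed layout, written in Python for illustration only (the actual refactor is in R and may differ). Each metric name maps to its own ordering of observations, and the direction of the desired change decides whether to read from the front or the back of that ordering. The helper names `sort_by_metric` and `select_observations`, the "increase"/"decrease" labels, and the gradient values are all hypothetical.

```python
def sort_by_metric(grads):
    """Map each metric name to observation indices in increasing
    order of that metric's own gradient."""
    return {
        name: sorted(range(len(g)), key=lambda i: g[i])
        for name, g in grads.items()
    }


def select_observations(sorted_scores, metric, direction, num_drop):
    """Pick observations to drop for one metric.

    "increase" reads from the front (smallest gradients);
    "decrease" reads from the back (largest gradients).
    """
    order = sorted_scores[metric]
    if direction == "increase":
        return order[:num_drop]
    if direction == "decrease":
        return list(reversed(order))[:num_drop]
    raise ValueError(f"unknown direction: {direction!r}")


# Hypothetical gradients, one vector per sorted metric.
grads = {
    "beta": [0.0, -0.25, 0.5, -0.5, 0.25],
    "beta_mzse": [0.25, -0.5, 0.125, -0.75, 0.0],
    "beta_pzse": [-0.75, 0.25, -0.5, 0.125, 0.0],
}
scores = sort_by_metric(grads)

raise_lower_end = select_observations(scores, "beta_mzse", "increase", 2)
lower_upper_end = select_observations(scores, "beta_pzse", "decrease", 2)
```

Because the caller names the metric explicitly, there is no shared "pos"/"neg" slot that could be sorted by a different quantity than the one being targeted.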

rgiordan commented 3 years ago

A solution (and major refactor) is in progress on the refactor branch.

rgiordan commented 2 years ago

Fixed in https://github.com/rgiordan/zaminfluence/pull/22