Closed rgiordan closed 2 years ago
I think the problem comes down to organizing the influence scores by "sign" and "significance." In hindsight, this was quite a bad design decision. My first instinct is to re-design so that the influence scores list contains named entries for each sorted metric, which, by default, would be "beta", "beta_mzse", and "beta_pzse". We can then get rid of the "pos" and "neg" elements and just use the first or last entries of the list instead.
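As a rough illustration of the proposed layout, the sketch below builds a list with one named entry per sorted metric. It is not the package's code: `SortByMetric`, the `grads` data frame, its `*_grad` column names, and the toy numbers are all assumptions made for this example.

```r
# Hypothetical sketch of the proposed layout; not the package's actual code.
# `grads` is an assumed data frame holding one gradient column per metric.
SortByMetric <- function(grads,
                         metrics = c("beta", "beta_mzse", "beta_pzse")) {
  result <- lapply(metrics, function(metric) {
    grad <- grads[[paste0(metric, "_grad")]]
    ord <- order(grad)  # increasing order: most negative gradient first
    data.frame(obs = ord, grad = grad[ord])
  })
  names(result) <- metrics
  result
}

# Toy gradients for four observations (made-up numbers).
grads <- data.frame(
  beta_grad      = c(0.3, -0.5, 0.1, -0.2),
  beta_mzse_grad = c(0.2, -0.1, -0.4, 0.3),
  beta_pzse_grad = c(-0.3, 0.4, 0.2, -0.1)
)
influence <- SortByMetric(grads)

# With no "pos"/"neg" elements, the direction is chosen by reading from the
# first or the last entries of the metric's own sorted list.
head(influence$beta_mzse$obs, 2)  # smallest beta_mzse gradients
tail(influence$beta_mzse$obs, 2)  # largest beta_mzse gradients
```

Because each metric carries its own ordering, a caller that wants to move "beta_mzse" can only ever read a ranking that was built from the "beta_mzse" gradients.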
A solution (and major refactor) is in progress on the refactor branch.
@tinnguyen96 writes:
Based on lines 135–138 of regression_sensitivity_lib.R, to induce a “sign and significance” change in a negative, statistically insignificant estimate, it is sufficient to change the value of the “beta_mzse_est” function (posterior mean minus two posterior standard deviations). Furthermore, the direction of the change should be “pos”, since we need to increase the value of “beta_mzse_est”. Conceptually, we need to select observations with the smallest values of “beta_mzse_grad”.
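The selection rule described here can be sketched as follows. This is a minimal illustration, not the library's implementation: `SelectToIncrease` is a hypothetical helper, the numbers are made up, and the code assumes a first-order approximation in which dropping observation n changes the metric by roughly minus its gradient entry.

```r
# Hypothetical helper, not part of the package.
# Assumption: dropping observation n changes the metric by about -grad[n],
# so the most negative gradients give the largest predicted increase.
SelectToIncrease <- function(grad, num_drop) {
  ord <- order(grad)                   # smallest gradient values first
  drop_inds <- ord[seq_len(num_drop)]
  list(drop_inds = drop_inds,
       predicted_change = -sum(grad[drop_inds]))
}

# Made-up values for a negative, statistically insignificant estimate.
beta_mzse_est  <- -0.5
beta_mzse_grad <- c(0.2, -0.1, -0.4, 0.3, -0.3)

sel <- SelectToIncrease(beta_mzse_grad, num_drop = 2)
sel$drop_inds                         # observations with the smallest gradients
beta_mzse_est + sel$predicted_change  # predicted value after dropping them
```

In this toy case the two dropped observations are predicted to push the metric above zero, which is the "pos" direction the paragraph calls for.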
In the implementation, for this situation, GetRegressionTargetChange() actually selects observations with the smallest values of “beta_pzse_grad”. Based on lines 166 and 145 of regression_sensitivity_lib.R, if influence_dfs is the output of SortAndAccumulate(), then GetRegressionTargetChange() will use influence_dfs$sig$pos to select the observations. However, the ranking of observations in influence_dfs$sig$pos is based on increasing beta_pzse_grad rather than increasing beta_mzse_grad. For evidence of this, see line 139 of sorting_lib.R.
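To see why the choice of ranking matters, the toy comparison below orders the same observations by each gradient and compares the resulting selections. The data frame, its values, and the variable names are invented for illustration. The predicted changes again assume that dropping an observation moves the metric by roughly minus its gradient entry.

```r
# Made-up gradients for five observations; not taken from the package.
grads <- data.frame(
  beta_mzse_grad = c(0.2, -0.1, -0.4, 0.3, -0.3),
  beta_pzse_grad = c(-0.3, 0.4, 0.2, -0.1, 0.1)
)
num_drop <- 2

# Intended ranking: increasing beta_mzse gradient.
intended <- order(grads$beta_mzse_grad)[seq_len(num_drop)]
# Ranking reported in the issue: increasing beta_pzse gradient.
selected <- order(grads$beta_pzse_grad)[seq_len(num_drop)]

# The two rankings pick different observations in this example.
same_selection <- setequal(intended, selected)

# Predicted first-order change in the beta_mzse metric under each selection.
change_intended <- -sum(grads$beta_mzse_grad[intended])
change_selected <- -sum(grads$beta_mzse_grad[selected])
```

With these numbers the intended selection is predicted to increase the metric, while the selection based on the other gradient is predicted to decrease it. A comparison of this kind could serve as a regression test once the refactor lands.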