Closed karolinachalupova closed 3 years ago
If we still focus on anomalies/characteristics, this is indeed a problem. If we focus on "raw" variables from original data, it is less of a problem.
Another option might be simple clustering instead of a correlation matrix.
Interesting alternative: try to fill in missing values of certain characteristics/variables based on the values that firms with similar other characteristics/variables have there (a supervised problem).
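One off-the-shelf way to sketch this "fill from similar firms" idea (not what the thread actually implemented, just an illustration) is k-nearest-neighbour imputation, e.g. with scikit-learn's `KNNImputer`; the firm and characteristic values below are made up:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy firm-characteristic matrix: rows are firms, columns are characteristics.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, np.nan],  # similar to firm 0, so the NaN should land near 3.0
    [9.0, 8.0, 7.0],
])

# Impute each missing entry from the k most similar firms,
# where similarity is measured on the observed characteristics.
imputer = KNNImputer(n_neighbors=1)
X_filled = imputer.fit_transform(X)
print(X_filled[1, 2])  # nearest firm is row 0, so this is 3.0
```

With more neighbours the imputed value becomes an average over similar firms, which is usually what you'd want on real characteristic data.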
With characteristics I would use clustering, or maybe even simple PCA, picking up a component or "factor" that explains most of the variance of the correlated characteristics, although this might be a bit more complicated and may even go against the goal of an "interpretable" model.
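A minimal sketch of the clustering option, on made-up data: treat 1 − |correlation| as a distance between characteristics and cluster hierarchically, so near-duplicate characteristics end up in one group (the column names here are illustrative, not the actual dataset):

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=n)
# Two characteristics carrying nearly the same signal, plus an independent one.
df = pd.DataFrame({
    "size": base + 0.05 * rng.normal(size=n),
    "log_size": base + 0.05 * rng.normal(size=n),
    "momentum": rng.normal(size=n),
})

# 1 - |correlation| as a distance, then average-linkage hierarchical clustering.
dist = 1 - df.corr().abs()
condensed = squareform(dist.values, checks=False)
clusters = fcluster(linkage(condensed, method="average"), t=0.5, criterion="distance")
print(dict(zip(df.columns, clusters)))  # size and log_size share a cluster
```

One representative per cluster (or a cluster average) could then stand in for the whole correlated group, keeping the model interpretable.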
This problem may actually diminish somewhat if I just use 30 most important anomalies from Kelly (as per #6 ). I will post their correlation matrix here soon.
According to this correlation plot (size and color both represent correlation sign and strength), the correlation problem does not seem to be too bad. Top 30 characteristics from OTMH shown.
I found a paper that is super interesting and seems to solve the issue. https://arxiv.org/abs/1801.01489 But it is very technical and I cannot find any implementations except the authors' R code, which is really messy. Citing:
"However, existing VI measures do not generally account for the fact that many prediction models may fit the data almost equally well. In such cases, the model used by one analyst may rely on entirely different covariate information than the model used by another analyst. This common scenario has been called the "Rashomon" effect of statistics (Breiman et al., 2001; see also Lecue, 2011; Statnikov et al., 2013; Tulabandhula and Rudin, 2014; Nevo and Ritov, 2017; Letham et al., 2016). The term is inspired by the 1950 Kurosawa film of the same name, in which four witnesses offer different descriptions and explanations for the same encounter. Under the Rashomon effect, how should analysts give comprehensive descriptions of the importance of each covariate? How well can one analyst recover the conclusions of another? Will the model that gives the best predictions necessarily give the most accurate interpretation? To address these concerns, we analyze the set of prediction models that provide near-optimal accuracy, which we refer to as a Rashomon set. This approach stands in contrast to training to select a single prediction model, among a prespecified class of candidate models."
"Applying this approach to study variable importance, we define model class reliance (MCR) as the highest and lowest degree to which any well-performing model within a given class may rely on a variable of interest for prediction accuracy. Roughly speaking, MCR captures the range of explanations, or mechanisms, associated with well-performing models. Because the resulting range summarizes many prediction models simultaneously, rather than a single model, we expect this range to be less affected by the choices that an individual analyst makes during the model-fitting process."
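The building block of MCR is "model reliance" for a single model, which Fisher et al. define via permutation: how much does the loss grow when the variable of interest is scrambled? A toy sketch of that single-model quantity on simulated data (MCR itself would then take the highest and lowest reliance over all well-performing models in the class, which this sketch does not do):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# y depends strongly on x1 and only weakly on x2.
y = 2.0 * x1 + 0.1 * x2 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])

model = LinearRegression().fit(X, y)

def model_reliance(model, X, y, j, rng):
    """MSE after permuting column j, divided by the original MSE."""
    base = np.mean((model.predict(X) - y) ** 2)
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    perm = np.mean((model.predict(Xp) - y) ** 2)
    return perm / base

print(model_reliance(model, X, y, 0, rng))  # much greater than 1: relies on x1
print(model_reliance(model, X, y, 1, rng))  # near 1: barely relies on x2
```

The Rashomon-set idea is that with correlated characteristics, different near-optimal models can show very different reliance values for the same variable, and MCR reports that whole range instead of a single number.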
I am opening a separate issue to discuss the paper. #8
I have very interesting results from computing the interpretation for different random seeds: for some features the interpretation stays put, while for others it changes and another feature picks up the effect. It is on my todo list to show how this relates to feature correlation; that should be an interesting result to have.
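The seed-instability effect is easy to reproduce on simulated data (this is an illustration of the mechanism, not the thread's actual experiment): when two features carry nearly the same signal, tree-based importances split the credit between them differently depending on the seed, while an independent feature's importance stays stable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1000
base = rng.normal(size=n)
# Two nearly duplicate features and one independent feature.
x1 = base + 0.01 * rng.normal(size=n)
x2 = base + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = base + 0.5 * x3 + 0.1 * rng.normal(size=n)

# Across seeds, credit for the shared signal shifts between columns 0 and 1,
# while their combined importance (and column 2's) stays roughly constant.
for seed in range(3):
    rf = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y)
    print(seed, np.round(rf.feature_importances_, 2))
```

This is exactly the pattern the Rashomon/MCR framing speaks to: the per-feature number is seed-dependent, but the importance of the correlated group is not.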