karolinachalupova / DiplomaThesis

Explaining Equity Returns with Interpretable Machine Learning

Correlated features problems #5

Closed karolinachalupova closed 3 years ago

karolinachalupova commented 4 years ago
martinhronec commented 4 years ago
barunik commented 4 years ago

With characteristics, I would use clustering, or maybe even simple PCA, picking the component or "factor" that explains most of the variance of the correlated characteristics, although this might be a bit more complicated and may even go against the goal of an "interpretable" model.
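
A minimal sketch of the PCA idea, assuming a group of characteristics that share one common driver. The data is synthetic and the helper names (`standardize`, `first_component`) are hypothetical; power iteration on the correlation matrix is used here only to keep the example dependency-free, and a library PCA routine would do the same job.

```python
import math
import random


def standardize(col):
    """Rescale one characteristic to zero mean and unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def first_component(columns, iters=200):
    """Loadings and explained-variance share of the first principal component.

    Uses power iteration on the correlation matrix, which converges to the
    dominant eigenvector as long as the start vector is not orthogonal to it.
    """
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    # The eigenvalues of a k x k correlation matrix sum to k.
    return v, eigenvalue / k


# Synthetic data (an assumption): three characteristics driven by one common factor.
rng = random.Random(0)
factor = [rng.gauss(0, 1) for _ in range(500)]
chars = [[f + 0.3 * rng.gauss(0, 1) for f in factor] for _ in range(3)]

loadings, share = first_component(chars)
print("loadings:", [round(x, 3) for x in loadings])
print("share of variance explained:", round(share, 3))
```

The resulting component is a weighted blend of all three characteristics, which is where the interpretability concern comes from: an importance score for the component no longer points at a single named characteristic.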

karolinachalupova commented 4 years ago

This problem may actually diminish somewhat if I just use the 30 most important anomalies from Kelly (as per #6). I will post their correlation matrix here soon.

karolinachalupova commented 3 years ago

According to this correlation plot (size and color both represent correlation sign and strength), the correlation problem does not seem to be too bad. Top 30 characteristics from OTMH shown.

corrplot.pdf

karolinachalupova commented 3 years ago

I found a paper that is super interesting and seems to solve the issue. https://arxiv.org/abs/1801.01489 But it is very technical and I cannot find any implementations except the authors' R code, which is really messy. Citing:

"However, existing VI measures do not generally account for the fact that many prediction models may fit the data almost equally well. In such cases, the model used by one analyst may rely on entirely different covariate information than the model used by another analyst. This common scenario has been called the "Rashomon" effect of statistics (Breiman et al., 2001; see also Lecue, 2011; Statnikov et al., 2013; Tulabandhula and Rudin, 2014; Nevo and Ritov, 2017; Letham et al., 2016). The term is inspired by the 1950 Kurosawa film of the same name, in which four witnesses offer different descriptions and explanations for the same encounter. Under the Rashomon effect, how should analysts give comprehensive descriptions of the importance of each covariate? How well can one analyst recover the conclusions of another? Will the model that gives the best predictions necessarily give the most accurate interpretation? To address these concerns, we analyze the set of prediction models that provide near-optimal accuracy, which we refer to as a Rashomon set. This approach stands in contrast to training to select a single prediction model, among a prespecified class of candidate models."

"Applying this approach to study variable importance, we define model class reliance (MCR) as the highest and lowest degree to which any well-performing model within a given class may rely on a variable of interest for prediction accuracy. Roughly speaking, MCR captures the range of explanations, or mechanisms, associated with well-performing models. Because the resulting range summarizes many prediction models simultaneously, rather a single model, we expect this range to be less affected by the choices that an individual analyst makes during the model-fitting process."

karolinachalupova commented 3 years ago

More from Fisher: "Applying this approach to study variable importance, we define model class reliance (MCR) as the highest and lowest degree to which any well-performing model within a given class may rely on a variable of interest for prediction accuracy. Roughly speaking, MCR captures the range of explanations, or mechanisms, associated with well-performing models. Because the resulting range summarizes many prediction models simultaneously, rather a single model, we expect this range to be less affected by the choices that an individual analyst makes during the model-fitting process."
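
The quoted definition of model class reliance can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a two-feature linear model class approximated by a coarse coefficient grid, squared-error loss, synthetic data with two nearly identical features, and a hypothetical tolerance `epsilon` for "near-optimal". Reliance on `x1` is taken as the ratio of the loss with `x1` switched across observations to the original loss, and the range is read off over all grid models inside the tolerance.

```python
import random


def mse(model, rows):
    """Mean squared error of the linear model y ~ b1*x1 + b2*x2."""
    b1, b2 = model
    return sum((y - b1 * x1 - b2 * x2) ** 2 for x1, x2, y in rows) / len(rows)


def switched_mse(model, rows):
    """Loss when each row's x1 is swapped for the x1 of every other row."""
    b1, b2 = model
    n = len(rows)
    total = 0.0
    for i, (_, x2, y) in enumerate(rows):
        for j, (x1_other, _, _) in enumerate(rows):
            if i != j:
                total += (y - b1 * x1_other - b2 * x2) ** 2
    return total / (n * (n - 1))


def model_reliance(model, rows):
    """Reliance on x1: switched loss divided by original loss."""
    return switched_mse(model, rows) / mse(model, rows)


# Synthetic data (an assumption): x1 and x2 are two noisy copies of one signal.
rng = random.Random(0)
rows = []
for _ in range(40):
    z = rng.gauss(0, 1)
    x1 = z + 0.1 * rng.gauss(0, 1)
    x2 = z + 0.1 * rng.gauss(0, 1)
    rows.append((x1, x2, x1 + x2 + 0.5 * rng.gauss(0, 1)))

# A coarse coefficient grid stands in for the model class.
grid = [i * 0.25 for i in range(9)]  # 0.0, 0.25, ..., 2.0
models = [(b1, b2) for b1 in grid for b2 in grid]

epsilon = 0.1  # hypothetical tolerance defining "near-optimal"
best_loss = min(mse(m, rows) for m in models)
rashomon_set = [m for m in models if mse(m, rows) <= best_loss + epsilon]

# Lowest and highest reliance on x1 among the near-optimal models.
reliances = [model_reliance(m, rows) for m in rashomon_set]
mcr_minus, mcr_plus = min(reliances), max(reliances)
print(f"{len(rashomon_set)} near-optimal models, reliance on x1 in "
      f"[{mcr_minus:.2f}, {mcr_plus:.2f}]")
```

Because `x1` and `x2` carry almost the same information here, models that lean on either one fit about equally well, so the range for `x1` should come out wide. That width is the point of reporting a range instead of one model's importance score.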

karolinachalupova commented 3 years ago

I am opening a separate issue to discuss the paper. #8

karolinachalupova commented 3 years ago

I have very interesting results from calculating the interpretation for different random seeds: for some features the interpretation stays the same, while for others it changes and another feature picks up the effect. I have it on my to-do list to show how this is related to feature correlation; that should be an interesting result to have.
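
A toy sketch of how seed dependence and feature correlation can interact. It is not the thesis setup: it assumes a linear model trained by gradient descent for a fixed number of epochs from a seed-dependent random start, synthetic data in which `x1` and `x2` are nearly identical and `x3` is independent, and permutation importance as the interpretation measure. All function names are hypothetical.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, rows):
    return sum((predict(w, r[:3]) - r[3]) ** 2 for r in rows) / len(rows)


def fit_linear(rows, seed, epochs=200, lr=0.05):
    """Gradient descent from a seed-dependent random start, fixed epoch budget."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(3)]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for *x, y in rows:
            err = predict(w, x) - y
            for k in range(3):
                grad[k] += 2 * err * x[k] / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def permutation_importance(w, rows, k):
    """Increase in MSE after shuffling feature k (same shuffle for every model)."""
    shuffled = [r[k] for r in rows]
    random.Random(123).shuffle(shuffled)
    permuted = [r[:k] + (v,) + r[k + 1:] for r, v in zip(rows, shuffled)]
    return mse(w, permuted) - mse(w, rows)


# Synthetic data (an assumption): x1 and x2 nearly identical, x3 independent.
data_rng = random.Random(0)
rows = []
for _ in range(200):
    z = data_rng.gauss(0, 1)
    x1 = z + 0.05 * data_rng.gauss(0, 1)
    x2 = z + 0.05 * data_rng.gauss(0, 1)
    x3 = data_rng.gauss(0, 1)
    rows.append((x1, x2, x3, x1 + x2 + x3 + 0.1 * data_rng.gauss(0, 1)))

weights = [fit_linear(rows, seed) for seed in range(5)]
importances = [[permutation_importance(w, rows, k) for k in range(3)]
               for w in weights]

# Range of each feature's importance across the five seeds.
spread = [max(imp[k] for imp in importances) - min(imp[k] for imp in importances)
          for k in range(3)]
for seed, imp in enumerate(importances):
    print(seed, [round(v, 2) for v in imp])
print("spread across seeds:", [round(s, 2) for s in spread])
```

In this setup the sum of the weights on `x1` and `x2` is pinned down by the data, but how it is split between them is barely identified, so the split mostly reflects the random start. The importance of the correlated pair can therefore shift from one feature to the other across seeds while the independent feature stays stable.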