jeffreyevans / rfUtilities

R package for random forests model selection, inference, evaluation and validation
GNU General Public License v3.0

rf.effectSize for categorical variables #5

Closed DeFilippis closed 4 years ago

DeFilippis commented 5 years ago

Hello! I was wondering if there is any way to get effect sizes for categorical variables using rf.effectSize. I have a massive model with 312 covariates, many of which are factor variables, and I want to quantify the feature importance and potential interactions of these covariates in terms of interpretable effect sizes (as if it were a parametric regression). Is this possible?

jeffreyevans commented 5 years ago

Evan, No, there is no way to use this function to test for categorical effect size. With a recursive partitioning model, you are really talking about a combinatorics problem and not effect size per se. In a nonparametric model, such as random forests, I would go with the variable importance measure and not an effect size. If you really want to derive an effect size for your categorical factors you can use the Cramér's V statistic, an extension of the phi coefficient: V = sqrt(χ² / (n * df)), where χ² is the chi-square statistic, n is the sample size, and df = min(nrow - 1, ncol - 1), with nrow and ncol the number of rows and columns of the contingency table. If you are dealing with sets of binomial variables then you can simply use the odds ratio. If I have time in the coming weeks I will write a function that derives various effect size metrics (phi, Cramér's V and odds ratio) for factor variables and add it to the spatialEco package. I would really consider this a parametric effect and not necessarily indicative of the variable's performance in a recursive partitioning type model.
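For readers who want to try the two statistics mentioned above, here is a minimal sketch in plain Python (not R, and not part of rfUtilities or spatialEco). The function names `chi_square`, `cramers_v`, and `odds_ratio` are hypothetical, and the code assumes a contingency table of raw counts with no empty rows or columns.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and sample size for a table of counts.

    `table` is a list of rows, each a list of non-negative counts.
    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * df)), with df = min(nrow - 1, ncol - 1)."""
    stat, n = chi_square(table)
    df = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(stat / (n * df))


def odds_ratio(table):
    """Odds ratio (a * d) / (b * c) for a 2x2 table [[a, b], [c, d]].

    Assumes b and c are non-zero.
    """
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Illustrative counts only, not data from this thread.
counts = [[10, 20], [30, 40]]
print(cramers_v(counts))
print(odds_ratio(counts))
```

For a 2x2 table, df is 1 and V reduces to the absolute value of the phi coefficient, which is why V is described as its extension.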

Best, Jeff

Jeffrey S. Evans, Ph.D. | Senior Landscape Ecologist / Biometrician, The Nature Conservancy | Global Lands Science | Visiting Professor, University of Wyoming | Zoology & Physiology | Laramie, WY | jeffrey_evans@tnc.org | (970) 672-6766

From: Evan DeFilippis | Sent: Thursday, July 18, 2019 3:25 PM | To: jeffreyevans/rfUtilities | Subject: [jeffreyevans/rfUtilities] rf.effectSize for categorical variables (#5)


DeFilippis commented 5 years ago

Makes perfect sense. Thanks so much for clarifying. And if you ever find the time, the "effect size" feature would be greatly appreciated. Thanks for your work on this!

By the way, how exactly would you compute an odds ratio from a decision tree? Do you first generate a partial dependence plot and use the average predictions from that?
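To make the idea in this question concrete, here is a rough sketch of one way it could be read: set a factor to each of two levels for every row, average the model's predicted probabilities at each level (the partial dependence values), and take the ratio of the resulting odds. This illustrates the approach being asked about, not an answer confirmed in this thread. `toy_predict`, the `habitat` and `elevation` features, and `partial_dependence_odds_ratio` are all hypothetical stand-ins for a fitted classifier and its data.

```python
def partial_dependence_odds_ratio(predict_proba, rows, feature, level_a, level_b):
    """Odds ratio of level_b versus level_a from averaged predictions.

    `predict_proba` maps one row (a dict) to a probability strictly
    between 0 and 1. Each row is copied with `feature` forced to the
    given level, so all other covariates keep their observed values.
    """
    def mean_prob(level):
        probs = [predict_proba({**row, feature: level}) for row in rows]
        return sum(probs) / len(probs)

    p_a = mean_prob(level_a)
    p_b = mean_prob(level_b)
    odds_a = p_a / (1.0 - p_a)
    odds_b = p_b / (1.0 - p_b)
    return odds_b / odds_a


def toy_predict(row):
    """Hypothetical stand-in for a fitted model's predicted probability."""
    base = 0.2 if row["elevation"] > 1000 else 0.1
    bump = 0.3 if row["habitat"] == "forest" else 0.0
    return base + bump


# Illustrative rows only, not data from this thread.
rows = [
    {"habitat": "open", "elevation": 500},
    {"habitat": "forest", "elevation": 1500},
]
ratio = partial_dependence_odds_ratio(toy_predict, rows, "habitat", "open", "forest")
print(ratio)
```

A ratio computed this way summarizes averaged predictions rather than a model parameter, so it should be interpreted with the same caution expressed earlier in the thread about parametric effect sizes in recursive partitioning models.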