buhrmann opened this issue 5 years ago (status: Open)
Hi, sorry for the delay in responding to this. One of the great things about the information-theoretic formulation is that it does make sense to put information about a continuous variable and information about a discrete variable on the same footing. However, you're right that the current implementations don't allow mixing, and I don't have plans to implement that.
If your main interest is mixing continuous and binary variables, I recommend running CorEx in continuous mode (the -c option on the command line) and encoding each binary variable with any two values (e.g. 0/1 or -1/+1). The marginal probabilities are then modeled as mixtures of Gaussians around each binary value, which should be equivalent to modeling the variables as binary. However, if your categorical variables take more than two values, say X_i = "cat", "dog", "bird", and you encode them as X_i = 0, 1, 2, then you lose some of the meaning of the categorical formulation: under the Gaussian mixture model used in the continuous formulation, "2" is closer to "1" than it is to "0", which is not really true of the original categorical variables.
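To make the encoding point concrete, here is a minimal NumPy sketch of the distances each encoding implies. It does not call CorEx at all; the helper names (`encode_ordinal`, `encode_one_hot`, `pairwise_distances`) and the toy data are hypothetical and only for illustration. The one-hot variant is an assumption on my part, not something the current implementation does for you: it turns a multi-valued categorical into several binary columns, which fits the binary recommendation above, but those columns are mutually exclusive and therefore strongly dependent on each other, which may itself show up as structure.

```python
import numpy as np

def encode_ordinal(values, categories):
    """Map each category to its integer position (0, 1, 2, ...)."""
    index = {c: i for i, c in enumerate(categories)}
    return np.array([float(index[v]) for v in values])

def encode_one_hot(values, categories):
    """Map each category to a 0/1 indicator row (one binary column per category)."""
    return np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])

def pairwise_distances(codes):
    """Euclidean distance between every pair of rows in `codes`."""
    codes = np.asarray(codes, dtype=float).reshape(len(codes), -1)
    diff = codes[:, None, :] - codes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

categories = ["cat", "dog", "bird"]

# Distances between the category codes themselves.
ordinal_dist = pairwise_distances(encode_ordinal(categories, categories))
one_hot_dist = pairwise_distances(encode_one_hot(categories, categories))

# Ordinal: "bird" (2) is closer to "dog" (1) than to "cat" (0), an artificial ordering.
# One-hot: every pair of distinct categories is equally far apart.

# Toy mixed data matrix (made-up values): one continuous column,
# one binary column encoded as 0/1, and the one-hot pet columns.
pets = ["cat", "dog", "bird", "dog"]
height = np.array([0.3, 1.2, -0.7, 0.9])
is_indoor = np.array([1.0, 0.0, 1.0, 0.0])
mixed = np.column_stack([height, is_indoor, encode_one_hot(pets, categories)])
```

The resulting `mixed` array is the kind of all-numeric matrix one could pass to the continuous mode; whether one-hot columns behave well there is something to check empirically rather than assume.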
Hi, in one of your papers it is mentioned that in principle CorEx works with heterogeneous data types, but it seems that the current implementation only works for all continuous or all discrete data matrices. If that's correct, do you plan to support mixed continuous and categorical types in the future?