gregversteeg / bio_corex

A flexible version of CorEx developed for bio-data challenges that handles missing data, continuous/discrete variables, multi-CPU, overlapping structure, and includes visualizations
Apache License 2.0

penwidth #18

Open vangarav opened 4 years ago

vangarav commented 4 years ago

Hi Greg, I am new to this technique and want to know whether there is any relation between penwidth and corex.mis. Does penwidth represent the amount of mutual information shared by a variable with the corresponding latent factor? Also, I am getting two Excel sheets for each layer, a weights file and a mis file. What is the difference between them? If I am not wrong, the mis file gives the mutual information between each variable and each latent factor in that layer. What do the weights represent? I know I have a lot to learn before I understand the paper, but I got stuck on these issues. Please help me. Thanks.

gregversteeg commented 4 years ago

Each weight is a number between zero and one that indicates whether a factor Z_i is connected to a variable X_j. We have another number, the mutual information, which estimates I(Z_i, X_j). I believe the penwidth is the product of these two numbers, with some thresholding to get good-looking graphs. It is possible to have high mutual information but not connect X_j to Z_i. This could be because Z_i provides no unique information about X_j that is not already explained by one of X_j's other parents.
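
To make the "product with thresholding" idea concrete, here is a minimal sketch in plain Python. It is not the code bio_corex uses to draw its graphs: the function name `edge_penwidths`, the `threshold` and `max_width` parameters, and the normalization by the largest product are all assumptions chosen for illustration, and the numbers are made up.

```python
def edge_penwidths(weights, mis, threshold=0.05, max_width=5.0):
    """Hypothetical helper: edge width is weight * MI, dropped when too small.

    weights[i][j] and mis[i][j] both refer to factor Z_i and variable X_j.
    """
    # Element-wise product of the two matrices.
    products = [[w * m for w, m in zip(w_row, m_row)]
                for w_row, m_row in zip(weights, mis)]
    largest = max(max(row) for row in products)
    edges = {}
    for i, row in enumerate(products):
        for j, p in enumerate(row):
            # Assumed rule: keep an edge only if its product is at least
            # `threshold` times the largest product, then rescale the width.
            if largest > 0 and p / largest >= threshold:
                edges[(i, j)] = max_width * p / largest
    return edges


# Made-up values for 2 factors and 3 variables.
weights = [[1.0, 0.9, 0.0],
           [0.0, 0.1, 1.0]]
mis = [[0.5, 0.4, 0.3],
       [0.0, 0.2, 0.6]]
edges = edge_penwidths(weights, mis)
```

In this toy input, X_2 has fairly high mutual information with Z_0 (0.3) but a weight of zero, so no edge is drawn between them. That is the case described above, where a variable shares information with a factor but is not connected to it.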