open-connectome-classes / StatConn-Spring-2015-Info

introductory material
18 stars 4 forks source link

features in graphs #42

Open dhadjia1 opened 9 years ago

dhadjia1 commented 9 years ago

So we learned the other day that there are an exponential number of features for a given graph. Suppose we had a way of taking two graphs and comparing every feature in a reasonable time. Can we say only when EVERY feature exhibits statistical difference that the graphs are different? Even if one feature was not statistically significant would this not allow us to state the graphs are different?

akim1 commented 9 years ago

We can assume knowledge of every feature of a graph by imposing an assumption like the one introduced in the class (e.g., Bernoulli IID case). So in this sense, I think knowledge of EVERY feature of a graph can always be simplified to a model with reasonable number of parameters. Whether this model is actually valid or true is a different question.

Let's say: Graph A has 5 edges and 2 nodes. Graph B has 10 edges and 2 nodes. Let's describe these graphs with two features: number of edges and number of nodes. The number of nodes (our first feature) between Graph A and Graph B are the same, but the number of edges (our second feature) between Graph A and Graph B are completely different. If we use the definition of equivalency explored in class, I would be hard-pressed to come up with a case where Graph A and Graph B are the same with those features. In this respect, a statistically significant different feature would allow us to conclude that the two graphs are different.

On the other hand if both Graph A and B has 7 edges and 5 nodes, this doesn't necessarily imply that Graph A and Graph B are the same graph, because we can draw the same number of edges in multiple ways that would give us different graphs. A feature that is not statistically different would not allow us to conclude that the two graphs are the same.

So I guess it really boils down to how meaningful your feature is and how you define the equivalency of graphs.

mrjiaruiwang commented 9 years ago

You could set up a multiple hypothesis testing scheme. So imagine each comparison being a hypothesis that is tested that outputs a p-value. You can then use various correction methods to control the family-wise error rate or false positives rate. So for example, if you want to make sure that every comparison is different, you would want to pick a cutoff that would allow you to reject all the null hypothesis.