open-connectome-classes / StatConn-Spring-2015-Info


Graphs of Different Sizes for Similar Data #22

Closed gkiar closed 9 years ago

gkiar commented 9 years ago

Though graph statistics are largely unknown, as you've said, simple difference metrics (e.g., the Frobenius norm of the difference of two known graphs) can be computed when the graphs are identical in node structure. In the case that the nodes do not match exactly, do any numerical methods exist to compute similarity?
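For concreteness, the metric mentioned above can be computed in a few lines. A sketch using NumPy, with two small hypothetical graphs on a shared 4-node vertex set:

```python
import numpy as np

# Two hypothetical graphs on the same 4-node vertex set,
# represented as symmetric binary adjacency matrices.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
B = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

# Frobenius norm of the difference: the square root of the sum of
# squared entrywise differences. For binary graphs this is the
# square root of the number of differing matrix entries (each
# undirected edge difference is counted twice).
dist = np.linalg.norm(A - B, ord='fro')
print(dist)  # 2.0 here: four entries differ, sqrt(4) = 2
```

This only makes sense when both matrices index the same nodes in the same order, which is exactly the restriction the question is about.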

jovo commented 9 years ago

What is the problem with comparing adjacency matrices directly, even when the nodes are perfectly matched? Hint: think about the bias/variance trade-off...


gkiar commented 9 years ago

What I understand about the bias-variance tradeoff, in a machine learning context, is that when you're modeling data you can use a higher-order function (more parameters) to decrease the bias toward your training data, but this will also increase your variance. The "dream" is to "both accurately capture the regularities in training data, but also generalize well to unseen data" [Wikipedia], which means that some balance has to be struck.

Though I'm not confident how this applies to what I was just saying about adjacency matrices: could it be that since we're using an unparameterized metric like the Frobenius norm, we have a high bias in our result (specifically towards edges existing as opposed to not existing)? And with such a bias/variance profile, would classes be difficult to distinguish unless a parameterized statistic is introduced?
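One way to see the variance side of the hint (my own toy illustration, not from either paper, with all names and parameters hypothetical): the Frobenius distance between two single raw adjacency matrices drawn from the *same* model is far from zero, while distances between sample means, where the number of averaged graphs acts as a smoothing parameter, shrink toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 0.1  # both groups share the same ER(n, p) model

def sample_er(rng, n, p):
    """Sample a symmetric Erdos-Renyi adjacency matrix (no self-loops)."""
    upper = np.triu(rng.random((n, n)) < p, 1)
    return (upper + upper.T).astype(float)

def frob_dist(A, B):
    return np.linalg.norm(A - B, ord='fro')

# Distance between single raw samples: large even though the
# underlying models are identical -- a high-variance comparison.
raw = [frob_dist(sample_er(rng, n, p), sample_er(rng, n, p))
       for _ in range(200)]

# Distance between means of m samples per group: averaging
# shrinks the variance of each estimated edge probability.
m = 20
avg = [frob_dist(np.mean([sample_er(rng, n, p) for _ in range(m)], axis=0),
                 np.mean([sample_er(rng, n, p) for _ in range(m)], axis=0))
       for _ in range(200)]

print(np.mean(raw), np.mean(avg))  # averaged distances are much smaller
```

Under this sketch the issue with raw matrix comparison is variance rather than bias: a single noisy graph per subject makes same-model pairs look as different as genuinely different-model pairs.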

jovo commented 9 years ago

Check out my shuffled theory paper and Carey's nonparametric testing paper; both are on arXiv.
