open-connectome-classes / StatConn-Spring-2015-Info

introductory material

Does there exist a real way to beat the variance/bias tradeoff? #101

Open mrjiaruiwang opened 9 years ago

mrjiaruiwang commented 9 years ago

We know we can't beat the statistics by having both low variance and low bias, but what would be a way to get both lower variance and lower bias? Better experimental data may be the answer here, but I am unclear on what "better" actually means. Is it lower measurement variance? Is it something mathematical at all?
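
To pin down what I mean by the tradeoff, here is a toy sketch (made-up numbers, and the shrinkage factor `c` is just something I invented for illustration): shrinking the sample mean lowers its variance but pays for it with bias.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, n, reps = 2.0, 3.0, 20, 10_000

for c in [0.5, 0.8, 1.0]:   # shrinkage factor; c = 1 is the plain sample mean
    est = c * rng.normal(true_mean, sigma, (reps, n)).mean(axis=1)
    bias = est.mean() - true_mean        # shrinking toward 0 adds bias...
    var = est.var()                      # ...but reduces variance
    print(f"c={c:.1f}  bias={bias:+.3f}  var={var:.3f}  mse={bias**2 + var:.3f}")
```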

DSP137 commented 9 years ago

I don't know that there is a way to completely beat both, but I think the point was that while we may not be able to have the best of both worlds, we could at least optimize the problem. I would like to look into this more, but I think it may be a constrained optimization problem. Or perhaps, if we have a threshold for the variance (say we don't want it to be any larger than M), we can find the maximum bias we can tolerate while maintaining that bound on the variance. Thoughts?
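
Here is a rough sketch of what I'm imagining, using ridge regression's penalty as the knob only because it's a familiar example (the numbers and the threshold M are made up): Monte Carlo the bias and variance of the estimate at each penalty, keep only the settings whose variance stays under M, and take the one with the smallest bias.

```python
import numpy as np

rng = np.random.default_rng(1)
beta_true, sigma, n, reps = 2.0, 1.0, 30, 5_000
x = rng.normal(size=n)
M = 0.02                                           # the variance threshold from the comment

def ridge_beta(y, lam):
    # one-dimensional ridge estimate: (x'y) / (x'x + lam); lam trades bias for variance
    return x @ y / (x @ x + lam)

results = []
for lam in [0.0, 1.0, 5.0, 20.0, 50.0]:
    est = np.array([ridge_beta(x * beta_true + rng.normal(0, sigma, n), lam)
                    for _ in range(reps)])
    bias, var = est.mean() - beta_true, est.var()
    results.append((abs(bias), var, lam))
    print(f"lam={lam:5.1f}  bias={bias:+.4f}  var={var:.4f}")

feasible = [r for r in results if r[1] <= M]       # keep settings under the variance budget...
print("smallest |bias| with var <= M:", min(feasible) if feasible else "none")  # ...then minimize |bias|
```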

adjordan commented 9 years ago

Yeah, I think that answer is on point. I think JoVo mentioned that there are certain problems where we would want to fix the bias and then find the maximum allowable variance, but I can't think of examples of either off the top of my head.

ghost commented 9 years ago

I interpreted the question slightly differently. I read it as asking: given that there is a bias-variance tradeoff, can we do something else to lower both?

I mean...you always want the best data possible, right? If your data is closer to the true distribution, for example if it has less noise, it might do better when you run cross-validation: the sub-samples would be closer together, which would give you lower variance and lower bias. I'm not sure if that's what you mean by better experimental data? I guess then the answer to "is it something mathematical" might be that the way you get better experimental data is through better equipment, reducing human error, improving some lab technique, etc.
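
Here is a quick toy sketch of the "sub-samples closer together" point (fake data, arbitrary noise levels): fit the same model on repeated noisy samples at two noise levels and see how much the fitted predictions spread out.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 2_000
x = np.linspace(0, 1, n)
f = np.sin(2 * np.pi * x)            # the "true" signal
x0 = 0.5                             # test point where we measure how much the fits disagree

for sigma in [1.0, 0.2]:             # two measurement-noise levels
    preds = []
    for _ in range(reps):
        y = f + rng.normal(0, sigma, n)      # one noisy "experiment"
        coef = np.polyfit(x, y, deg=5)       # same fixed-complexity model every time
        preds.append(np.polyval(coef, x0))
    print(f"sigma={sigma}: variance of the fitted value at x0 = {np.var(preds):.5f}")
```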

ElanHR commented 9 years ago

In addition to aceecc's response, I think the best approach to trying to "beat both" is just to get a larger sample size. As your sample size grows (assuming some sort of independence between samples), you can trust your data to better represent the true distribution.

Another way of saying this is that estimates computed from larger samples have lower variance than estimates computed from smaller samples drawn from the same distribution.
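
For instance (toy numbers), the variance of a sample mean is sigma^2/n, and a quick simulation shows it shrinking as n grows:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, reps = 2.0, 10_000

for n in [10, 100, 1000]:
    means = rng.normal(0, sigma, (reps, n)).mean(axis=1)   # one sample mean per repeated experiment
    print(f"n={n:5d}  empirical var of the mean = {means.var():.5f}   sigma^2/n = {sigma**2 / n:.5f}")
```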

Answered something similar in another thread: https://github.com/Statistical-Connectomics-Sp15/intro/issues/107