open-connectome-classes / StatConn-Spring-2015-Info

introductory material

Where is the bottleneck in all of this? #250

Open mrjiaruiwang opened 9 years ago

mrjiaruiwang commented 9 years ago

We have great work going on to try to turn noisy or high-variance data into "clean" data, and we have a lot of people using this "clean" data to try to understand biology, but I wonder where the bottleneck is in all of this.

It seems like there is an entire field within bioinformatics that would not exist if the data we've been collecting were better. I understand there are a lot of people working on making better measurements, but since I'm not in that field, I don't really hear about what they do. So, at a zoomed-out, systems level, where is the bottleneck, from data collection all the way to the final conclusion?

wrgr commented 9 years ago

Since nobody has commented yet, let me try: I think the bottleneck right now is generally in the algorithms that translate collected data into knowledge.

For EM connectomics, this is largely the algorithms that trace/segment neurons (and synapses), and a tremendous number of human hours have been spent trying to solve that problem manually or automatically. There also isn't always a good feedback loop in neuroscience between the algorithm people, the imaging people, and the stats people, so it's hard for me, as a computer vision person, to know enough and to communicate to the biologists what we need to make data that is "good enough." We're getting closer and writing grants to try to make this a reality!
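
To make the "trace/segment neurons" step concrete, here is a minimal sketch of one classical approach (thresholding plus a watershed on the distance transform) run on a synthetic membrane-like image. It assumes numpy, scipy, and scikit-image, and it only illustrates the kind of algorithm meant here; production EM pipelines rely on learned boundary detectors, agglomeration, and extensive proofreading.

```python
# Toy segmentation sketch: bright "cell interiors" separated by dark,
# membrane-like boundaries, segmented by thresholding + watershed.
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import gaussian, threshold_otsu
from skimage.measure import label
from skimage.segmentation import watershed

rng = np.random.default_rng(0)

# Synthetic "EM-like" image: a grid of dark membranes on a bright background.
img = np.ones((128, 128))
img[::32, :] = 0.0                      # horizontal membranes
img[:, ::32] = 0.0                      # vertical membranes
img = gaussian(img, sigma=1) + 0.05 * rng.standard_normal(img.shape)

# Separate interiors from membranes with a global threshold.
mask = img > threshold_otsu(img)

# Watershed on the distance transform: each basin becomes one candidate segment.
distance = ndi.distance_transform_edt(mask)
markers = label(distance > 0.7 * distance.max())
segments = watershed(-distance, markers, mask=mask)

print("number of segments found:", int(segments.max()))
```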

rgrohit commented 9 years ago

Many papers seem to deal with small sets of data or with high-variance data, and I think that's where the bottleneck is right now. Without "clean" data, we need better and better algorithms to translate the biology into computational data. As the data gets cleaner, and as there is more of it, it might get simpler to work with new algorithms.
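
As a toy illustration of that trade-off (not from the thread), the snippet below simulates repeated noisy measurements of a hypothetical true_value: with noisier data you need far more samples to recover the same signal, since the standard error of the mean scales like sigma / sqrt(n).

```python
# How noise level and sample size trade off when estimating a simple quantity.
import numpy as np

rng = np.random.default_rng(42)
true_value = 1.0                        # hypothetical quantity we want to recover

for sigma in (0.1, 1.0):                # "clean" vs. "noisy" measurements
    for n in (10, 1000):                # small vs. large data set
        samples = true_value + sigma * rng.standard_normal(n)
        err = abs(samples.mean() - true_value)
        print(f"sigma={sigma:3.1f}  n={n:5d}  |estimate - truth| = {err:.4f}")
```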

akim1 commented 9 years ago

I don't know if it's a single bottleneck, per se. Error contributions add up, right? Techs/experimenters don't perform measurements consistently, the subject is fidgety, the technology isn't quite up to par, and so on.

A 10% error amplified across 10 different processes is, in the most pessimistic scenario, roughly a 160% total error (1.1^10 ≈ 2.59).
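
For concreteness, the compounding arithmetic above works out like this (plain Python, under the worst-case assumption that a 10% multiplicative error is applied at every one of 10 independent steps):

```python
# Worst-case compounding of a 10% error over 10 processing steps.
per_step_error = 0.10
n_steps = 10
compounded = (1 + per_step_error) ** n_steps      # 1.1 ** 10 ≈ 2.59
print(f"total factor: {compounded:.2f}  ->  about {100 * (compounded - 1):.0f}% error")
# prints: total factor: 2.59  ->  about 159% error
```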