openSNP / snpr

The sources of the openSNP website
http://opensnp.org
MIT License

ubiome support #179

Closed audy closed 8 years ago

audy commented 9 years ago

Have you, the snpr core, thought about adding support for importing data from ubiome? I thought about making an "opensnp for ubiome" but realized it's probably better to just add ubiome support to snpr. That way all of your quantified self data is in the same place, which could allow for interesting comparison.

If you do, I would like to help. Ubiome (afaik) doesn't provide you with the processed microbiome data, only reads. So there would need to be a little sequence analysis performed on the snpr side. I've dealt with ubiome data in the past and have some scripts for going from sequences -> taxa abundance table. The data for one sample are not very large, so it is possible that the sequence analysis could be conducted on a small to medium cloud machine.

gedankenstuecke commented 9 years ago

Hey, that's a great idea. I thought about it back when I got my first uBiome data, but at the time they didn't offer a standardized way of downloading the raw reads, only the taxonomy counts. And I think they are still offering those, e.g. one of my sets can be downloaded here https://app.ubiome.com/api/getRawTaxaData?ssr=4483 (might only work after being logged in, in that case replace the ssr with one of your own).

And speaking of quantified self: If you're interested you could also take a look at the Jawbone API implementation I started at https://github.com/gedankenstuecke/snpr/tree/jawbone.

What does the rest think? :-)

tsujigiri commented 9 years ago

As long as there will be tests, I'm fine with it. ;)

I'm not sure if this doesn't go beyond the scope of what openSNP is, though. We are only three developers with limited time and already struggle with maintaining all the existing features (of course, more tests would help with that). Also, a "small to medium cloud machine" has to be paid for, too.

gedankenstuecke commented 9 years ago

I think if we allow people to just upload the fastq & taxonomic counts precomputed by uBiome we should have no more load on our end than we would get from a single 23andMe file (on the contrary, rather less I'd assume).

And I just assumed that @audy was willing to write tests for the feature as well (and maybe even contribute more ;-)).

audy commented 9 years ago

Those are fine points @tsujigiri. Maybe I will eventually get around to making an openbiome app. If I did, I would create an API and let opensnp use it :)

philippbayer commented 9 years ago

A second app sounds like the best solution to me too :)

gedankenstuecke commented 9 years ago

Personally I feel that we're halfway there in having support for these kinds of files already. After all we do have picture phenotypes and time series phenotypes with Fitbit. The fastq files would be ~8 MB per microbiome sample (cf. my data) and the taxon counts are really small and already come as JSON (see my data too).

But that may just be my wish for having large scale genotype + microbiome data to correlate the two data sets, which could be really cool. :smile: :chart_with_upwards_trend:
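To illustrate how little processing precomputed taxon counts would need before being displayed or correlated, here is a minimal sketch in Python. The JSON layout and the field names (`taxon`, `rank`, `count`) are assumptions made up for illustration, not the actual uBiome export schema, and `relative_abundance` is a hypothetical helper, not part of snpr.

```python
import json

# Hypothetical payload: the field names ("taxon", "rank", "count") are
# assumptions for illustration, NOT the real uBiome export schema.
SAMPLE_JSON = """
[
  {"taxon": "Bacteroides", "rank": "genus", "count": 600},
  {"taxon": "Faecalibacterium", "rank": "genus", "count": 300},
  {"taxon": "Firmicutes", "rank": "phylum", "count": 500},
  {"taxon": "Roseburia", "rank": "genus", "count": 100}
]
"""


def relative_abundance(records, rank):
    """Return {taxon: fraction of reads} for the entries at one taxonomic rank."""
    # Only compare taxa at the same rank; mixing ranks would double-count reads.
    selected = [r for r in records if r["rank"] == rank]
    total = sum(r["count"] for r in selected)
    if total == 0:
        # No entries at this rank (or only zero counts): nothing to normalize.
        return {}
    return {r["taxon"]: r["count"] / total for r in selected}


records = json.loads(SAMPLE_JSON)
genus_abundance = relative_abundance(records, "genus")

for taxon, fraction in sorted(genus_abundance.items()):
    print(f"{taxon}\t{fraction:.2f}")
```

Normalizing within a single rank keeps samples with different read depths comparable, which is the kind of table one would want for a genotype + microbiome comparison.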

audy commented 9 years ago

If ubiome (and others) make it easy to export taxonomic counts, then adding microbiome support to snpr/opensnp would be as trivial as adding data from something like Fitbit. But if ubiome supports exporting only sequence data, then it is probably better to create a separate web app for converting sequencing data to taxonomic counts, which could then be imported into opensnp.

Analyzing microbiome sequencing data is not trivial. For example, the reference databases are constantly changing and there is no consensus on how to actually map the reads to the databases or whether or not to even use a reference database. This functionality is probably out of opensnp's scope. I think of opensnp as a platform for aggregating "quantified-self" data so that it can be used for large scale genotype + x data comparison.

I will think about creating a web app which allows users to upload the fastq files and get taxonomic counts. Does anyone have any suggestions regarding passing some sort of ethics review before developing such an app?

gedankenstuecke commented 9 years ago

So actually I was not thinking of performing any analysis live on openSNP with the fastqs but rather having them there as unprocessed raw data and maybe visualizing the counts generated by uBiome. Because I agree, going from fastq to counts isn't fun to do, as the stuff changes all the time and it would be outside the scope.

For the ethics review: This is another reason why I'd be reluctant to go from fastq -> counts on our end instead of just displaying the counts generated by uBiome. We basically bypassed the IRB process with openSNP and are somewhat justified to do so, because we're not running any analysis on openSNP but instead just distribute & visualize data that's already there. :stuck_out_tongue_winking_eye:

gedankenstuecke commented 9 years ago

But sorry, to answer your question: No suggestions from our/my side. We tried getting IRB approval for openSNP but basically every IRB we contacted said "well, you're running a project in your spare time, it's not affiliated with this institute, so don't bother us". The situation would be slightly different in the US I'd guess, as you have commercial IRBs over there.

philippbayer commented 9 years ago

As a general rule, my IRB also doesn't look at projects after they've already started. Could that be the case in more places?

gedankenstuecke commented 9 years ago

You might have seen that uBiome now started their own "open data collection", which is just a GH repo for people to push to. Maybe there's some way to collaborate to make it more useful?

gedankenstuecke commented 8 years ago

Ok, as there's no interest from them (and not too much from our side) I guess I'll close it for now :-)

philippbayer commented 8 years ago

ah well :)