igvteam / igv.js

Embeddable genomic visualization component based on the Integrative Genomics Viewer
MIT License

Interface to Parquet-formatted VCFs/BAMs in S3 buckets? #349

Closed · vanetten closed this issue 6 years ago

vanetten commented 7 years ago

Jim asked me to post this...

"I have been playing with Amazon’s Athena service (are you familiar with it?). I have been using it to provide a SQL interface to Parquet-formatted VCF files stored in S3 buckets. Details are quoted below. I haven’t tried BAM files yet, but “they” say it works?

Can you tell me if igv-web can provide an interface to Parquet-formatted BAMs and VCFs?"

Athena (http://docs.aws.amazon.com/athena/latest/ug/getting-started.html):
o Provides nearly zero-effort, fast, inexpensive SQL queries on pretty much any object you can drop in an AWS S3 bucket.
o ADAM (https://github.com/bigdatagenomics/adam) is a tool that converts VCFs and BAMs to Parquet format (among many other things).
o Apache Parquet (http://parquet.apache.org) is a columnar storage format available to any project in the Hadoop ecosystem.
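For a genome viewer, the useful Athena query is a region query: "give me the variants overlapping chr:start-end". The following is a minimal sketch, not an existing igv.js or Athena helper. It assumes a hypothetical table produced by ADAM with columns `referencename`, `start`, `end`, `referenceallele` and `alternateallele`. The reference column name has differed between ADAM schema versions, so check your actual table. The overlap test uses half-open intervals (`start < regionEnd AND end > regionStart`), and `end` is quoted because it is a reserved word in SQL.

```typescript
// Hypothetical sketch: build an Athena SQL query for variants overlapping a region.
// ASSUMPTION: table and column names below are illustrative, not a fixed ADAM schema.

interface Region {
  chr: string;
  start: number; // 0-based, inclusive
  end: number;   // 0-based, exclusive
}

// Escape a SQL string literal by doubling single quotes.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Only allow simple identifiers such as "database.table" to avoid SQL injection.
function sqlIdentifier(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_.]*$/.test(name)) {
    throw new Error(`unsafe table name: ${name}`);
  }
  return name;
}

function buildRegionQuery(table: string, region: Region, limit = 10000): string {
  if (
    !Number.isInteger(region.start) ||
    !Number.isInteger(region.end) ||
    region.start < 0 ||
    region.end <= region.start
  ) {
    throw new Error("invalid region");
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("invalid limit");
  }
  // Half-open overlap test: variant.start < region.end AND variant.end > region.start
  return [
    `SELECT referencename, start, "end", referenceallele, alternateallele`,
    `FROM ${sqlIdentifier(table)}`,
    `WHERE referencename = ${sqlString(region.chr)}`,
    `  AND start < ${region.end}`,
    `  AND "end" > ${region.start}`,
    `ORDER BY start`,
    `LIMIT ${limit}`,
  ].join("\n");
}
```

For example, `buildRegionQuery("genomics.variants", { chr: "chr20", start: 1000000, end: 1010000 })` yields a query string that could be submitted to Athena. The LIMIT guards against a zoomed-out view pulling an entire chromosome.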

Bill

jrobinso commented 7 years ago

Bill, is there a VCF or BAM file already set up for Athena that I can query via a RESTful API? It's not crystal clear to me what igv.js would be querying.
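A viewer fetches data for the visible window only, issuing a new request for each region as the user pans or zooms. A REST service fronting Athena would therefore need to accept something like a chromosome plus start/end coordinates. The sketch below builds such a request URL. The `/variants` path and the `chr`, `start` and `end` parameter names are hypothetical. They are not an existing igv.js, Athena, or ADAM endpoint.

```typescript
// Hypothetical sketch: the per-region GET request a viewer might issue.
// ASSUMPTION: endpoint path and parameter names are illustrative only.

interface RegionRequest {
  chr: string;
  start: number;
  end: number;
}

function regionUrl(baseUrl: string, r: RegionRequest): string {
  const pairs: Array<[string, string]> = [
    ["chr", r.chr],
    ["start", String(r.start)],
    ["end", String(r.end)],
  ];
  // Percent-encode keys and values so contig names with unusual characters survive.
  const query = pairs
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  // Strip trailing slashes so "https://host/api/" and "https://host/api" behave the same.
  return `${baseUrl.replace(/\/+$/, "")}/variants?${query}`;
}
```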

jrobinso commented 6 years ago

Closing this as it's old with no action. It's doubtful there will be a JavaScript interface in igv.js. We can, however, easily connect to any reasonable webservice.
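Connecting through a webservice mostly means adapting whatever the service returns into plain feature objects. The sketch below assumes a hypothetical service that passes Athena result rows through as JSON. Athena's GetQueryResults API returns every cell as a string, which is why the coordinates go through `Number()`. It converts those rows into `{ chr, start, end, name }` objects sorted by position. The row field names and the `REF>ALT` name format are illustrative. The exact fields a given igv.js track type expects should be checked against the igv.js documentation.

```typescript
// Hypothetical sketch: adapt webservice rows (e.g. flattened Athena results)
// into simple, position-sorted feature objects.
// ASSUMPTION: row field names mirror an illustrative ADAM-derived table.

interface VariantRow {
  referencename: string;
  start: string | number; // Athena results arrive as strings
  end: string | number;
  referenceallele: string;
  alternateallele: string;
}

interface VariantFeature {
  chr: string;
  start: number;
  end: number;
  name: string;
}

function toFeatures(rows: VariantRow[]): VariantFeature[] {
  return rows
    .map((row) => {
      const start = Number(row.start);
      const end = Number(row.end);
      if (!Number.isFinite(start) || !Number.isFinite(end)) {
        throw new Error(`non-numeric coordinates on ${row.referencename}`);
      }
      return {
        chr: row.referencename,
        start,
        end,
        name: `${row.referenceallele}>${row.alternateallele}`,
      };
    })
    // Sort by chromosome name, then by start position.
    .sort((a, b) =>
      a.chr === b.chr ? a.start - b.start : a.chr < b.chr ? -1 : 1
    );
}
```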