googlegenomics / bigquery-examples

Advanced BigQuery examples on genomic data.
Apache License 2.0
89 stars, 31 forks

Add a schema transformation sample using Hadoop Streaming and the BigQuery Connector #21

Closed: deflaux closed this issue 9 years ago

deflaux commented 10 years ago

The sample should demonstrate a transformation that simplifies the schema, such as replacing the repeated field call.genotype with two fields, call.first_allele and call.second_allele. Be sure to add a check that fails the job if the data contains any triploid genotypes.
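As a rough illustration of the transformation described above, a Hadoop Streaming mapper could look something like the sketch below. It is not the sample this issue asks for, only a minimal guess at its shape. Assumptions: each input line is one JSON record, optionally preceded by a key and a tab; each record has a repeated `call` field whose entries carry a `genotype` list; genotypes with fewer than two alleles are padded with nulls (the issue does not say how to treat them). The names `flatten_genotypes` and `NonDiploidGenotypeError` are hypothetical.

```python
#!/usr/bin/env python
"""Sketch of a streaming mapper that splits call.genotype into two fields."""
import json
import sys


class NonDiploidGenotypeError(ValueError):
    """Raised when a call has more than two alleles (e.g. triploid)."""


def flatten_genotypes(record):
    """Return a copy of record where each call has first/second allele fields."""
    flattened = dict(record)
    new_calls = []
    for call in record.get("call", []):
        genotype = call.get("genotype", [])
        if len(genotype) > 2:
            # Fail loudly rather than silently dropping alleles.
            raise NonDiploidGenotypeError(
                "genotype has %d alleles: %r" % (len(genotype), genotype))
        # Assumption: pad short (e.g. haploid) genotypes with None.
        alleles = list(genotype) + [None] * (2 - len(genotype))
        new_call = {k: v for k, v in call.items() if k != "genotype"}
        new_call["first_allele"] = alleles[0]
        new_call["second_allele"] = alleles[1]
        new_calls.append(new_call)
    flattened["call"] = new_calls
    return flattened


def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        # Assumption: drop an optional "key<TAB>" prefix before the JSON value.
        payload = line.split("\t", 1)[-1]
        stdout.write(json.dumps(flatten_genotypes(json.loads(payload))) + "\n")


if __name__ == "__main__":
    main()
```

Because the exception is left uncaught, the mapper process exits with a non-zero status on the first triploid genotype. With Hadoop Streaming's default settings that should fail the task, which is the behavior requested above.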

See the Google Cloud Platform Hadoop release announcement for details.

deflaux commented 9 years ago

We wound up with two similar jobs, but they are in the codelabs repository: