Project 9: Selection of tag SNPs for an African SNP array by LD and haplotype based methods #2

Open ttimbers opened 8 years ago

ttimbers commented 8 years ago

Project: The genetic diversity in Africa is immense, and no commercial SNP array to date captures the diversity across the entire continent. Developing a cost-efficient and representative genotype array with SNPs that provide good coverage across the African continent is key to conducting large-scale medical genetic studies in Africa.

I propose a project in which we write a SNP selection algorithm applicable to whole genome sequence (WGS) data. This will involve writing an algorithm that chooses the SNPs that tag other SNPs most efficiently across individuals from several African populations. The algorithm will be applied in conjunction with commercial lists of pre-approved SNPs, lists of SNPs of general interest, and various lists that rank SNPs, in order to select a set of tag SNPs that can be put on a commercial SNP array.

We intend to write the code in Python. Other tag SNP selection algorithms exist, but none of them is geared towards handling WGS data efficiently. By making use of random access to block-gzipped files, we intend to write a memory-efficient algorithm applicable to WGS data. We envision this algorithm being used in combination with existing imputation methods, so that it makes use of haplotype (multi-marker) tagging in addition to simple pairwise LD. We aim to provide a fully functional piece of software and a list of tag SNPs at the end of hackseq.
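To make the pairwise LD part of the proposal concrete, here is a minimal sketch of greedy tag SNP selection. It is an illustration under stated assumptions, not the project's implementation: genotypes are assumed to be already loaded into memory as 0/1/2 dosages for a single population, LD is approximated by the squared Pearson correlation of those dosages, and the function names, the SNP identifiers, and the 0.8 threshold are all hypothetical. Random access to block-gzipped files, pre-approved SNP lists, rankings, and haplotype tagging are left out.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP carries no LD information.
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(genotypes, r2_threshold=0.8):
    """Pick tag SNPs greedily until every SNP is tagged.

    genotypes: dict mapping a SNP id to a list of 0/1/2 dosages
               (hypothetical in-memory layout, one entry per individual).
    Returns the tag SNP ids in the order they were chosen.
    """
    snps = list(genotypes)
    # Every SNP tags itself, so the loop below always makes progress.
    tagged_by = {snp: {snp} for snp in snps}
    for a, b in combinations(snps, 2):
        if r_squared(genotypes[a], genotypes[b]) >= r2_threshold:
            tagged_by[a].add(b)
            tagged_by[b].add(a)

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that tags the most still-untagged SNPs.
        best = max(snps, key=lambda snp: len(tagged_by[snp] & untagged))
        tags.append(best)
        untagged -= tagged_by[best]
    return tags


# Toy data with made-up SNP ids: snp1, snp2 and snp3 are in perfect LD,
# while snp4 is uncorrelated with all of them.
toy_genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [2, 1, 0, 2, 1, 0],
    "snp4": [0, 0, 1, 2, 2, 1],
}
toy_tags = greedy_tag_snps(toy_genotypes)
```

The all-pairs comparison is quadratic in the number of SNPs, so on WGS data it would have to be restricted to a sliding window of nearby variants, which is where streaming from an indexed file would come in.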

Project Lead: Tommy Carstensen / @tommycarstensen / Bioinformatician / Wellcome Trust Sanger Institute

sjackman commented 8 years ago

We're planning to have a Docker image with a bunch of bioinformatics software preinstalled, running on machines at the BC Cancer Agency Genome Sciences Centre during the hackathon. Which bioinformatics software do you plan to use for your project? In particular, is there any software that you plan to use that is not already listed here? http://www.bcgsc.ca/services/orca

tommycarstensen commented 8 years ago

@sjackman Can you add IMPUTE2 to that list?

sjackman commented 8 years ago

Is the source code for IMPUTE2 available? I'm not able to find it here: https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#download

The software listed above is being installed using Homebrew, which requires that the source code be available.

mp15 commented 8 years ago

@sjackman The source code for IMPUTE2 is not available, alas.

sjackman commented 8 years ago

@tseemann Any interest in adding IMPUTE2 to https://github.com/tseemann/homebrew-bioinformatics-linux ?