sunnypatel2141 / SnIPRE-Input

Create input file needed to run the SnIPRE program (http://med.stanford.edu/bustamantelab/software.html)

How is the outgroup incorporated in the inputs for snipre_prep? #2

Open schnuffipc opened 8 years ago

schnuffipc commented 8 years ago

I couldn't find any reference to the outgroup other than "7) nout : outgroup x 2 (positive integer)". Say I am using one outgroup: how is your program going to see it? Should it be supplied as a .bam file, or should it be part of the .vcf? Thanks in advance

sunnypatel2141 commented 8 years ago

Hello,

When you run the snipre_prep.R script, argument #7 is where you provide the outgroup.

FYI: The following command-line arguments must be provided to the snipre_prep.R script:

1) .vcf file
2) starting bee column number (positive integer) of the vcf file (1-based indexing)
3) ending bee column number (positive integer) of the vcf file (1-based indexing)
4) SnpEff file (for Format Specifications, see Part 2)
5) gff file (for Format Specifications, see Part 5)
6) Folder where output files from snipre_prep_bash.sh are located (path name)
7) nout : outgroup x 2
8) npop : population size x 2
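To make the argument order concrete, here is a minimal Python sketch that assembles the eight arguments in the order listed above. It rests on several assumptions that are not confirmed in this thread: that the script is launched with `Rscript`, that "outgroup x 2" and "population size x 2" mean the number of individuals times two, and that the helper name `snipre_prep_args` and all file names are hypothetical placeholders.

```python
def snipre_prep_args(vcf, first_col, last_col, snpeff, gff, out_dir,
                     n_outgroup, n_population):
    """Build an argument list for snipre_prep.R in the documented order.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # Arguments 2 and 3 are 1-based VCF column numbers (positive integers).
    if not (1 <= first_col <= last_col):
        raise ValueError("column numbers must satisfy 1 <= first_col <= last_col")
    if n_outgroup < 1 or n_population < 1:
        raise ValueError("sample counts must be positive integers")

    # Assumption: "x 2" means two times the number of individuals.
    nout = 2 * n_outgroup
    npop = 2 * n_population

    # Assumption: the R script is started through Rscript.
    return [
        "Rscript", "snipre_prep.R",
        vcf,              # 1) .vcf file
        str(first_col),   # 2) starting sample column (1-based)
        str(last_col),    # 3) ending sample column (1-based)
        snpeff,           # 4) SnpEff file
        gff,              # 5) gff file
        out_dir,          # 6) folder with snipre_prep_bash.sh output
        str(nout),        # 7) nout
        str(npop),        # 8) npop
    ]


if __name__ == "__main__":
    # Placeholder file names and counts, for illustration only.
    args = snipre_prep_args("pop.vcf", 10, 25, "snpeff.txt", "genes.gff",
                            "prep_out", n_outgroup=1, n_population=16)
    print(" ".join(args))
```

Note that arguments 7 and 8 are plain integers: nothing in this list carries the outgroup's sequence data.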

Thanks, Sunny.

schnuffipc commented 8 years ago

Hi,

I have thoroughly read the list of inputs (it is basically the only existing documentation), but that doesn't help me understand in what form the outgroup is given. "Outgroup x 2" does not really tell me anything, and it cannot possibly be enough for the program's analysis. It is just a number. Where does it get the SNP information of the outgroup? Is it in a .bam file? Is it part of the .vcf file?

Thanks, Pnina

eyal-privman commented 8 years ago

Hi Sunny,

Let me join in to try to explain why we're not understanding you.... We have whole genome sequencing data from a population sample of species A, and we want to use another closely related species B as the outgroup. We have mapped our population sequence data to the reference genome of species A, and we have a VCF file with the genotypes of SNPs in these individuals. We also have a single genome sequenced from species B. If we give the VCF to snipre_prep.R it will only contain the polymorphism information within species A. We need to somehow also supply the information regarding the differences between species A and species B. We don't understand how to give this information to your script.

My guess is that you used this script with population sequence data from species A that was mapped to the reference genome of species B. So the resulting VCF file contains both within species and between species information. Is that correct? This wasn't clear to us from your documentation.

Thanks, Eyal

sunnypatel2141 commented 8 years ago

Hello Eyal and Pnina,

As Eyal correctly figured out, my scripts WERE used with population data from species A mapped to the reference genome of species B, and the VCF file reflected this. We had another VCF file for data from species B mapped to the reference genome of species A. I am sorry that my documentation is not clear regarding this matter.
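To illustrate why a VCF built this way can carry both kinds of information, here is a minimal Python sketch. It is not how snipre_prep.R works internally (that is not described in this thread). The function name `classify_site` and its labels are hypothetical, and it assumes VCF-style sample fields in which the genotype comes first (for example `1/1` or `0|1:35`). With species A reads mapped to the species B reference, a site where every called allele is the same non-reference allele looks like a fixed difference between the species, while a site with more than one allele is polymorphic within species A.

```python
import re


def classify_site(sample_fields):
    """Classify one VCF site from its per-sample fields (illustrative only)."""
    alleles = set()
    for field in sample_fields:
        genotype = field.split(":")[0]          # GT is the first sub-field
        for allele in re.split(r"[/|]", genotype):
            if allele != ".":                   # "." marks a missing call
                alleles.add(allele)

    if not alleles:
        return "no_call"
    if len(alleles) > 1:
        return "polymorphic"                    # variation within the sample
    if alleles == {"0"}:
        return "same_as_reference"
    return "fixed_difference"                   # all samples differ from reference


if __name__ == "__main__":
    print(classify_site(["1/1", "1/1", "1|1"]))   # fixed_difference
    print(classify_site(["0/1", "1/1", "0/0"]))   # polymorphic
```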

In my scripts, the outgroup (nout) and population size (npop) are just numbers, which I feed into the scripts written by Kirsten Eilertson (the actual SnIPRE program, available here: https://bustamantelab.stanford.edu/software).

Thank you, Sunny.