whitlock / OutFLANK

A procedure to find Fst outliers based on an inferred distribution of neutral Fst

Converting my VCF data to use Outflank #16

Open taniachavarria79 opened 6 years ago

taniachavarria79 commented 6 years ago

Hello, I'm trying to convert my VCF data to OutFLANK format. I'm following your tutorial, Katie, but it gives me this message: "Warning message: In convertVCFtoCount3(all.vcf.gen) : NAs introduced by coercion". I don't know what I'm doing wrong. I really appreciate your help!

I attach my VCF data here (batch_1newdata.txt). I had to attach it as a .txt file because GitHub does not accept .vcf files.

This is the tutorial code that I'm using:

library(vcfR)
vcf <- read.vcfR("../sim1a/vcf_sim1a_contest.vcf.gz", verbose=FALSE)

##############################
# Convert VCF format to SNP data format required by OutFLANK (Note that this is slow)
##############################
convertVCFtoCount3 <- function(string){
  # This function assumes 0 for reference
  # and 1 for alternate allele
  a <- as.numeric(unlist(strsplit(string, split = c("[|///]"))))
  odd = seq(1, length(a), by=2)
  a[odd] + a[odd+1]
}

all.vcf.gen <- vcf@gt[,-1]
system.time(gen_table <- matrix(convertVCFtoCount3(all.vcf.gen), ncol=ncol(all.vcf.gen)))
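A likely source of the warning can be reproduced without any VCF file. The minimal R sketch below reuses the tutorial's splitting logic on a few hypothetical genotype strings: a missing call written as `./.`, and a genotype followed by extra FORMAT sub-fields (the invented `0/1:12:6,6`), both leave pieces that `as.numeric()` cannot parse, so those entries become NA.

```r
# Same splitting logic as the tutorial's convertVCFtoCount3():
# split each genotype on "|" or "/" and add the two allele codes.
convertVCFtoCount3 <- function(string) {
  a <- as.numeric(unlist(strsplit(string, split = "[|/]")))
  odd <- seq(1, length(a), by = 2)
  a[odd] + a[odd + 1]
}

# Hypothetical genotype strings
clean        <- c("0/0", "0/1", "1|1")        # biallelic, GT field only
with_missing <- c("0/0", "./.", "1/1")        # "./." is a missing call
with_fields  <- c("0/1:12:6,6", "1/1:8:0,8")  # GT followed by other FORMAT fields

convertVCFtoCount3(clean)                           # 0 1 2
suppressWarnings(convertVCFtoCount3(with_missing))  # 0 NA 2
suppressWarnings(convertVCFtoCount3(with_fields))   # NA NA
```

Without suppressWarnings(), the last two calls emit the same "NAs introduced by coercion" warning reported above.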

DrK-Lo commented 6 years ago

The code provided with the vignette only works for VCF files with two alleles and no missing data. If you figure out a script that works for your VCF file, feel free to share it in this thread.

(Sent by email in reply to taniachavarria79's post of Sep 18, 2018, 7:59 PM.)
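Because the vignette code is limited to biallelic sites without missing data, one option is to convert the GT strings with an explicit rule for anything that is not a clean diploid 0/1 call. The sketch below is a hypothetical helper, `gtToCounts()`, which is not part of OutFLANK or vcfR. It keeps only the GT sub-field, returns 0/1/2 alternate-allele counts, and codes everything else (missing calls, NA, haploid or multiallelic genotypes) as 9, assuming OutFLANK's convention of 9 for missing data (check `?MakeDiploidFSTMat`). The vcfR calls in the trailing comment are an assumed workflow, and the file path there is a placeholder.

```r
# Hypothetical helper (not part of OutFLANK or vcfR).
# Input: character matrix of genotypes, loci in rows, individuals in columns.
# Output: numeric matrix of alternate-allele counts (0, 1, 2); any genotype
# that is not a diploid call made of "0"/"1" alleles is coded as 9.
gtToCounts <- function(gt) {
  gt <- as.matrix(gt)
  # Drop any FORMAT sub-fields after GT, e.g. "0/1:12:6,6" -> "0/1"
  gt_only <- sub(":.*$", "", gt)
  # Split each genotype into its allele codes (phased "|" or unphased "/")
  alleles <- strsplit(gt_only, "[|/]")
  counts <- vapply(alleles, function(a) {
    # NA input, "./.", haploid calls and allele codes >= 2 all become 9
    if (length(a) != 2 || !all(a %in% c("0", "1"))) return(9)
    sum(as.numeric(a))
  }, numeric(1))
  # strsplit() and vapply() run in column-major order, as matrix() fills
  matrix(counts, nrow = nrow(gt), ncol = ncol(gt), dimnames = dimnames(gt))
}

# Small hypothetical example
gt <- matrix(c("0/0", "0/1:12:6,6", "./.",
               "1|1", NA,           "0/1"),
             nrow = 3,
             dimnames = list(c("locus1", "locus2", "locus3"), c("ind1", "ind2")))
counts <- gtToCounts(gt)
counts  # ind1: 0, 1, 9   ind2: 2, 9, 1

# OutFLANK expects individuals in rows and loci in columns
SNPmat <- t(counts)

# With a real file (assumed vcfR workflow, placeholder path):
#   library(vcfR)
#   vcf <- read.vcfR("path/to/your.vcf", verbose = FALSE)
#   vcf <- vcf[is.biallelic(vcf), ]   # drop multiallelic sites first
#   SNPmat <- t(gtToCounts(extract.gt(vcf, element = "GT")))
```

Scoring a multiallelic genotype as 9 is a simplification; removing those sites before conversion keeps them out of the Fst calculation entirely.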