timothyfrasier / related

8 stars 7 forks source link

VCF file to related format #3

Open abcosta opened 5 years ago

abcosta commented 5 years ago

Dear Dr Frasier,

Would be possible to recommend me a way to convert a vcf file to the format accepted by the package? I tried either "write_related" or "genomic_converter" using the R package radiator but none of them worked.

Thank you

timothyfrasier commented 5 years ago

Hi:

I apologize for the slow reply. I do not have experience with this myself, but below is what some other people have recommended that have run into the same issue.

Sincerely, Tim Frasier

You can do this within R by using the pegas package:

library(pegas)

reads into an object of class loci (pegas)

vcf <- read.vcf('path/to/vcf')

convert to genind object (adegenet)

gen <- loci2genind(vcf)

extract genotype data from genind object as data.frame

dat <- as.data.frame(gen@tab)

add a column for individuals

dat <- cbind(indiv=rownames(dat), dat)

make sure that column is as.character and not as.factor

dat[[1]] <- as.character(dat[[1]])

After that, your SNP genotype data.frame should be readable by coancestry:

relatedness <- coancestry(dat, wang=1)


Timothy R. Frasier Coordinator - Forensic Sciences Program Associate Professor - Department of Biology Saint Mary's University 923 Robie Street Halifax, NS B3H 3C3, Canada Email: timothy.frasier@smu.camailto:timothy.frasier@smu.ca Tel: (902) 491-6382 www.frasierlab.cahttp://frasierlab.ca

On Jan 30, 2019, at 7:49 AM, abcosta notifications@github.com<mailto:notifications@github.com> wrote:

Dear Dr Frasier,

Would be possible to recommend me a way to convert a vcf file to the format accepted by the package? I tried either "write_related" or "genomic_converter" using the R package radiator but none of them worked.

Thank you

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/timothyfrasier/related/issues/3, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ADbcWV3ZTEhpBY_eoHov1XPiPqzXn1x0ks5vIYbNgaJpZM4aaClM.

abcosta commented 5 years ago

Hi,

Thank you very much! I will try it.

All the best!!

tjraszick commented 5 years ago

Did this work? I am about to try it for myself. Is there a similar solution for generating the allele frequency input?

Thanks!

abcosta commented 5 years ago

Hi,

That script didn't work with my data set. Seems that there's a problem with "loci2genind". I wasn't able to fix it. But I used other R packages to convert the VCF file to genind object. You can use both vcfR and dartR packages. read with vcfR: read.vcfR; convert with dartR: vcfR2genind (vcfR package) or gl2gi (dartR package - it converts from a genlight object to genind). I was able to run related using this other script. However I'm working with 40,000 SNPs and I got values equal or higher than 1. I was just trying the code and didn't work with it anymore. Thus these results need to be analyzed more thoroughly. Let me know if it worked for your data set.

tjraszick commented 5 years ago

Other than crashing my R when trying to view the data, the above script worked. I exported the data using the write.table function and with the col.names and row.names parameters set to FALSE. My one concern is that many of my SNP alleles wound up coded as 0, which coancestry recognizes as missing data in microsatellite datasets. Hopefully, 0 values are regarded differently in SNP datasets. I am running coancestry now to see what I get, but it is not scheduled to finishing running until Sunday.

YannDussert commented 4 years ago

Hi,

Old issue, but I recently started using related, so here is "my" solution.

First, I am not sure using a genind object really resolves the problem, since a 0/0 genotype will be coded as "0 2" and 1/1 as "2 0".

The following script is modified from a bit of code written by Katie Lotterhos, so all credit goes to here, and any errors are probably mine. The original code can be found at https://github.com/TestTheTests/TTT_RecombinationGenomeScans/blob/master/src/b_Proc_Sims.R

library(vcfR) library(related)

vcf = read.vcfR('my_snp_data.vcf')

geno = extract.gt(vcf) G_relate = matrix(NA, nrow = ncol(geno), ncol = nrow(geno)*2) abob = t(substr(geno, start = 1, stop = 1)) bbob = t(substr(geno, start = 3, stop = 3)) odd = seq(1, ncol(G_relate), by = 2) G_relate[, odd] = abob G_relate[, odd + 1] = bbob rownames(G_relate) = rownames(abob) colnames(G_relate) = rep(colnames(abob), each = 2) class(G_relate) = "numeric" G_relate = data.frame(Individual = rownames(G_relate), G_relate) G_relate$Individual = as.character(G_relate$Individual) G_relate[, 2:ncol(G_relate)] = G_relate[, 2:ncol(G_relate)] + 1 #to avoid 0s in data

Check if good format

t(geno[1:5,1:5]) G_relate[1:5,1:10]

Compute relatedness indices

relat_fullDat = coancestry(G_relate, lynchli=1, lynchrd=1, quellergt=1, ritland=1, wang=1)

SNPs have an ID in my vcf file. If it's not the case for your data, you'll probably have to modify the code to have proper column names (loci). Do not hesitate to tell me if you found an error in this code!

Best regards, Yann

jdalapicolla commented 3 years ago

Hi,

I'm trying to use the coancestry function and I have a similar problem. My input from vcf or genind didn't work in this function. When I followed @YannDussert I got the following error :

relat_fullDat = coancestry(G_relate2, lynchli=1, lynchrd=1, quellergt=1, ritland=1, wang=1)
Error in `$<-.data.frame`(`*tmp*`, "allele", value = 1:2) : 
  replacement has 2 rows, data has 3

When I did not convert the 0's in this following line, the function works

G_relate[, 2:ncol(G_relate)] = G_relate[, 2:ncol(G_relate)] + 1 #to avoid 0s in data

But in this way, the function is considering 0 as missing data, right? Does anyone have another tip?