thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0
59 stars 23 forks source link

Importing genind object using genomic_converter #16

Closed Tom-Jenkins closed 6 years ago

Tom-Jenkins commented 6 years ago

Hi Thierry,

I've done some analysis in Adegenet. My Genind object contains 93 individuals, 9 populations and 6,386 loci:

/// GENIND OBJECT ///////// // 93 individuals; 6,386 loci; 12,772 alleles; size: 7.4 Mb // Basic content @tab: 93 x 12772 matrix of allele counts @loc.n.all: number of alleles per locus (range: 2-2) @loc.fac: locus factor for the 12772 columns of @tab @all.names: list of allele names for each locus @ploidy: ploidy of each individual (range: 2-2) @type: codom @call: adegenet::df2genind(X = t(x), sep = sep) // Optional content @pop: population of each individual (group size range: 8-18)

I want to use the genomic_converter function to convert and export the Genind object into genepop and bayescan files. My understanding of the function is that it accepts Genind data from the global environment, however, I keep getting the error below?

convert = genomic_converter(seafan, output=c("genepop","bayescan"))

####################################################################### ##################### radiator::genomic_converter ##################### ####################################################################### Function arguments and values: Working directory: C:/Users/tj248/OneDrive - University of Exeter/Exeter University/PhD Project Documents/Pink Sea Fan/NextRAD Project 2017/nextRAD SNP Data Analysis Input file: from global environment Strata: no Population levels: no Population labels: no Output format(s): tidy, genepop, bayescan Filename prefix: no Filters: Blacklist of individuals: no Blacklist of genotypes: no Whitelist of markers: no monomorphic.out: TRUE snp.ld: no common.markers: TRUE max.marker: no pop.select: no maf.thresholds: no

Imputations options: imputation.method: no

parallel.core: 3

#######################################################################

Importing data

Error in mutate_impl(.data, dots) : Evaluation error: object 'MARKERS' not found.

thierrygosselin commented 6 years ago

Hi Tom, it's very difficult to reproduce without the dataset... can you send me your genind object?

Best Thierry

Tom-Jenkins commented 6 years ago

Hi Thierry,

It's the code below using the file I sent you before:

vcf = read.vcfR("stacks_snpsaurus_seafan_relaxed.vcf") vcf

Convert vcf file to genind object

seafan = vcfR2genind(vcf)

Edit individual labels

indNames(seafan) = unlist(strsplit(indNames(seafan), split="_sorted.bam")) indNames(seafan)

Add population labels

seafan$pop = as.factor(substr(indNames(seafan), 1, 3)) seafan$pop

convert = genomic_converter(seafan, output=c("genepop","bayescan"))

####################################################################### ##################### radiator::genomic_converter ##################### ####################################################################### Function arguments and values: Working directory: C:/Users/tj248/Dropbox/R nextRAD Radiator Analysis Input file: from global environment Strata: strata_labels.tsv Population levels: no Population labels: no Output format(s): tidy, genepop, bayescan Filename prefix: no Filters: Blacklist of individuals: no Blacklist of genotypes: no Whitelist of markers: no monomorphic.out: TRUE snp.ld: no common.markers: TRUE max.marker: no pop.select: no maf.thresholds: no

Imputations options: imputation.method: no

parallel.core: 3

#######################################################################

Importing data

Error in mutate_impl(.data, dots) : Evaluation error: object 'MARKERS' not found.

Best, Tom


From: Thierry Gosselin notifications@github.com Sent: 24 January 2018 16:31 To: thierrygosselin/radiator Cc: Jenkins, Tom; Author Subject: Re: [thierrygosselin/radiator] Importing genind object using genomic_converter (#16)

Hi Tom, it's very difficult to reproduce without the dataset... can you send me your genind object?

Best Thierry

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/thierrygosselin/radiator/issues/16#issuecomment-360191947, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AUA6s409mrisGpKs2UgwYgYuiOXXjm6fks5tN1rOgaJpZM4Rpm9L.

thierrygosselin commented 6 years ago

Oh ok I didn't realize it was this one. On it.

thierrygosselin commented 6 years ago

Hi Tom,

Found the problem. Part of it is mine and is fixed in the latest commit, but there is more to it.

bug found I was not accounting correctly on how the genind object was made via vcfR::vcfR2genind.

I'm a big fan of vcfR, and I'm actually using it inside my functions to read the vcf files when the metadata are needed (e.g. GL, PL, read depth info, etc). However, I'm using pegas when no metadata are needed, because pegas is a lot faster than vcfR (but no metadata can be read...).

Other problem The way you are using vcfR to generate the genind object generates some downstream problems inside radiator.

The most obvious one I find is that the markers inside the genind object is building only with the ID column of the vcf file. This is problematic because you should see those as locus info and they have duplicates (when more than 1 SNP is found on the read). So those are not unique numbers and using this directly inside radiator generates genepop and bayescan files with 2547 markers instead of 6386.

What you want is to combine the CHROM + POS and ID info in the vcf file to generate the unique marker info. This is what I actually do if you were to use this command to generate your output files:

test <- radiator::genomic_converter(data = "stacks_snpsaurus_seafan_relaxed.vcf", strata = "strata_labels.txt", output = c("genind", "genepop", "bayescan"))

Not convince ?

Try this seafan@loc.fac and see what you get. Then try this test$genind.no.imputation@loc.fac.

FYI: The ID column in vcf files produced by stacks since v.1.44 have the LOCUS info with the position of the SNPs on the read/locus separated by _. The position info is the COL column in the sumstats file produced by the populations module.

Hope this help Thierry