thibautjombart / apex

Phylogenetic Methods for Multiple Gene Data

Trouble creating genind object using multidna2genind #51

Open gustavo-miranda opened 4 years ago

gustavo-miranda commented 4 years ago

Hello,

I am having trouble creating a genind object with the correct number of genes. Here's how I'm proceeding and more details on the problem:

I read my files into R with the function ‘read.multiFASTA’, which gives me a ‘multidna’ object containing all my sequences (230 UCE sequences, i.e. 230 loci, for 54 individuals):

montanus <- read.multiFASTA(files, add.gaps = F)
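Before converting, it can help to confirm that the ‘multidna’ object really holds one entry per gene. The following is a minimal sketch on a toy object, not on the data from this report: it splits ape's `woodmouse` alignment into two artificial genes (the split point and the names `gene1` and `gene2` are illustrative assumptions) and inspects the `@dna` and `@n.ind` slots.

```r
library(ape)   # provides the woodmouse example alignment
library(apex)  # provides the multidna class

# Toy data: one alignment cut into two artificial "genes" (illustrative only)
data(woodmouse)
genes <- list(gene1 = woodmouse[, 1:500], gene2 = woodmouse[, 501:965])
x <- new("multidna", genes)

# Each element of the @dna slot is one gene, so its length is the gene count
length(x@dna)
names(x@dna)

# Number of individuals stored in the object
x@n.ind
```

On the real data, the same checks applied to `montanus` should report 230 genes and 54 individuals if the files were read as intended.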

Then I use the function multidna2genind to create the genind object from my ‘multidna’ object:

montanus.gid <- multidna2genind(montanus, mlst = F, gapIsNA = T)

But when I check the ‘montanus.gid’ object this is what I get:

/// GENIND OBJECT /////////

 // 54 individuals; 6,155 loci; 12,922 alleles; size: 5.8 Mb

 // Basic content
   @tab:  54 x 12922 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 2-4)
   @loc.fac: locus factor for the 12922 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 1-1)
   @type:  codom
   @call: DNAbin2genind(x = concatenate(x, genes = genes))

 // Optional content
   - empty -

It is finding 6,155 loci instead of the 230 that I supplied. I think multidna2genind is treating each sequence as a separate gene, because 230 (loci) × 54 (individuals) = 12,420 sequences; dividing this by two (assuming each gene has two alleles) gives 12,420 / 2 = 6,210. That still leaves a difference of 55 loci between my calculation and what the function reports, which I think might correspond to individuals without sequences.

So, my question is: how do I make ‘multidna2genind’ understand that there are only 230 genes and not 6155?

I tried creating a list of genes with genes <- list(montanus@dna), but it didn't work.
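For reference, here is a minimal sketch of how the `mlst` argument seems to change what counts as a locus. It is based on a reading of the apex documentation, not a confirmed diagnosis of this dataset. With `mlst = FALSE`, the genes appear to be concatenated and each polymorphic site becomes a SNP locus, which matches the `DNAbin2genind(x = concatenate(...))` call shown in the output above. With `mlst = TRUE`, each gene is treated as one locus and each unique sequence as an allele. The `genes` argument seems to be intended for subsetting genes by name or index, not for passing the `@dna` list itself. The toy data below (ape's `woodmouse` split into two artificial genes) is an illustrative assumption.

```r
library(ape)       # provides the woodmouse example alignment
library(adegenet)  # provides nLoc() for genind objects
library(apex)      # provides the multidna class and multidna2genind()

# Toy data: one alignment cut into two artificial "genes" (illustrative only)
data(woodmouse)
genes <- list(gene1 = woodmouse[, 1:500], gene2 = woodmouse[, 501:965])
x <- new("multidna", genes)

# SNP mode: loci are polymorphic sites of the concatenated alignment
gid_snp <- multidna2genind(x, mlst = FALSE)

# MLST mode: one locus per gene, alleles are the unique sequences
gid_mlst <- multidna2genind(x, mlst = TRUE)

# Compare the number of loci reported in each mode
nLoc(gid_snp)
nLoc(gid_mlst)
```

If this reading is right, calling multidna2genind on `montanus` with `mlst = TRUE` would be the setting expected to give one locus per UCE.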

Thanks for the help. Gustavo