whitlock / OutFLANK

A procedure to find Fst outliers based on an inferred distribution of neutral Fst
18 stars 9 forks source link

Modified MakeDiploidFSTMat() function #19

Open akijarl opened 5 years ago

akijarl commented 5 years ago

I noticed that the MakeDiploidFSTMat() function does not proceed in cases where individual alleles being considered are not represented by both homozygotes and a heterozygote in the population (i.e. there are not occurrences of 0, 1, and 2 in the snp matrix).

Modified the function by allowing it to proceed as long as any combination of 0, 1, 2, or 9 exists in the matrix, as long as it only contains entries from among those values:

MakeDiploidFSTMat<-function(SNPmat,locusNames,popNames){
    locusname <- unlist(locusNames)
    popname <- unlist(popNames)
    snplevs <- levels(as.factor(unlist(SNPmat)))
    if(any(!(snplevs%in%c(0,1,2,9)))==TRUE) {
      print("Error: Your snp matrix has a character other than 0,1,2 or 9")
      break
    }
    if (dim(SNPmat)[1] != length(popname)) {
      print("Error: your population names do not match your SNP matrix")
      break
    }
    if (dim(SNPmat)[2] != length(locusname)) {
      print("Error:  your locus names do not match your SNP matrix")
      break
    }
    writeLines("Calculating FSTs, may take a few minutes...")
    nloci <- length(locusname)
    FSTmat <- matrix(NA, nrow = nloci, ncol = 8)
    for (i in 1:nloci) {
      FSTmat[i, ] = unlist(getFSTs_diploids(popname, SNPmat[,i]))
      if (i%%10000 == 0) {
        print(paste(i, "done of", nloci))
      }
    }
    outTemp = as.data.frame(FSTmat)
    outTemp = cbind(locusname, outTemp)
    colnames(outTemp) = c("LocusName", "He", "FST", "T1", "T2", 
                          "FSTNoCorr", "T1NoCorr", "T2NoCorr", "meanAlleleFreq")
    return(outTemp)
  }

P.S. In order to run this modified function need to redefine the function getFSTs_diploids() in the global environment:

getFSTs_diploids = function(popNameList, SNPDataColumn){  
  #eliminating the missing data for this locus
  popnames=unlist(as.character(popNameList))
  popNameTemp=popnames[which(SNPDataColumn!=9)]
  snpDataTemp=SNPDataColumn[SNPDataColumn!=9]

  HetCounts <- tapply(snpDataTemp, list(popNameTemp,snpDataTemp), length)
  HetCounts[is.na(HetCounts)] = 0

  #Case: all individuals are genetically identical at this locus
  if(dim(HetCounts)[2]==1){
    return (list(He=NA,FST=NA, T1=NA, T2=NA,FSTNoCorr=NA, T1NoCorr=NA, T2NoCorr=NA,meanAlleleFreq = NA))
  }

  if(dim(HetCounts)[2]==2){
    if(paste(colnames(HetCounts),collapse="")=="01"){HetCounts=cbind(HetCounts,"2"=0)}
    if(paste(colnames(HetCounts),collapse="")=="12"){HetCounts=cbind("0"=0,HetCounts)} 
    if(paste(colnames(HetCounts),collapse="")=="02"){HetCounts=cbind(HetCounts[,1],"1"=0, HetCounts[,2])}
  }

  out = WC_FST_Diploids_2Alleles(HetCounts) 
  return(out)
}
whitlock commented 5 years ago

Hi Áke, Thanks very much for your suggestion. We’ll get this into the main code soon.

Thanks!

Mike

On May 29, 2019, at 7:10 AM, Áki Jarl Láruson notifications@github.com wrote:

I noticed that the MakeDiploidFSTMat() function does not proceed in cases where individual alleles being considered are not represented by both homozygotes and a heterozygote in the population (i.e. there are not occurrences of 0, 1, and 2 in the snp matrix).

Modified the function by allowing it to proceed as long as any combination of 0, 1, 2, or 9 exists in the matrix, as long as it only contains entries from among those values:

MakeDiploidFSTMat<-function(SNPmat,locusNames,popNames){ locusname <- unlist(locusNames) popname <- unlist(popNames) snplevs <- levels(as.factor(unlist(SNPmat))) if(any(!(snplevs%in%c(0,1,2,9)))==TRUE) { print("Error: Your snp matrix has a character other than 0,1,2 or 9") break } if (dim(SNPmat)[1] != length(popname)) { print("Error: your population names do not match your SNP matrix") break } if (dim(SNPmat)[2] != length(locusname)) { print("Error: your locus names do not match your SNP matrix") break } writeLines("Calculating FSTs, may take a few minutes...") nloci <- length(locusname) FSTmat <- matrix(NA, nrow = nloci, ncol = 8) for (i in 1:nloci) { FSTmat[i, ] = unlist(getFSTs_diploids(popname, SNPmat[,i])) if (i%%10000 == 0) { print(paste(i, "done of", nloci)) } } outTemp = as.data.frame(FSTmat) outTemp = cbind(locusname, outTemp) colnames(outTemp) = c("LocusName", "He", "FST", "T1", "T2", "FSTNoCorr", "T1NoCorr", "T2NoCorr", "meanAlleleFreq") return(outTemp) } P.S. In order to run this need to redefine the function getFSTs_diploids() in the global environment:

getFSTs_diploids = function(popNameList, SNPDataColumn){

eliminating the missing data for this locus

popnames=unlist(as.character(popNameList)) popNameTemp=popnames[which(SNPDataColumn!=9)] snpDataTemp=SNPDataColumn[SNPDataColumn!=9]

HetCounts <- tapply(snpDataTemp, list(popNameTemp,snpDataTemp), length) HetCounts[is.na(HetCounts)] = 0

Case: all individuals are genetically identical at this locus

if(dim(HetCounts)[2]==1){ return (list(He=NA,FST=NA, T1=NA, T2=NA,FSTNoCorr=NA, T1NoCorr=NA, T2NoCorr=NA,meanAlleleFreq = NA)) }

if(dim(HetCounts)[2]==2){ if(paste(colnames(HetCounts),collapse="")=="01"){HetCounts=cbind(HetCounts,"2"=0)} if(paste(colnames(HetCounts),collapse="")=="12"){HetCounts=cbind("0"=0,HetCounts)} if(paste(colnames(HetCounts),collapse="")=="02"){HetCounts=cbind(HetCounts[,1],"1"=0, HetCounts[,2])} }

out = WC_FST_Diploids_2Alleles(HetCounts)
return(out) } — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/whitlock/OutFLANK/issues/19?email_source=notifications&email_token=AB23ZQ6BFHABVHKHT6RSL2DPX2FHDA5CNFSM4HQNGX52YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4GWP43PA, or mute the thread https://github.com/notifications/unsubscribe-auth/AB23ZQ7BZK7RXKPTHBIZ3FDPX2FHDANCNFSM4HQNGX5Q.