Sex Assignment Through Coverage
Framework for joint determination of individual sex and sex-linked scaffolds for non-model organism based on depth of coverage (DoC). Our method assign individual sex by projecting sequencing depth of the samples to a two dimensional Principal Component Analysis plot followed by Gaussian mixture clustering. Jointly sex-linked scaffolds are identified based on differences in male and female read depth by a two-sample t-test.
SATC works following these steps:
SATC is just a set of R functions. You can use SATC in R, in an interactive shiny app or run it from the command line. The depth of coverage information can quickly be calculated from index bam files e.g.
samtools idxstats myFile.bam > myFile.idxstats
We have implemented a Graphical User Interface (GUI) version of SATC as a shiny app, where you can upload your data, interactively run the analyses and download the output:
You can find more information on how to run SATC in shiny inside the app.
## source the R functions
source("https://raw.githubusercontent.com/popgenDK/SATC/main/satcFunc.R")
## use the mclust library. Install if needed (install.package("mclust"))
library(mclust)
## read list of filenames
IDXFILE <- "https://raw.githubusercontent.com/popgenDK/SATC/main/examples/gitpathLeopard.txt"
filenames <- scan(IDXFILE,what="sUp")
## read idx files
idx <- lapply(filenames,read.table,as.is=T)
names(idx) <- basename(filenames)
## Filter scafoolds (min 100kb ) and normalize using the M longest scaffold
rFilt <- filterScaffold(dat=idx,minLength=1e5,M=5)
##plot normalized depth
plotDepth(rFilt) ##all scaffs
plotDepth(rFilt,normOnly=TRUE) ##scaffolds used for normalization (look for outliers)
## identify sex and sex scaffolds
sex <- sexDetermine(dat=rFilt, K=2, model="gaussian")
## plot the clustering
plotGroup(sex)
## plot the scaffolds depths stratificed by inferred sex including the (abnormal) scaffolds
plotScafs(sex,abnormal=T)
## plot the sex scaffolds' normalized depth for each individuals
plotSamples(sex)
## visialized the gausian mixtures
plotUnc(sex)
## See the inferred status of each scaffold
head(sex$SexScaffolds)
## See the inferred sex of each indiviual
head(cbind(names(idx),sex$sex))
git clone https://github.com/popgenDK/SATC
cd SATC/
The required R package (mclust) can be easily installed, we have already included an automatic install code in our main function if needed.
SATC is run by executing the main function file satc.R:
Rscript --vanilla satc.R -i <idxfiles list> -o <prefix for output files>
All arguments and options are:
Required arguments:
-i --input: Path to file with paths to idxstat files
-o --output: Prefix for output files
Optional arguments:
--K: Number of principal components used for clustering (default 2)
--model: Use gaussian clustering "gaussian" or hierarchical clustering "hclust" (default "gaussian")
--minLength: Minimum length of scaffolds to include, in bp (default 1e5)
--M: Number of scaffolds used for normalization (default 5)
--normScaffolds: Path to file with list of scaffolds to use for normalization; overwrites M (defualt NULL)
--useMedian: Use median depth of coverage of selected scaffolds for normalization instead of mean (default FALSE)
-h: Print help and exit
The main output of SATC are two tables:
Furthermore, SATC also produces two plots in png format:
And two lists:
and
Below is an example of performing SATC on our leopard data:
Rscript --vanilla satc.R -i examples/leo_idxstats_path.txt -o leo
By running SATC, it will automatically produced two plots (normalized depth plot, PCA and Sex-linked Scaffolds) in png format.
For example, here's a normalized depth plot of leopard dataset:
Also PCA plot and boxplot of Sex-linked Scaffolds:
SATC can also run fully in R (see example above in Running in R).
Please cite our papers:
Nursyifa, C., Brüniche-Olsen, A., Garcia-Erill, G., Heller, R., & Albrechtsen, A. (2021). Joint identification of sex and sex-linked scaffolds in non-model organisms using low depth sequencing data. Molecular Ecology Resources, 00, 1– 10. https://doi.org/10.1111/1755-0998.13491