xiaolei-lab / rMVP

A Memory-efficient, Visualization-enhanced, and Parallel-accelerated Tool For Genome-Wide Association Study
Apache License 2.0

using multiple cores #78

Status: Closed (closed by MuBulut 1 year ago)

MuBulut commented 2 years ago

Hi

I have a question concerning the ncpus option. Whenever I start my R script on a multi-core machine, it seems that never more than one CPU is utilized: the load stays around 1, with a single thread at 100%.

I tried specifying ncpus=10 and also commenting out the option (which should mean that all detected cores are used, right?), but could not see any change in behaviour.

Could you please comment on how to make sure, and how to check, that rMVP uses more than one CPU?

hyacz commented 2 years ago

I usually check that scripts are properly parallelized with a monitoring tool (e.g. top). Can you show me your OS and R version, and the script?
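Besides watching top, a rough check from inside R is to compare CPU time with wall-clock time for a call: when several threads are busy, the CPU time of the process exceeds the elapsed time. The following is a minimal sketch using only base R. `cpu_to_wall_ratio` is a hypothetical helper name, not part of rMVP, and the ratio is only an approximation.

```r
# Hypothetical helper (not part of rMVP): ratio of CPU time to wall-clock time.
# A value near 1 suggests one busy thread; a value well above 1 suggests several.
cpu_to_wall_ratio <- function(expr) {
  tm <- system.time(expr)                      # expr is evaluated inside system.time()
  cpu <- tm[["user.self"]] + tm[["sys.self"]]  # CPU time summed over all threads
  if (tm[["elapsed"]] <= 0) return(NA_real_)   # call was too fast to measure
  cpu / tm[["elapsed"]]
}

# Single-threaded reference workload for comparison
ratio <- cpu_to_wall_ratio(for (i in 1:2e6) sqrt(i))
print(ratio)
```

Wrapping a GWAS call in the same helper would give a comparable number for that step; a result that stays close to 1 would match what top reports for a single busy thread.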

MuBulut commented 2 years ago

top is exactly the tool I used; it showed that the load is never more than 1 and only one thread seems to be running.

OS: Ubuntu Linux 20.04
R: 4.1.2

The script:

    # cleaned rMVP

    # INSTALLATION
    library("rMVP")
    library("bigmemory")

    setwd("/data/adonut/testdata")

    options(bitmapType = "cairo")

    # Files must be called:
    # Phenotype

    # DATA PREPARATION
    MVP.Data(fileVCF = "SNP.vcf", sep.vcf = "\t", sep.phe = "\t",
             filePhe = "all_traits_phenotype_log_rMVP.txt", out = "mvp")

    # KINSHIP
    # Calculate from genofile
    MVP.Data.Kin(TRUE, mvp_prefix = 'mvp', out = 'mvp', priority = "speed", sep = "\t")

    # Principal components
    # Calculate from genofile
    MVP.Data.PC(TRUE, mvp_prefix = 'mvp', pcs.keep = 5, sep = "\t", priority = "speed") # play around with settings

    genotype <- attach.big.matrix('mvp.geno.desc')
    phenotype <- read.table("mvp.phe", header = TRUE) # at which step should I prepare this???
    kinship <- attach.big.matrix('mvp.kin.desc')
    map <- read.table("mvp.geno.map", header = TRUE)

    # don't use covariates if you use GLM, MLM or FarmCPU
    # Covariates_PC <- bigmemory::as.matrix(attach.big.matrix('mvp.pc.desc'))
    # if you want to add additional covariates (like breed, replicate, environment etc.)
    # Covariates <- model.matrix(~as.factor(breed)+as.factor(sex)+as.numeric(weight), data=yourdata)
    # when PC should be added to covariates
    # Covariates <- cbind(Covariates, Covariates_PC)

    # STARTING GWAS
    # For multiple phenotypes
    for (i in 2:ncol(phenotype)) {
      imMVP <- MVP(
        phe = phenotype[, c(1, i)], # phenotype data
        geno = genotype,            # genotype data
        K = kinship,
        map = map,                  # mapping data --> I don't have it since I use VCF files
        # CV.GLM=Covariates,
        # CV.MLM=Covariates,
        # CV.FarmCPU=Covariates,
        # nPC.GLM=5,
        nPC.MLM = 3,
        nPC.FarmCPU = 3,
        priority = "speed",
        ncpus = 50,
        vc.method = "HE",           # "BRENT", "EMMA", and "GEMMA"
        maxLoop = 10,               # maximum iterations allowed in FarmCPU
        method.bin = "FaST-LMM",    # "FaST-LMM", "EMMA", "static"
        # permutation.threshold=TRUE, # threshold of permutation will be used in manhattan plot
        # 95% quantile value of this vector is recommended to be used as significant threshold
        # permutation.rep=100,
        # threshold=c(0.5, 0.05), # 0.05/marker size, a cutoff line on manhattan plot
        # how can I change it to effective size of independent marker???
        method = c("MLM", "GLM", "FarmCPU") # are there more? --> no
      )

      gc()
    }

hyacz commented 2 years ago

There is a line like this in MVP's log output:

Number of threads used: <threads>

Can you check the number here?
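If the console output has been saved, the thread count can be pulled out of it with a few lines of base R. This is only a sketch: `extract_threads` is a hypothetical helper, the line format is taken from the message quoted above, and the surrounding example lines are placeholders rather than real rMVP output.

```r
# Hypothetical helper (not part of rMVP): extract the thread count from log lines.
extract_threads <- function(log_lines) {
  pattern <- "Number of threads used:[[:space:]]*[0-9]+"
  # Keep only the matched part of the lines that contain the message
  hits <- regmatches(log_lines, regexpr(pattern, log_lines))
  # Drop everything up to the colon and convert the remainder to an integer
  as.integer(sub(".*:[[:space:]]*", "", hits))
}

# Placeholder lines for illustration; only the middle line follows the quoted format
example_log <- c("(other log output)",
                 "Number of threads used: 1",
                 "(other log output)")
print(extract_threads(example_log))
```

With real output saved to a file, something like `extract_threads(readLines("mvp_run.log"))` would apply, where `mvp_run.log` is a placeholder file name.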

My guess is that either the OpenMP environment is misconfigured or thread usage is restricted at the system level.
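A quick way to look for such restrictions from within the same R session is to print the thread-related environment variables and the core count R detects. This is a sketch under assumptions: the variable names are the ones commonly read by OpenMP and BLAS runtimes, and whether rMVP is affected by each of them is not confirmed here.

```r
library(parallel)

# Thread-related environment variables commonly read by OpenMP and BLAS runtimes.
# Unset variables come back as empty strings.
thread_env <- Sys.getenv(c("OMP_NUM_THREADS", "OMP_THREAD_LIMIT",
                           "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"))
print(thread_env)

# Number of cores R detects (may be NA on some platforms)
n_cores <- detectCores()
print(n_cores)
```

A value of 1 in any of these variables, or a detected core count lower than the machine actually has, would be a hint that the limit comes from the environment (shell profile, job scheduler, container) rather than from the script.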