BayesR: Some questions about computation time of bayesR.

jiefangDuan commented 2 years ago

Thank you very much for developing this software. I have some questions about BayesR, mostly about computation time. My data set has 202,840 SNPs and 8,438 subjects, and I include three covariates. To reduce the computational burden, I followed the BayesR manual published on GitHub (https://github.com/syntheke/bayesR) and used the "500 SNPs" strategy by setting the two options -msize and -mrep. Based on the time and memory figures in the manual, I expected my data to take less than six hours. In practice, the first step (training) ran for about 120 hours without completing. I can't find the reason, so any guidance or suggestions would be appreciated. Thank you very much! These are the commands I used:

train a model

./bayesRv2 -bfile traindata1 -out traindata1 \
  -numit 50000 -burnin 20000 -seed 333 \
  -blocksize 4 -nthreads 4 \
  -msize 500 -mrep 5000 \
  -covar cov_train1.txt

prediction

./bayesRv2 -bfile testdata1 -out testdata1 \
  -predict -model traindata1.model -freq traindata1.frq \
  -param traindata1.param -covar cov_test1.txt \
  -alpha traindata1.alpha

Kind Regards,

Jiefang

syntheke commented 2 years ago

Hi Jiefang,

Looking at your command list, how did you perform the "prediction" step when fitting the model failed (the "first step" didn't complete)?

First, please confirm that the software runs as expected on your system using the examples provided.

Without access to the data it's hard to provide guidance, but I would try the following:

- exclude the covariates
- omit -msize and -mrep
- reduce the number of SNPs

Cheers

jiefangDuan commented 2 years ago

Dear syntheke, thank you for your prompt reply and your valuable suggestions. The data I use are from UK Biobank, with missing genotypes imputed with the mean. The command lines I sent are all of my commands; I included them for completeness. Indeed, fitting the model failed, so the prediction step was never run. Regarding your suggestions: before I sent my initial message I had already tried excluding the covariates, which also took a long time and did not complete. That run also used the "500 SNPs" strategy. I have never tried omitting -msize and -mrep because of the size of my data, so I will try that next. As for reducing the number of SNPs, the run does work with 100,000 SNPs (about half of the original number) and no covariates, but reducing the number of SNPs would be a last resort for me. I'll try your suggestions and see how they perform. I don't have much experience in bioinformatics, so please feel free to point out anything I should be aware of. Thank you very much, and best wishes!

Kind Regards, Jiefang

syntheke <@.***> wrote on Thursday, 2 December 2021 at 21:11:

Closed #15 https://github.com/syntheke/bayesR/issues/15.


syntheke commented 2 years ago

Hi Jiefang,

I'm not familiar with the Biobank data, but bayesR only accepts 0, 1 and 2 as genotypes; missing genotypes (NA) are allowed.
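As an illustration of why mean-imputed data can conflict with this requirement, here is a minimal sketch that flags genotype values other than 0, 1, 2 or missing. The function name and the example values are hypothetical; it works on a plain list of numbers and does not read any bayesR or PLINK file format.

```python
import math


def find_invalid_genotypes(values):
    """Return (index, value) pairs that are not 0, 1, 2 or missing (None/NaN)."""
    invalid = []
    for i, v in enumerate(values):
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue  # missing genotypes are allowed
        if v not in (0, 1, 2):
            invalid.append((i, v))
    return invalid


# Hypothetical SNP column in which one missing call was replaced by a mean value.
snp = [0, 1, 2, 0.87, None, 2]
print(find_invalid_genotypes(snp))  # [(3, 0.87)]
```

A non-empty result on real data would suggest leaving such calls as missing rather than replacing them with a fractional mean.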

Your data should run within a day, without using the reduced update.

The *.hyp file provides current estimates of the parameters while the program is running and can be used to monitor the computations. You could first set -burnin to zero and -numit to a low number and check the *.hyp file.
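One rough way to inspect such a trace is to compare the mean of each parameter over the first and second half of the saved samples. The sketch below assumes the file is a whitespace-delimited table with a header row; that layout and the column names `iter`, `Va` and `Ve` are assumptions for illustration and are not taken from the bayesR documentation.

```python
import io


def read_table(handle):
    """Parse a whitespace-delimited table with a header row into {column: [floats]}."""
    header = handle.readline().split()
    columns = {name: [] for name in header}
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or partially written lines
        for name, field in zip(header, fields):
            columns[name].append(float(field))
    return columns


def half_means(values):
    """Return the mean of the first half and of the second half of a trace."""
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return sum(first) / len(first), sum(second) / len(second)


# Made-up trace with hypothetical column names, standing in for a real *.hyp file.
demo = io.StringIO("iter Va Ve\n1 0.10 0.90\n2 0.20 0.80\n3 0.30 0.70\n4 0.30 0.70\n")
trace = read_table(demo)
for name in ("Va", "Ve"):
    print(name, half_means(trace[name]))
```

If the two halves still differ markedly, the chain is probably still drifting; this is a crude check and not a formal convergence diagnostic.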

Cheers

jiefangDuan commented 2 years ago

Hi syntheke, I'm sorry to bother you again. Following your advice, I set -burnin to 0 and -numit to 500. This works and takes only 8 minutes. With the default values of -numit and -burnin, however, the run cannot finish within a day on my data. I have since read up on MCMC, but I still don't know how to set -numit and -burnin properly for my data. All I know is that the computing time should increase linearly with the number of iterations. Can you give me some guidance on how to determine suitable values of -numit and -burnin for my data? I would appreciate your advice and look forward to hearing from you!

Kind regards, Jiefang
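The linear-scaling assumption mentioned above can be turned into a back-of-the-envelope estimate. This is a minimal sketch, assuming that the short test run used the same data and options apart from -numit and -burnin, and that start-up time is negligible; the function name is hypothetical.

```python
def estimate_hours(test_iterations, test_minutes, target_iterations):
    """Linearly extrapolate wall-clock time (in hours) from a short test run."""
    minutes_per_iteration = test_minutes / test_iterations
    return minutes_per_iteration * target_iterations / 60


# 500 iterations took 8 minutes; the original command requested -numit 50000.
print(round(estimate_hours(500, 8, 50000), 1))  # 13.3
```

Under these assumptions, 50,000 iterations would take roughly 13 hours, which is in line with the earlier remark that the data should run within a day.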

syntheke commented 2 years ago

Hi Jiefang,

did you look at the output files generated for the run with "-numit 500"?

jiefangDuan commented 2 years ago

Hi syntheke, sorry for the late reply. I did check the output files generated by the run with "-numit 500", but from them I can't tell which values of -numit and -burnin are appropriate for my application. Please forgive my limited English. The output files and their contents are listed below. [image: aaaaa.png] [image: bbb.png]

traindata1.hyp [image: 11111.png] [image: 22222.png]
traindata1.log [image: trainlof.png]
traindata1.model [image: model.png]
traindata1.param
testdata1.log [image: testlog.png]

syntheke <@.***> wrote on Monday, 6 December 2021 at 10:05:

Hi Jiefang,

did you look at the output files generated for the run with "-numit 500"?


syntheke commented 2 years ago

Hi Jiefang,

It seems that your files were not uploaded correctly: I can only see the names of the files but no content. Before continuing the conversation, you need to:

A. Test that the software runs on your system as expected.
Did you run the examples that are provided with the software and do results agree with those in the example folders?

B. Make sure that your data is formatted as required by bayesR.
Running only a few hundred iterations and looking at the output files can sometimes highlight issues with the input data or model specifications. Obviously, if you're not familiar with bayesR you don't always know what to look for. However, you could try to run bayesR using a subset of randomly selected SNPs (as suggested earlier), for example 50k, and compare the derived model with your expectations or with results from previous analyses (e.g. GREML).
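A random SNP subset like the one suggested here could be drawn as follows. This is a minimal sketch, assuming the genotypes are in PLINK binary format, where the second column of the .bim file holds the variant ID; the function name, the seed and the example IDs are hypothetical.

```python
import random


def sample_snp_ids(bim_lines, n_snps, seed=333):
    """Pick n_snps variant IDs at random from the lines of a PLINK .bim file."""
    # Column 2 of a .bim file holds the variant identifier.
    ids = [line.split()[1] for line in bim_lines if line.strip()]
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(ids, min(n_snps, len(ids)))


# Tiny made-up .bim content; a real run would read the lines of traindata1.bim.
bim = [
    "1 rs1 0 1000 A G",
    "1 rs2 0 2000 C T",
    "1 rs3 0 3000 G A",
    "1 rs4 0 4000 T C",
]
subset = sample_snp_ids(bim, 2)
print("\n".join(subset))
```

Written to a text file with one ID per line, such a list can be passed to PLINK's `--extract` option together with `--make-bed` to produce a reduced data set for a trial run.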

Cheers
