zhilizheng / SBayesRC

GNU General Public License v3.0
25 stars 5 forks source link

Reading SnpEffects.mcmcsamples.bin #31

Closed avnostaeva closed 4 months ago

avnostaeva commented 5 months ago

Dear Zhili,

Thanks for your work!

I want to try a recently proposed method (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8758557/) for calculating polygenic risks based on the use of posterior samples of the genetic effects. If I understand correctly, the values ​​I need are in the file SnpEffects.mcmcsamples.bin. I use the gctb tool with --sbayes R option.

1) I would like to clarify what information is contained in this file? 2) How to read this file to extract it? I tried the option --mcmc-samples, but it doesn't work. 3) In the file *snpRes, how are the values ​​in column SE calculated?

Thank you in advance! If I have contacted the wrong address, please write where it is better to contact.

Best, Arina

zhilizheng commented 5 months ago

Hi @avnostaeva ,

If you are using the version on this site, you can set the "SBayesRC::sbayesrc(..., bOutBeta=TRUE)", it will output the effect size for each MCMC sampling to ${output}.beta${iterations_after_burnin}.txt, the first two column (SNP and A1) were ommited to save disk space, you can retrieve those from the final SNP weights (${output}.txt file). Note that this is the effect in standarised scale (scaled the genotype to have variance 1 and mean 0, not like the output SNP weights in normal genotype scale), you will need to convert back to normal genotype scale (0, 1, 2) or calculate the PRS based on the same scale.

Hope this helps.

Regards, Zhili

avnostaeva commented 5 months ago

Hello!

Thank you! Especially for the clarification about scaling.

Do I understand correctly that in all implementations of the algorithm (gctb --sbayes R, gctb --sbayes RC and SBayesRC::sbayesrc), input summary statistics can be unscaled? Am I right that scaling is done from the SE and N values (from sum.stat) ​​at the beginning of the algorithm (marginal beta multiplying by sqrt(1/(N*SE^2))), and at the end of the algorithm the SNP weights are returned to the normal scale by dividing by the same scale-number?

Best, Arina

zhilizheng commented 5 months ago

Hi @avnostaeva ,

You are right. This was exactly what we did in the code and described in our paper.

Regards, Zhili

harryyiheyang commented 5 months ago

Hi @zhilizheng

I am currently working on a project where I need to obtain the standard error (SE) of BETA for some specific purposes as the PIP alone is not sufficient for my needs.

I have tried setting bOutBeta=T and bOutDetail=T, but I encountered an issue with the file size. The output file size for each MCMC iteration is approximately 91MB, resulting in a total size of about 300GB for 3000 iterations. This size is too large for practical use.

Could you please advise if there is a more efficient way to obtain the SE of BETA during the MCMC process? Any guidance or suggestions you could provide would be greatly appreciated.

Thank you for your time and for providing such an excellent tool.

zhilizheng commented 5 months ago

@harryyiheyang,

Oh, that's horrible. I will add the feature in next week if I can manage the time (SE of beta).

I will also create a new way to save the BETA in a very efficient way, that can also directly used in generating the PRS. That may be useful for @avnostaeva

Regards, Zhili

zhilizheng commented 5 months ago

We only save the 2000 iterations after the burn-in. So the storage would be 200GB or so. I will cut this to 60GB, and also flag to specify the thin flag, if in a step of 5, then the storage would be 12GB.

@harryyiheyang If you are in a hurry, you can change the chain length to a smaller value, e.g., niter=2000. However, it will decrease the prediction accuracy for some traits by 2%.

Regards, Zhili

harryyiheyang commented 5 months ago

@zhilizheng Thank you! I would try your suggestion using niter and thin flag = 5. It will be great that you could manage a time to add the feature (SE of beta).

zhilizheng commented 4 months ago

Hi @harryyiheyang ,

Sorry for the delay. I have a busy two week. I'm updating this weekends. Hope I can finish it in the early next week.

Regards, Zhili

zhilizheng commented 4 months ago

@harryyiheyang @avnostaeva ,

The test version 0.2.6.1 is in the repository (will become v0.2.6 after the test). This verison will save out all the beta during MCMC to a compressed sparse matrix (only takes extra 100MB in total). You can first reinstall SBayesRC by the following command, and rerun the SBayesRC without any extra flags.

devtools::install_github("zhilizheng/SBayesRC")

It will have the BETA and SE in the output.txt (without any flags, this is the default)

Extract the SNP weight in first iteration by:

SBayesRC::extractMCMCEff("YOURfolder/YOURprefix", 1,  "weight_iter1.txt")

Format: SNP A1 BETA(raw scale) Then you can use it in the SBayesRC::prs to obtain the risk score for new samples. Change the "1" to the number you're intrested, or loop from 1-200 to obtain the uncertainty. We only storaged the weight for [(niter - burn) / 10] iterations, that is 200 beta by default. The BETA will scale back to the raw genotype scale (specify flag scale=FALSE, if you'd like the standardized scale). We did not output all the BETA in a file, because the text file will be very large, so it's better to output one by one (can be removed by user after the processing).

Let me know if you find any bugs, or other comments.

I will have other tests to make sure no error.

Regards, Zhili

zhilizheng commented 4 months ago

@avnostaeva @harryyiheyang ,

Fixed in the version 0.2.6.

Regards, Zhili

harryyiheyang commented 4 months ago

Dear @zhilizheng ,

I am writing to express my sincere gratitude for your continuous patience and support in addressing the various questions and challenges I have encountered. Your willingness to assist has been invaluable to my work.

Thank you once again for your unwavering assistance and for creating such a remarkable piece of software.

Best regards, Yihe