Closed jessmundy closed 4 years ago
Hi Jess,
Thanks for your interest in HDL! I am guessing the error occurs because N (the number of individuals) is not available for some SNPs in your GWAS. I will try to fix it tomorrow and come back to you. In the meantime, you can check whether N is NA for some SNPs.
Best, Zheng
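The check suggested above takes only a few lines of R. This is a minimal sketch on a toy data frame; the column names SNP, N and Z are assumptions about the summary-statistics layout, so adjust them to match your own file.

```r
# Toy summary statistics; in practice, read your own file instead
# (for example with read.table() or data.table::fread()).
gwas <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  N   = c(5000, NA, 5000, NA),
  Z   = c(1.2, -0.4, 0.8, 2.1),
  stringsAsFactors = FALSE
)

# Count SNPs whose sample size is missing.
n_missing <- sum(is.na(gwas$N))
cat(n_missing, "out of", nrow(gwas), "SNPs have a missing N\n")

# List the affected SNPs for inspection.
missing_snps <- gwas$SNP[is.na(gwas$N)]
print(missing_snps)
```

If the count is above zero, the missing values are a likely candidate for the error, although other causes cannot be ruled out from this check alone.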
Hi Zheng,
Thanks so much for getting back to me - I actually sorted the problem in the end! I was trying to use munged sumstats from the LDSC output (I thought they'd work since the columns were the same), but HDL wouldn't work and I kept getting those errors. Then I thought I'd better use the HDL data wrangling method, and it worked perfectly!
Thanks so much for getting back to me!
All the best,
Jess
Hi Jess,
Great to know that it works now! I think your bug report is still valuable: it reveals a potential bug with more general input (like the LDSC-munged data you used). I will fix it in the next update.
Thanks again for using HDL! I will close this thread, but do not hesitate to reopen it or create a new one if you find any other problems.
Best, Zheng
Hi @zhenin ,
I actually ran into the exact same error as described above. However, after passing my sumstats through the data.wrangling.R script, it still produces the same error.
One thing I found a bit strange is that the number of overlapping SNPs reported in the log of HDL.data.wrangling.R differs from the number reported by HDL.run.R, even though I'm using the direct output from the wrangling script as input to run HDL.
Below is the log from running the data wrangling scripts for both sumstats, followed by the HDL estimation.
Do you have any ideas of what went wrong, or suggestions on what else I should look into?
Thanks in advance for your help!
Best, Hui
Loading GWAS summary statistics from GWAS1.sumstats.gz
Data are loaded successfully. Data wrangling starts.
Warning message:
`rename_()` is deprecated as of dplyr 0.7.0.
Please use `rename()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
Data wrangling completed.
1029876 out of 1029876 (100%) SNPs in reference panel are available in GWAS.
The output is saved to GWAS1.hdl.rds
Loading GWAS summary statistics from GWAS2.sumstats.gz
Data are loaded successfully. Data wrangling starts.
Warning message:
`rename_()` is deprecated as of dplyr 0.7.0.
Please use `rename()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
Data wrangling completed.
1029876 out of 1029876 (100%) SNPs in reference panel are available in GWAS.
The output is saved to GWAS2.hdl.rds
Function arguments:
gwas1.df=GWAS1.hdl.rds
gwas2.df=GWAS2.hdl.rds
LD.path=UKB_imputed_SVD_eigen99_extraction
output.file=rg.hdl.out
Loading GWAS1 ...
Loading GWAS2 ...
HDL: High-definition likelihood inference of genetic correlations and heritabilities (HDL)
Version 1.3.9 (2020-11-24) installed
Author: Zheng Ning, Xia Shen
Maintainer: Zheng Ning <zheng.ning@ki.se>
Tutorial: https://github.com/zhenin/HDL
Use citation("HDL") to know how to cite this work.
Analysis starts on Sat Feb 6 18:08:07 2021
655548 out of 1029876 (63.65%) SNPs in reference panel are available in GWAS 1.
831961 out of 1029876 (80.78%) SNPs in reference panel are available in GWAS 2.
Warning: More than 1% SNPs in reference panel are missed in GWAS 1. This may generate bias in estimation. Please make sure that you are using correct reference panel.
Warning: More than 1% SNPs in reference panel are missed in GWAS 2. This may generate bias in estimation. Please make sure that you are using correct reference panel.
Error in if (N0 > 0) h12.ols = c(summary(reg)$coef[1:2, 1:2] * c((N0/p1/p2), :
missing value where TRUE/FALSE needed
Calls: HDL.rg
Execution halted
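The error message itself comes from base R: `if` cannot branch on NA. The sketch below reproduces it outside HDL. The variable name N0 is borrowed from the traceback, and the idea that it became NA through missing values upstream is only an assumption about what happens inside HDL.rg.

```r
# Arithmetic and summaries involving NA yield NA, so a single missing
# value can propagate into a scalar that is later used in a condition.
N0 <- min(c(5000, NA, 4800))  # NA, because na.rm = FALSE by default

# Branching on NA raises the same kind of error seen in the HDL log.
msg <- tryCatch(
  {
    if (N0 > 0) "positive" else "not positive"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Removing NAs first (and guarding with is.na()) avoids the error.
N0_clean <- min(c(5000, NA, 4800), na.rm = TRUE)
if (!is.na(N0_clean) && N0_clean > 0) print("positive")
```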
Hi Hui,
Thanks for reporting the bug! The discrepancy is most likely because N (sample size) is missing for some SNPs. Currently, SNPs with missing N are removed in HDL.run.R (or the main function of HDL), but they are not removed in HDL.data.wrangling.R.
I will fix this in the next update of HDL (within this week). I plan to add a parameter to let users decide how to deal with these SNPs with missing N: removing or filling with median or max N.
In the meantime, I recommend checking why N is missing and filling in reasonable values if possible. I guess this will solve the problem.
Thank you again for using HDL!
Best, Zheng
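Until such an option is available, the three strategies mentioned above (removing the SNPs, or filling N with the median or the maximum) can be applied by hand before running HDL. This is a minimal sketch: the helper `handle_missing_n()` and its `method` argument are hypothetical names used for illustration and are not part of HDL, and the column name N is an assumption about the data layout.

```r
# Hypothetical helper, not part of HDL: deal with SNPs that have a missing N.
handle_missing_n <- function(gwas, method = c("remove", "median", "max")) {
  method <- match.arg(method)
  missing <- is.na(gwas$N)
  if (!any(missing)) {
    return(gwas)
  }
  if (method == "remove") {
    # Drop the affected SNPs entirely.
    return(gwas[!missing, , drop = FALSE])
  }
  # Otherwise compute the fill value from the observed sample sizes.
  fill <- if (method == "median") {
    median(gwas$N, na.rm = TRUE)
  } else {
    max(gwas$N, na.rm = TRUE)
  }
  gwas$N[missing] <- fill
  gwas
}

gwas <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  N   = c(4000, NA, 5000, 6000),
  Z   = c(1.2, -0.4, 0.8, 2.1),
  stringsAsFactors = FALSE
)

removed       <- handle_missing_n(gwas, "remove")
filled_median <- handle_missing_n(gwas, "median")
filled_max    <- handle_missing_n(gwas, "max")
```

Removing is the most conservative choice but lowers the overlap with the reference panel; filling keeps the SNPs at the cost of assuming a sample size that was not reported.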
Hi @zhenin
Thanks a lot! Sorry, I misinterpreted the previous comments in this thread and thought that data.wrangling.R automatically removed the NA entries of the N column. After pre-processing the sumstats, I don't have any problems with the estimation step.
Best, Hui
Hi @huilisabrina
Great! Please do not hesitate to come back to me if you find any other bugs.
Best, Zheng
Hi,
I encountered the same error. I ran two steps:
1. HDL.data.wrangling.R to convert the GWAS to HDL format (without errors) for the two traits
2. HDL.run.R (the error below)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Attaching package: ‘data.table’
The following objects are masked from ‘package:dplyr’:
between, first, last
Program starts on Thu Sep 28 14:01:25 2023
Loading GWAS summary statistics from ~/test/C1024_997_gwas_plink_ldsc.sumstats.gz
Data are loaded successfully. Data wrangling starts.
Warning message:
`rename_()` was deprecated in dplyr 0.7.0.
ℹ Please use `rename()` instead.
Data wrangling completed.
1029876 out of 1029876 (100%) SNPs in reference panel are available in GWAS.
The output is saved to ~/test/C1024_997.hdl.rds
The log is saved to ~/test/C1024_997.txt
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Attaching package: ‘data.table’
The following objects are masked from ‘package:dplyr’:
between, first, last
Program starts on Thu Sep 28 14:01:33 2023
Loading GWAS summary statistics from ~/test/NSAID.sumstats.gz
Data are loaded successfully. Data wrangling starts.
Warning message:
`rename_()` was deprecated in dplyr 0.7.0.
ℹ Please use `rename()` instead.
Data wrangling completed.
1029876 out of 1029876 (100%) SNPs in reference panel are available in GWAS.
The output is saved to ~/test/NSAID.hdl.rds
The log is saved to ~/test/NSAID.txt
Function arguments:
gwas1.df=~/test/C1024_997.hdl.rds
gwas2.df=~/test/NSAID.hdl.rds
LD.path=~/UKBB_HapMap3_LD_reference
output.file=~/test/C1024_997_vs_NSAID.Rout
Loading GWAS1 ...
Loading GWAS2 ...
HDL: High-definition likelihood inference of genetic correlations and heritabilities (HDL)
Version 1.4.0 (2021-04-15) installed
Author: Zheng Ning, Xia Shen
Maintainer: Zheng Ning <zheng.ning@ki.se>
Tutorial: https://github.com/zhenin/HDL
Use citation("HDL") to know how to cite this work.
Analysis starts on Thu Sep 28 14:01:46 2023
43698 SNPs were removed in GWAS 1 due to missing N or missing test statistic.
11889 SNPs were removed in GWAS 2 due to missing N or missing test statistic.
986178 out of 1029876 (95.76%) SNPs in reference panel are available in GWAS 1.
1017987 out of 1029876 (98.85%) SNPs in reference panel are available in GWAS 2.
Warning: More than 1% SNPs in reference panel are missed in GWAS 1. This may generate bias in estimation. Please make sure that you are using correct reference panel.
Warning: More than 1% SNPs in reference panel are missed in GWAS 2. This may generate bias in estimation. Please make sure that you are using correct reference panel.
Error in if (N0 > 0) h12.ols = c(summary(reg)$coef[1:2, 1:2] * c((N0/p1/p2), :
missing value where TRUE/FALSE needed
Calls: HDL.rg
Execution halted
I installed the package using HDL.install.R, so I assume that the NA issue for the N column is no longer a problem, right?
I tried setting fill.missing.N to median and to NULL (the default). The same error occurred in both cases.
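Since the log reports SNPs removed "due to missing N or missing test statistic", it may be worth checking every numeric column of the wrangled file for non-finite values, not only N. The sketch below is one way to do that and is not a confirmed diagnosis of this error. The toy file and the column names N and Z are assumptions; point `path` at your own .hdl.rds file instead.

```r
# Toy stand-in for a wrangled file; replace `path` with your own .hdl.rds.
path <- tempfile(fileext = ".rds")
saveRDS(
  data.frame(
    SNP = c("rs1", "rs2", "rs3", "rs4"),
    N   = c(5000, 5000, NA, 5000),
    Z   = c(1.2, Inf, 0.8, NaN),
    stringsAsFactors = FALSE
  ),
  path
)

gwas <- readRDS(path)

# Count non-finite entries (NA, NaN, Inf, -Inf) in each numeric column.
numeric_cols <- names(gwas)[vapply(gwas, is.numeric, logical(1))]
bad_counts <- vapply(
  gwas[numeric_cols],
  function(x) sum(!is.finite(x)),
  integer(1)
)
print(bad_counts)

# Rows with at least one non-finite value, for closer inspection.
is_bad <- Reduce(`|`, lapply(gwas[numeric_cols], function(x) !is.finite(x)))
bad_rows <- gwas[is_bad, , drop = FALSE]
print(bad_rows)
```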
Hi there,
I am having a bit of trouble when running HDL. When I try in the command line, I get this error:
HDL: High-definition likelihood inference of genetic correlations and heritabilities (HDL)
Version 1.3.8 (2020-07-12) installed
Author: Zheng Ning, Xia Shen
Maintainer: Zheng Ning <zheng.ning@ki.se>
Tutorial: https://github.com/zhenin/HDL
Use citation("HDL") to know how to cite this work.
Analysis starts on Tue Aug 4 15:19:04 2020
1029737 out of 1029876 (99.99%) SNPs in reference panel are available in GWAS 1.
1028394 out of 1029876 (99.86%) SNPs in reference panel are available in GWAS 2.
Error in if (N0 > 0) h12.ols = c(summary(reg)$coef[1:2, 1:2] * c((N0/p1/p2), :
missing value where TRUE/FALSE needed
Calls: HDL.rg
Execution halted
I thought this meant that I had to include the N0 flag. However, when I included the N0 flag, I then got this error:
HDL: High-definition likelihood inference of genetic correlations and heritabilities (HDL)
Version 1.3.8 (2020-07-12) installed
Author: Zheng Ning, Xia Shen
Maintainer: Zheng Ning <zheng.ning@ki.se>
Tutorial: https://github.com/zhenin/HDL
Use citation("HDL") to know how to cite this work.
Analysis starts on Tue Aug 4 14:45:57 2020
1029737 out of 1029876 (99.99%) SNPs in reference panel are available in GWAS 1.
1028394 out of 1029876 (99.86%) SNPs in reference panel are available in GWAS 2.
Error in N0/N1 : non-numeric argument to binary operator
Calls: HDL.rg
Execution halted
I then attempted to run HDL in R. But I get this error:
Error in HDL.rg(gwas1.df, gwas2.df, LD.path) : could not find function "HDL.rg"
Any ideas what might be going wrong?
Thanks so much in advance!
Jess
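Two of the errors above have plausible base-R explanations, sketched here without relying on HDL internals. "non-numeric argument to binary operator" is what R reports when one operand of `/` is a character string, which is how a command-line value arrives unless it is converted; that N0 reached HDL.rg as a string is an assumption, not something the log confirms. "could not find function" usually means the package has not been attached with library(HDL) in the current R session.

```r
# Command-line arguments arrive as character strings.
N0_arg <- "0"
N1 <- 5000

# Dividing a string by a number raises an error.
msg <- tryCatch(N0_arg / N1, error = function(e) conditionMessage(e))
print(msg)

# Converting to numeric first makes the arithmetic well defined.
N0 <- as.numeric(N0_arg)
ratio <- N0 / N1
print(ratio)

# A package's functions are only visible by name once it is attached.
# requireNamespace() returns FALSE instead of failing if HDL is not installed.
hdl_available <- requireNamespace("HDL", quietly = TRUE)
if (hdl_available) {
  library(HDL)
  print(exists("HDL.rg"))
} else {
  message("HDL is not installed in this library path.")
}
```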