zhenin / HDL

High-definition likelihood inference of genetic correlations (HDL)
96 stars 28 forks source link

“Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection” when using the UKB_imputed_hapmap2_SVD_eigen99_extraction panel #26

Closed Estersf closed 3 years ago

Estersf commented 3 years ago

Hi,

I am trying to calculate the heritability from my own summary statistics data. Using the UKB_imputed_SVD_eigen99_extraction panel I get a warning message “Warning: More than 1% SNPs in reference panel are missed in the GWAS. This may generate bias in estimation. Please make sure that you are using correct reference panel”, so I switched to the UKB_imputed_hapmap2_SVD_eigen99_extraction panel, which does not give me any warning message in the “HDL.data.wrangling.R” step:

Rscript /home/ester/Downloads/HDL/HDL.data.wrangling.R \

gwas.file=/home/ester/Documents/Ester/SCZ/METAL/STDERR_OFF/META_chrs_bp_N_rsID_good.txt \ LD.path=/media/ester/Elements/SCZ_AAO/HDL/UKB_imputed_hapmap2_SVD_eigen99_extraction \ SNP=SNP A1=Allele1 A2=Allele2 N=N b=Effect se=SE \ output.file=/media/ester/Elements/SCZ_AAO/HDL/Outputs/test \ log.file=/media/ester/Elements/SCZ_AAO/HDL/Outputs/test

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Attaching package: ‘data.table’

The following objects are masked from ‘package:dplyr’:

between, first, last

Program starts on Thu Jul 1 16:06:29 2021 Loading GWAS summary statistics from /home/ester/Documents/Ester/SCZ/METAL/STDERR_OFF/META_chrs_bp_N_rsIDgood.txt Data are loaded successfully. Data wrangling starts. Warning message: `rename()was deprecated in dplyr 0.7.0. Please userename()instead. This warning is displayed once every 8 hours. Calllifecycle::last_warnings()` to see where this warning was generated. Data wrangling completed. 763617 out of 769306 (99.26%) SNPs in reference panel are available in GWAS.
The output is saved to /media/ester/Elements/SCZ_AAO/HDL/Outputs/test.hdl.rds The log is saved to /media/ester/Elements/SCZ_AAO/HDL/Outputs/test.txt

However, I get an error in the “HDL.run.R” step, which I am not able to solve:

Rscript /home/ester/Downloads/HDL/HDL.run.R \

gwas.df=/media/ester/Elements/SCZ_AAO/HDL/Outputs/test.hdl.rds \ LD.path=/media/ester/Elements/SCZ_AAO/HDL/UKB_imputed_hapmap2_SVD_eigen99_extraction \ output.file=/media/ester/Elements/SCZ_AAO/HDL/Outputs/gwas_UKB_imputed_hapmap2.Rout Function arguments: gwas.df=/media/ester/Elements/SCZ_AAO/HDL/Outputs/test.hdl.rds LD.path=/media/ester/Elements/SCZ_AAO/HDL/UKB_imputed_hapmap2_SVD_eigen99_extraction output.file=/media/ester/Elements/SCZ_AAO/HDL/Outputs/gwas_UKB_imputed_hapmap2.Rout

Loading GWAS ...

HDL: High-definition likelihood inference of genetic correlations and heritabilities (HDL) Version 1.4.0 (2021-04-15) installed Author: Zheng Ning, Xia Shen Maintainer: Zheng Ning zheng.ning@ki.se

Tutorial: https://github.com/zhenin/HDL

Use citation("HDL") to know how to cite this work.

Analysis starts on Thu Jul 1 16:06:48 2021 749497 out of 769306 (97.43%) SNPs in reference panel are available in the GWAS.
Warning: More than 1% SNPs in reference panel are missed in the GWAS. This may generate bias in estimation. Please make sure that you are using correct reference panel.
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection Calls: HDL.h2 -> load -> readChar In addition: Warning message: In readChar(con, 5L, useBytes = TRUE) : cannot open file '/media/ester/Elements/SCZ_AAO/HDL/UKB_imputed_hapmap2_SVD_eigen99_extraction/': it is a directory Execution halted

Could you suggest me some advice on how to solve this error, please? I am new to use this tool and I also do not understand why, in the second step, more than 1% SNPs in reference panel are missed in the GWAS, as I am using the same panel in both steps.

I look forward to your reply and thank you for this tool.

Best regards,

Ester

now2014 commented 3 years ago

Hi Estersf,

Thanks for your bug report! This error is due to the mismatch of filenames in the LD reference panel; I've fixed it. You may need to download the updated code and re-install it.

As for the warning message, are you using two different GWAS summarise? The missing rate is related to the proportion of overlapped SNPs between GWAS summary statistics and the reference panel.

Best regards, now2014

Estersf commented 3 years ago

Hi, Thank you for your quick response. I have downloaded the updated code and I get this error on re-installation:

image

No, I am using the same GWAS summarise.

Thank you Ester

now2014 commented 3 years ago

Sorry, I left a typo in the code. You can retry it now.

Estersf commented 3 years ago

Now it works fine. Thank you very much. However, I have another question that I would like to share with you, in case you could help me with any suggestions or advice, please.

Although I don’t get any warning message in the “HDL.data.wrangling.R” step:

Program starts on Mon Jul 5 13:06:57 2021 Loading GWAS summary statistics from /home/ester/Documents/Ester/SCZ/METAL/STDERR_OFF/META_chrs_bp_N_rsID_good.txt Data are loaded successfully. Data wrangling starts. Data wrangling completed. 763617 out of 769306 (99.26%) SNPs in reference panel are available in GWAS.
The output is saved to /media/ester/Elements/SCZ_AAO/HDL/Outputs/test.hdl.rds The log is saved to /media/ester/Elements/SCZ_AAO/HDL/Outputs/test.txt

In the “HDL.run.R” step, the proportion of overlapped SNPs between GWAS summary statistics and the reference panel has changed:

Function arguments: gwas.df=/media/ester/Elements/SCZ_AAO/HDL/Outputs/test.hdl.rds LD.path=/media/ester/Elements/SCZ_AAO/HDL/UKB_imputed_hapmap2_SVD_eigen99_extraction output.file=/media/ester/Elements/SCZ_AAO/HDL/Outputs/gwas_UKB_imputed_hapmap2.Rout

HDL: High-definition likelihood inference of genetic correlations and heritabilities (HDL) Version 1.4.0 (2021-04-15) installed Author: Zheng Ning, Xia Shen Maintainer: Zheng Ning zheng.ning@ki.se Tutorial: https://github.com/zhenin/HDL Use citation("HDL") to know how to cite this work.

Analysis starts on Mon Jul 5 13:07:23 2021 749497 out of 769306 (97.43%) SNPs in reference panel are available in the GWAS.
Warning: More than 1% SNPs in reference panel are missed in the GWAS. This may generate bias in estimation. Please make sure that you are using correct reference panel.
Integrating piecewise results Continuing computing standard error with jackknife Heritability of phenotype 1: 0.00e+00 (0.00e+00) P: NA Warning: Heritability of the trait was estimated to be 0, which may due to: 1) The true heritability is very low; 2) The sample size of the GWAS is too small; 3) Many SNPs in the chosen reference panel are absent in the GWAS; 4) There is a severe mismatch between the GWAS population and the population for computing reference panel Analysis finished at Mon Jul 5 13:11:08 2021 The results were saved to /media/ester/Elements/SCZ_AAO/HDL/Outputs/gwas_UKB_imputed_hapmap2.Rout

I don’t see the point of this loss of overlapped SNPs. Do you have any suggestions on this issue? Thank you! Ester

Estersf commented 3 years ago

Also, as a complementary doubt/curiosity, I noticed that if for the same GWAS, I use different panels in the "HDL.data.wrangling.R" step and in the "HDL.run.R" step, the resulting heritability is higher (h2=0.21) than using the same panel in both steps (h2=0). However, the percentage of overlapping SNPs is much lower (72.78%) than using the same panel in both steps (97.43%).

image

image

Do you have an explanation for this increase in heritability? Any ideas/comments would be appreciated.

Thank you for your time

Ester

now2014 commented 3 years ago
  1. In the HDL.run.R step, a parameter called fill.missing.N with default value = NULL, which means HDL will filter out variants whose Z-scores are NAs. But these variants are not filtered in the HDL.data.wrangling.R step. So you got different percentages of overlapped variants with the same LD reference panel.

  2. You shouldn't use different reference panels in HDL.data.wrangling.R and HDL.run.R steps; because you'll filter the variants TWICE. The HDL.data.wrangling.R not only formats the GWAS result into designated columns but also removes variants absent in the LD reference panel, which is also performed in the HDL.run.R step.

  3. When the missing rate is high, you may try a lower eigen.cut. Meanwhile, HDL is not recommended for heritability estimation.
    (#14)