Closed caimiao0714 closed 3 years ago
Could you share this list with me using dput()
?
Sure. Please refer to the code below.
list(c("1_47684677_T_G", "1_49589847_A_G", "1_50559820_C_T",
"1_72751185_T_C", "1_75002193_G_A", "1_78446761_G_A", "1_96924097_C_T",
"1_110154688_T_C", "1_177889480_A_G", "1_201784287_A_C"), c("2_632348_A_G",
"2_25150296_A_G", "2_26928811_G_A", "2_59305625_T_C", "2_63053048_G_A",
"2_143043285_C_T", "2_164567689_T_C", "2_181550962_C_T", "2_208255518_A_G",
"2_213413231_G_A", "2_219349752_C_T", "2_227092802_A_G"), c("3_25106437_A_G",
"3_61236462_C_T", "3_81792112_C_A", "3_85807590_T_G", "3_141275436_G_T",
"3_185824004_T_C"), c("4_45182527_A_G", "4_77129568_C_G", "4_103188709_C_T",
"4_145659064_T_C"), c("5_75015242_T_G", "5_153537893_G_T"), c("6_34563164_A_G",
"6_40348653_A_G", "6_50845490_A_G", "6_108977663_T_C", "6_120185665_C_T",
"6_137675541_A_G", "6_163033350_A_G"), c("7_75163169_A_G", "7_76608143_C_T",
"7_93197732_G_C", "7_95169514_G_T"), c("8_76806584_T_C", "8_81375457_C_T",
"8_85079709_T_C"), c("9_15634326_T_C", "9_28414339_A_G", "9_111932342_C_T",
"9_120378483_T_C", "9_129460914_A_G"), c("10_87410904_A_G", "10_102395440_T_C",
"10_104869038_T_C", "10_114758349_C_T"), c("11_8673939_C_G",
"11_27684517_A_G", "11_43864278_T_C", "11_47650993_C_T", "11_115022404_A_G"
), c("12_50247468_G_A", "12_122781897_G_A"), c("13_54102206_G_A",
"13_66205704_A_G", "13_79580919_G_A"), c("14_25928179_C_A", "14_29736838_C_A",
"14_30515112_C_T", "14_79899454_C_T"), c("15_51748610_A_G", "15_68077168_T_C",
"15_73093991_C_T"), c("16_3627358_C_T", "16_19935389_G_A", "16_28333411_G_A",
"16_28889486_C_A", "16_30015337_A_G", "16_31129895_A_G", "16_49062590_C_A",
"16_53803574_T_A"), c("17_2005136_C_G", "17_5283252_A_G", "17_78615571_G_A"
), c("18_21104888_C_T", "18_40147671_G_A", "18_56883319_T_G",
"18_57829135_T_C"), c("19_18454825_A_G", "19_34309532_A_G", "19_45395619_A_G",
"19_46202172_C_T", "19_47569003_G_A"), "20_51087862_C_T", "21_40291740_T_C",
character(0))
Thanks, I am able to run the following code without any issue:
Ok. Thank you! What is this tempfile()
on the 5th line? In addition, you put the .bgen.bgi
files in the .bgen
folder right? I'm not sure if that makes a difference, but I'll try it. I also find out that I'm using bigsnpr 1.6.1
, so I'll try to update the package and see if it makes a difference.
tempfile()
just generates some file name to be placed in some temporary directory.bgenfiles
and list_snp_id
)Thank you!
I've run the snp_readBGEN
for each chromosome one-by-one, and the issue seems to be in c10 and c11. And I checked the MD5 sum again and found that they are NOT consistent with the official website again.
Oh my lord, what's going on with my server??? I have another copy with the correct MD5 sum in another folder, and I copied them to this current folder, but these copied files have the wrong MD5 sum. Is it common to have this type of error when copying files from folders to folders? I'll close this issue soon.
Thanks, Miao
I don't know if it is common. So the issue is with "corrupted" files?
Ok. Yes, the issue is because the files were corrupted.
This is the fourth issue about this here. I'm starting to consider storing these md5sums in the package and checking these when the filename is UKBB.
Thank you! I agree. I think checking md5sums is important. I have an original copy of .bgen
files downloaded from UKBB with the correct md5sums. However, I've met the same issue at least 5 times when I used copied files in another folder/project using different servers. These .bgen
files seem to be especially vulnerable to copy and paste, even on the same hard drive.
Ok, I'll reopen this to remember to do it.
Ok. Thank you!
Is your problem because the file was not copied entirely? Then I think I could detect that in another way.
In my case, the copied files seem to have the same size and modified date time as the original files.
So, same size, but not the same data?
Yes. At least the numbers were the same at GB level. I've deleted those corrupted files, so I can't verify if the file sizes were exactly the same, but I still remember both of the two bgen files were 113 GBs for chr10.
I'm trying to extract only a few SNPs associated with BMI (the SNP IDs from this Nature paper by Locke et al. (2015)) from the UKBB data. However, when I try to read these SNPs using
bigsnpr::snp_readBGEN
, I encounter this error message:Below is my code and
Here is the 96 SNPs specified by Locke et al. (2015)
Based on my visual inspection, all the SNPs do seem to have 2 alleles. I also checked the MD5 sums of all my
.bgen
and.bgen.bgi
files and they are all correct.Any suggestions/comments would be appreciated.
Thanks, Miao