omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

padding zero for chromosome #42

Closed jerome-f closed 3 years ago

jerome-f commented 3 years ago

Hi Omer,

finemapper.py pads chromosome numbers 1-9 with a leading 0. This creates a SNP mismatch between the .bcor file and the input that finemapper.py passes to FINEMAP during execution. There should probably be a note about this in the Wiki. If the .bcor file is prepared independently of finemapper.py, the chromosome can be represented as either 1 or 01.

if df_z['CHR'].iloc[0]<10: df_z['CHR'] = '0' + df_z['CHR'].astype(str)
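
To illustrate the mismatch described above, here is a minimal sketch in plain Python. It is not the actual finemapper.py code: the helper name pad_chr and the sample tuples are hypothetical, and it assumes SNPs are matched on the literal chromosome string.

```python
def pad_chr(chrom):
    """Zero-pad a numeric chromosome label to two characters (e.g. 2 -> '02')."""
    chrom = str(chrom)
    # Leave non-numeric labels such as 'X' untouched.
    return chrom.zfill(2) if chrom.isdigit() else chrom


# SNPs as they might be stored in a .bcor file built from an unpadded .z file.
bcor_snps = [("rs26", "2", 168743403), ("rs29", "2", 168743545)]

# The same SNPs after the padding step is applied to the chromosome column.
finemapper_snps = [(rsid, pad_chr(chrom), pos) for rsid, chrom, pos in bcor_snps]

# Under exact string comparison, none of the padded SNPs is found in the .bcor list.
bcor_set = set(bcor_snps)
mismatches = [snp for snp in finemapper_snps if snp not in bcor_set]
print(len(mismatches), "of", len(finemapper_snps), "SNPs do not match")
```

With exact string comparison, "2" and "02" are different keys, so every SNP on chromosomes 1-9 would fail to match.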

omerwe commented 3 years ago

Hi,

I specifically made finemapper.py prepend a zero because that's what LDStore does, and I wanted to make them consistent with each other. Can you please describe how you created a .bcor file that doesn't have a leading zero?

jerome-f commented 3 years ago

Hi Omer,

I created the .bcor files independently of finemapper.py. When I created the .bcor file using LDStore, the .z file listing all the SNPs had chromosomes 1-9 as single characters. I am not sure whether specifying the chromosome as 01 instead of 1 is a convention (I am fairly new to this).

Best, Jerome

omerwe commented 3 years ago

Hi Jerome,

This is actually not a convention --- it's just a property of FINEMAP+LDStore. FINEMAP doesn't work if chromosome numbers are specified without a leading zero. Can you please send me the exact set of commands you used to generate a .bcor file and apply FINEMAP to run into this error? It would be best if you could use the plink files in the PolyFun example files so that I can reproduce this exactly.

Thanks!

Omer

jerome-f commented 3 years ago

Hi Omer,

Sorry for the late reply. I am not able to share my genotype files due to restrictions. The command I used is:

ldstore_v2.0_x86_64 --in-files chr2_166000001_169000001.dat --write-bcor --read-bdose --n-threads 4 --memory 24

The .dat file contains the following:

z;bgen;bgi;sample;bcor;bdose;ld;n_samples
chr2_166000001_169000001.z;xxx.bgen;xxx.bgen.bgi;xxx.sample;chr2_166000001_169000001.bcor;xxx.bdose;xxx.ld;332423

The file chr2_166000001_169000001.z contains the following:

rs26 2 168743403 T A
rs29 2 168743545 C A
rs30 2 168743603 C A
rs31 2 168743611 G T
rs32 2 168743662 T C

Here the second column is just one character, and it gets stored in the .bcor file as such. I can confirm that if I change chr2_166000001_169000001.z to

rs26 02 168743403 T A
rs29 02 168743545 C A
rs30 02 168743603 C A
rs31 02 168743611 G T
rs32 02 168743662 T C

where the second column is 2 chars, the .bcor file also ends up having 2 chars. So it is not really an actual issue. I curated the .z file myself and chose to create it with one char.
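
For anyone who wants to make the same change to an existing .z file, here is a rough sketch that pads the chromosome column of rows laid out like the ones above (rsid, chromosome, position, two alleles, separated by whitespace). The function name pad_z_lines is hypothetical, and the sketch works on lines in memory rather than on a real file.

```python
def pad_z_lines(lines):
    """Return copies of .z rows with numeric chromosome labels zero-padded to 2 chars."""
    padded = []
    for line in lines:
        fields = line.split()
        # Only touch rows whose second field is purely numeric,
        # so a header row (if present) passes through unchanged.
        if len(fields) >= 2 and fields[1].isdigit():
            fields[1] = fields[1].zfill(2)
        padded.append(" ".join(fields))
    return padded


z_rows = [
    "rs26 2 168743403 T A",
    "rs29 2 168743545 C A",
]
padded_rows = pad_z_lines(z_rows)
for row in padded_rows:
    print(row)
```

Regenerating the .bcor file from the padded .z file would then store the two-character form, as described above.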

omerwe commented 3 years ago

Hi Jerome,

Thanks for the explanation, I didn't realize that FINEMAP now also supports non-padded chromosome numbers (it used not to). I modified the code to (hopefully) support both representations. Can you please git pull and see if the problem is resolved now?
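
The actual change to finemapper.py is not shown in this thread. As an illustration only, one way to accept both representations is to normalize chromosome labels on both sides before comparing them. The names normalize_chr and same_snp below are hypothetical.

```python
def normalize_chr(chrom):
    """Map '02', '2' and 2 to the same label by dropping leading zeros."""
    chrom = str(chrom).strip()
    stripped = chrom.lstrip("0")
    # Only rewrite purely numeric labels; keep 'X', 'Y', etc. as they are.
    return stripped if chrom.isdigit() and stripped else chrom


def same_snp(a, b):
    """Compare two (rsid, chromosome, position) tuples, ignoring zero-padding."""
    return (
        a[0] == b[0]
        and normalize_chr(a[1]) == normalize_chr(b[1])
        and int(a[2]) == int(b[2])
    )


padded = ("rs26", "02", 168743403)
unpadded = ("rs26", "2", 168743403)
print(same_snp(padded, unpadded))
```

Normalizing at comparison time leaves the input files untouched, so a .bcor file built with either representation can be matched without regenerating it.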

jerome-f commented 3 years ago

Hi Omer,

I just tested the code, and it is working fine now.