omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

How to convert LDSC reference files to PolyFun reference files #26

Closed: bnj50 closed this issue 4 years ago

bnj50 commented 4 years ago

Are you planning to include the "PolyFun version" of the files used with --ref-ld-chr and --w-ld-chr on your download site (https://data.broadinstitute.org/alkesgroup/LDSCORE/), for use with a command like this one:

```bash
python polyfun.py \
    --compute-h2-L2 \
    --no-partitions \
    --output-prefix output/testrun \
    --sumstats example_data/sumstats.parquet \
    --ref-ld-chr example_data/annotations. \
    --w-ld-chr example_data/weights.
```
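For readers unfamiliar with the `--ref-ld-chr`/`--w-ld-chr` convention: in LDSC these options take a filename prefix that is expanded once per chromosome (the chromosome number is appended to the prefix, or substituted for an `@` if the prefix contains one). The sketch below illustrates that expansion only. The helper name `per_chrom_paths` is hypothetical, and the `.l2.ldscore.parquet` suffix is an assumption for illustration, so check the actual file names in the downloaded archive.

```python
from typing import Iterable, List


def per_chrom_paths(prefix: str, suffix: str, chroms: Iterable[int] = range(1, 23)) -> List[str]:
    """Expand an LDSC-style per-chromosome prefix into one path per chromosome.

    If the prefix contains '@', the chromosome number replaces it;
    otherwise the number is appended to the prefix. The suffix is added last.
    """
    paths = []
    for c in chroms:
        if "@" in prefix:
            base = prefix.replace("@", str(c))
        else:
            base = f"{prefix}{c}"
        paths.append(base + suffix)
    return paths


# Hypothetical suffix; the real files may use different extensions.
for p in per_chrom_paths("example_data/annotations.", ".l2.ldscore.parquet", [21, 22]):
    print(p)
```

This is why the prefixes in the command end with a dot: the chromosome number is glued directly onto whatever the prefix ends with.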

omerwe commented 4 years ago

Hi,

You can find these here: https://data.broadinstitute.org/alkesgroup/LDSCORE/baselineLF_v2.2.UKB.polyfun.tar.gz (sorry if this is unclear; please see the FAQ for details).

(Sent by email in reply to namjoub2's message of Tue, Mar 3, 2020, 12:54 PM, shown above.)

bnj50 commented 4 years ago

Thanks. So those files are limited to UK Biobank? Also, if I download the UK file, it seems that I also need to download the related LD files from https://data.broadinstitute.org/alkesgroup/UKBB_LD/ and put them in a directory as LDStore files in order to run run_finemapper.py --ldstore. Is that right?
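The UK Biobank LD files are organized by genomic region, as the file name chr22_43000001_46000001 used later in this thread shows. Assuming (inferred from that single name, not from documentation) that the regions are 3 Mb windows whose start positions are 1, 1000001, 2000001, and so on, a hypothetical helper for choosing the window that best covers a target locus could look like this. The names `pick_ld_window`, `WINDOW` and `STEP` are illustrative only.

```python
from typing import Optional

WINDOW = 3_000_000  # assumed window length (3 Mb), inferred from the file name
STEP = 1_000_000    # assumed distance between consecutive window starts (1 Mb)


def pick_ld_window(chrom: int, start: int, end: int) -> Optional[str]:
    """Return the name of the assumed LD window that fully contains
    [start, end] and whose center is closest to the locus center,
    or None if no single window contains the locus."""
    center = (start + end) / 2
    best, best_dist = None, None
    k = 0
    while k * STEP + 1 <= start:  # only windows starting at or before the locus
        w_start = k * STEP + 1
        w_end = w_start + WINDOW
        if w_end >= end:  # window also reaches past the locus end
            dist = abs((w_start + w_end) / 2 - center)
            if best_dist is None or dist < best_dist:
                best, best_dist = f"chr{chrom}_{w_start}_{w_end}", dist
        k += 1
    return best


print(pick_ld_window(22, 44_000_000, 44_500_000))
```

For the locus used in the command below (chromosome 22, 44,000,000 to 44,500,000), both chr22_42000001_45000001 and chr22_43000001_46000001 contain the locus; the helper prefers the second because the locus sits closer to its center, and that is the file passed to --ld in this thread.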

bnj50 commented 4 years ago

Also, do I need to rename allele1 and allele2 in the UK LD files to A1 and A2? And is A1 the effect allele (as in PLINK) or the reference allele, as you wrote? I ask because I am getting this error:

```
bash-4.2$ python /usr/local/polyfun/1.0.0/run_finemapper.py --ld ldstore-uk/chr22_43000001_46000001 --sumstats 22b --n 15000 --chr 22 --start 44000000 --end 44500000 --method susie --max-num-causal 2 --out test1
```


```
[INFO] Loaded an LD matrix for 26498 SNPs from ldstore-uk/chr22_43000001_46000001.npz
[INFO] Loading sumstats file...
[INFO] Loaded sumstats for 99 SNPs
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/polyfun/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/polyfun/1.0.0/run_finemapper.py", line 123, in <module>
    verbose=args.verbose, ld=ld, df_ld_snps=df_ld_snps, debug_dir=args.debug_dir)
  File "/usr/local/polyfun/1.0.0/finemapper.py", line 388, in finemap
    self.sync_ld_sumstats(ld, df_ld_snps)
  File "/usr/local/polyfun/1.0.0/finemapper.py", line 121, in sync_ld_sumstats
    df_ld_snps = set_snpid_index(df_ld_snps)
  File "/usr/local/polyfun/1.0.0/finemapper.py", line 18, in set_snpid_index
    df['A1_first'] = (df['A1'] < df['A2']) | (df['A1'].str.len()>1) | (df['A2'].str.len()>1)
  File "/usr/local/anaconda3/envs/polyfun/lib/python3.6/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/anaconda3/envs/polyfun/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A1'
```
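The failing line in `set_snpid_index` computes an `A1_first` flag from the `A1` and `A2` columns, so `KeyError: 'A1'` means the table passed in (here the LD SNP table, `df_ld_snps`, per the traceback) had no column with that name. The plain-Python sketch below mirrors the flag expression from the traceback and uses it to order the two alleles inside an identifier. The `CHR.BP.allele.allele` identifier format is a hypothetical stand-in, not PolyFun's actual format.

```python
from typing import Dict


def a1_first(a1: str, a2: str) -> bool:
    # Same rule as the pandas expression in the traceback:
    # (A1 < A2) | (len(A1) > 1) | (len(A2) > 1)
    return a1 < a2 or len(a1) > 1 or len(a2) > 1


def snp_id(row: Dict[str, str]) -> str:
    """Hypothetical SNP identifier built from CHR, BP, A1 and A2.

    Raises KeyError('A1') if the row has no 'A1' field, which is the
    same failure mode as in the traceback above.
    """
    a1, a2 = row["A1"], row["A2"]
    first, second = (a1, a2) if a1_first(a1, a2) else (a2, a1)
    return f"{row['CHR']}.{row['BP']}.{first}.{second}"


print(snp_id({"CHR": "22", "BP": "44000001", "A1": "G", "A2": "A"}))
```

For single-letter alleles the flag puts the alphabetically smaller allele first, so swapping A1 and A2 yields the same identifier. For multi-character alleles the flag is always true, so the alleles keep the order in which they were given.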

omerwe commented 4 years ago

Hi,

  1. The annotation/LD-score files that we provide are based on UKB imputed SNPs with MAF>0.1%, which include many 1000G SNPs. We decided not to use 1000G-based LD-score files because 1000G is a very small reference panel (<400 European-ancestry individuals), so it is not suitable for annotations involving low-frequency SNPs (see e.g. Gazal et al. 2018 Nat Genet). A large sequenced reference panel would be preferable, but most people don't have access to one. In general, it is much better to use a large LD reference panel from the target study. Using 1000G or something similar leads to many false-positive results (see e.g. Ulirsch et al. 2019 Nat Genet).

  2. Yes, the sumstats files need to have the columns A1 and A2, as in LDSC (see https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format).

  3. A1 is the effect allele, which can be either the reference or the alternative allele (these are orthogonal concepts).
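A quick back-of-the-envelope calculation shows why a panel of fewer than 400 individuals struggles with low-frequency SNPs: in a panel of N diploid individuals, the expected number of copies of the minor allele is 2·N·MAF. The sketch below evaluates this at the MAF>0.1% threshold mentioned above. The 10,000-person panel is a hypothetical comparison point, not the size of any specific reference panel.

```python
def expected_minor_allele_copies(n_individuals: int, maf: float) -> float:
    """Expected number of minor-allele copies among 2 * n_individuals haplotypes."""
    return 2 * n_individuals * maf


# 400 is the upper bound quoted above for 1000G Europeans; 10,000 is hypothetical.
for n in (400, 10_000):
    copies = expected_minor_allele_copies(n, 0.001)
    print(f"N={n}: ~{copies:g} expected minor-allele copies at MAF=0.1%")
```

With fewer than one expected copy, many SNPs at this frequency would be absent from, or seen only once in, a panel of that size, so their LD with neighboring SNPs cannot be estimated reliably.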
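Because A1 is defined as the effect allele, the sign of a Z-score is tied to which allele is labeled A1. When the same variant appears with A1 and A2 swapped in two files, the usual convention is to flip the sign of Z. The sketch below illustrates that convention with a hypothetical helper, `align_z`; it is not PolyFun's own harmonization code, and it does not try to resolve strand flips or strand-ambiguous (A/T, C/G) SNPs.

```python
from typing import Optional


def align_z(z: float, a1: str, a2: str, ref_a1: str, ref_a2: str) -> Optional[float]:
    """Express a Z-score relative to a reference allele coding.

    Returns z unchanged if the alleles match, -z if A1 and A2 are swapped,
    and None if the alleles match neither way (e.g. a different variant
    or a strand flip, which this sketch does not attempt to resolve).
    """
    if (a1, a2) == (ref_a1, ref_a2):
        return z
    if (a1, a2) == (ref_a2, ref_a1):
        return -z
    return None


print(align_z(2.5, "G", "A", "A", "G"))
```

Whether A1 happens to be the reference or the alternative allele in the genome build never enters this calculation, which is the sense in which the two notions are orthogonal.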

Please let me know if this answers your questions!

Omer

(Sent by email in reply to namjoub2's message of Tue, Mar 3, 2020, 2:39 PM, shown above.)

bnj50 commented 4 years ago

Thank you for the info. My sumstats file has A1 and A2 columns, but the UK Biobank LD files that I downloaded (https://data.broadinstitute.org/alkesgroup/UKBB_LD/) have allele1 and allele2 in the header, as well as rsid (instead of SNP). I'm not sure whether the error above is related to these headers. If it is, do I need to rename the headers in the UK files to A1 and A2? I can decompress the .gz file to do that, but I'm not sure about the .npz file. Thanks.

omerwe commented 4 years ago

Hi,

You don't need to change anything. The LD files use headers based on the LDStore format rather than the LDSC format (this is annoying, I know...), and the software expects them in exactly that format.
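To see why no manual renaming is needed: code that knows a file's column convention can translate the names on the fly. As an illustration only, the sketch below parses a gzipped, whitespace-delimited SNP table with LDStore-style headers and maps them to LDSC-style names in memory, leaving the file (and the accompanying .npz matrix) untouched. The header names other than rsid, allele1 and allele2 (chromosome, position), the delimiter, and the mapping itself are assumptions, not PolyFun's actual code.

```python
import gzip
from typing import Dict, Iterable, List

# Assumed LDStore-style -> LDSC-style column names (hypothetical mapping).
LDSTORE_TO_LDSC = {
    "rsid": "SNP",
    "chromosome": "CHR",
    "position": "BP",
    "allele1": "A1",
    "allele2": "A2",
}


def parse_ld_snp_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse whitespace-delimited lines, renaming known LDStore-style headers."""
    it = iter(lines)
    header = next(it).split()
    cols = [LDSTORE_TO_LDSC.get(h, h) for h in header]  # unknown names pass through
    return [dict(zip(cols, line.split())) for line in it if line.strip()]


def read_ld_snp_table(path: str) -> List[Dict[str, str]]:
    """Read a gzipped SNP table, such as the .gz file next to an LD .npz matrix."""
    with gzip.open(path, "rt") as f:
        return parse_ld_snp_table(f)
```

Keeping the renaming in memory means the downloaded files stay byte-identical to the published ones, which avoids accidental mismatches between the .gz SNP list and the .npz matrix.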

Apparently there's some problem with your sumstats file. Could you please send me the first few lines of your sumstats file (by email to oweissbrod@hsph.harvard.edu if you prefer)? I'll try to figure out what's wrong. Please also note that the PolyFun repo contains many example files that may help you pin down the source of the problem.
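Before sending the first few lines of a sumstats file for debugging, it can help to check the header yourself. The hypothetical helpers below read the first few lines of a whitespace-delimited (optionally gzipped) text file and report which expected columns are missing. The default column list is an assumption based on the LDSC format linked earlier in the thread plus position columns, so adjust it to what your PolyFun version actually requires. The sketch does not read parquet files such as example_data/sumstats.parquet; for those, `pandas.read_parquet` is the usual route.

```python
import gzip
from itertools import islice
from typing import Iterable, List, Sequence, Tuple

# Assumed column set (LDSC-style plus positions); adjust as needed.
DEFAULT_REQUIRED: Sequence[str] = ("SNP", "CHR", "BP", "A1", "A2", "Z")


def missing_columns(header: Iterable[str], required: Iterable[str] = DEFAULT_REQUIRED) -> List[str]:
    """Return the required column names absent from a header, in order."""
    present = set(header)
    return [c for c in required if c not in present]


def head(path: str, n: int = 5) -> Tuple[List[str], List[List[str]]]:
    """Return the header fields and up to n data rows of a plain or gzipped text file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        header = f.readline().split()
        rows = [line.split() for line in islice(f, n)]
    return header, rows


# Example usage (the path is hypothetical):
# header, rows = head("my_sumstats.txt.gz")
# print(missing_columns(header))
```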

Thanks,

Omer

(Sent by email in reply to namjoub2's message of Tue, Mar 3, 2020, 3:43 PM, shown above.)

omerwe commented 4 years ago

Hi,

Thanks for the bug report. This was indeed a bug, and I just fixed it. Can you please git pull the latest version and rerun?

Thanks and sorry for the trouble,

Omer

(Sent by email as a follow-up to Omer's reply of Tue, Mar 3, 2020, 3:55 PM, shown above.)