omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

Issues on the number of chromosomes in different species (2) #194

Open heebokp opened 3 months ago

heebokp commented 3 months ago

First, thank you so much for accepting my request for the "--num-chr" option in the PolyFun package.

Using the updated option, we are now trying to run PolyFun on pig genome and functional annotation data.

However, instead of using 18 slots for the pig chromosomes, PolyFun still seems to use 22 slots, as before.

I got the following error message: FileNotFoundError: [Errno 2] No such file or directory: 'annotations.19.l2.ldscore.parquet'

and, when passing a range instead, "polyfun.py: error: argument --num-chr: invalid int value: '1-18'"

We also tried "git pull", but it did not help.

So I would appreciate it if you could take a look at the "--num-chr" option.

For reference, here are the relevant commands and output from our Linux shell history:


[ Input_files]# python3 ../polyfun.py --compute-h2-L2 --no-partitions --output-prefix ../output/Pig_snpvar --sumstats LK_sumstats_munged.parquet --ref-ld-chr annotations. --w-ld-chr weights. --num-chr 18


[INFO] Reading summary statistics from pig_sumstats_munged.parquet ...
[INFO] Read summary statistics for 13617496 SNPs.
[INFO] Reading reference panel LD Score from annotations.[1-22] ... 82%|===================================== | 18/22 [00:19<00:04, 1.10s/it]

Traceback (most recent call last):
  File "/root/Desktop/polyfun_forPigs/Input_files/../polyfun.py", line 851, in <module>
    polyfun_obj.polyfun_main(args)
  File "/root/Desktop/polyfun_forPigs/Input_files/../polyfun.py", line 774, in polyfun_main
    self.polyfun_h2_L2(args)
  File "/root/Desktop/polyfun_forPigs/Input_files/../polyfun.py", line 596, in polyfun_h2_L2
    self.run_ldsc(args, use_ridge=True, nn=False, evenodd_split=False, keep_large=False)
  File "/root/Desktop/polyfun_forPigs/Input_files/../polyfun.py", line 179, in run_ldsc
    M_annot, w_ld_cname, ref_ld_cnames, dfsumstats, = sumstats._read_ld_sumstats(args, log, args.h2)
  File "/root/Desktop/polyfun_forPigs/ldsc_polyfun/sumstats.py", line 252, in _read_ld_sumstats
    ref_ld = _read_ref_ld(args, log)
  File "/root/Desktop/polyfun_forPigs/ldsc_polyfun/sumstats.py", line 86, in _read_ref_ld
    ref_ld = _read_chr_split_files(args.ref_ld_chr, args.ref_ld, log,
  File "/root/Desktop/polyfun_forPigs/ldsc_polyfun/sumstats.py", line 160, in _read_chr_split_files
    out = parsefunc(_splitp(chr_arg), _N_CHR, **kwargs)
  File "/root/Desktop/polyfun_forPigs/ldsc_polyfun/parse.py", line 128, in ldscore_fromlist
    y = ldscore(fh, num)
  File "/root/Desktop/polyfun_forPigs/ldsc_polyfun/parse.py", line 206, in ldscore
    chr_ld.append(l2_parser(sub_chr(fh, i) + suffix + s, compression))
  File "/root/Desktop/polyfun_forPigs/ldsc_polyfun/parse.py", line 161, in l2_parser
    x = read_csv(fh, header=0, compression=compression)
  File "/root/Desktop/polyfun_forPigs/ldsc_polyfun/parse.py", line 26, in read_csv
    df = pd.read_parquet(fh)
  File "/opt/python/3.9.0/lib/python3.9/site-packages/pandas/io/parquet.py", line 667, in read_parquet
    return impl.read(
  File "/opt/python/3.9.0/lib/python3.9/site-packages/pandas/io/parquet.py", line 267, in read
    path_or_handle, handles, filesystem = _get_path_or_handle(
  File "/opt/python/3.9.0/lib/python3.9/site-packages/pandas/io/parquet.py", line 140, in _get_path_or_handle
    handles = get_handle(
  File "/opt/python/3.9.0/lib/python3.9/site-packages/pandas/io/common.py", line 882, in get_handle
    handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: 'annotations.19.l2.ldscore.parquet'
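The traceback shows the reference LD Score loader being called with `_N_CHR` and building one file name per chromosome. The sketch below mimics that pattern, assuming the `<prefix><chr>.l2.ldscore.parquet` layout seen in the error message; the helper names `expected_ldscore_files` and `missing_files` are hypothetical and not part of PolyFun. It shows why a loop fixed at 22 chromosomes fails at chromosome 19 when only 18 pig files exist, and how one could check for missing files before running polyfun.py:

```python
import os
import tempfile


def expected_ldscore_files(prefix, n_chr):
    # Hypothetical helper: one file per chromosome, numbered 1..n_chr,
    # following the '<prefix><chr>.l2.ldscore.parquet' layout from the error message.
    return [f"{prefix}{c}.l2.ldscore.parquet" for c in range(1, n_chr + 1)]


def missing_files(prefix, n_chr):
    # Hypothetical helper: return the expected files that do not exist on disk.
    return [fn for fn in expected_ldscore_files(prefix, n_chr) if not os.path.exists(fn)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        prefix = os.path.join(d, "annotations.")
        # Simulate a pig reference panel: only chromosomes 1-18 have files.
        for c in range(1, 19):
            open(f"{prefix}{c}.l2.ldscore.parquet", "w").close()
        # With a hard-coded 22, chromosomes 19-22 are reported missing.
        print([os.path.basename(f) for f in missing_files(prefix, 22)])
        # With 18, nothing is missing.
        print(missing_files(prefix, 18))
```

The same check run with the real prefix (here `annotations.`) and the intended chromosome count shows at a glance whether the files or the chromosome count is the problem.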

[ Input_files]# python3 ../polyfun.py --compute-h2-L2 --no-partitions --output-prefix ../output/Pig_snpvar --sumstats LK_sumstats_munged.parquet --ref-ld-chr annotations. --w-ld-chr weights. --num-chr 1-18


usage: polyfun.py [-h] [--num-bins NUM_BINS] [--anno ANNO] [--skip-Ckmedian] [--compute-ldscores] [--compute-h2-L2] [--compute-h2-bins] [--no-partitions] [--chr CHR] [--ld-wind-cm LD_WIND_CM] [--ld-wind-kb LD_WIND_KB] [--ld-wind-snps LD_WIND_SNPS] [--chunk-size CHUNK_SIZE] [--keep KEEP] [--q Q] [--sumstats SUMSTATS] [--ref-ld-chr REF_LD_CHR] [--w-ld-chr W_LD_CHR] [--bfile-chr BFILE_CHR] [--ld-ukb] [--ld-dir LD_DIR] --output-prefix OUTPUT_PREFIX [--allow-missing] [--num-chr NUM_CHR] [--nnls-exact]

polyfun.py: error: argument --num-chr: invalid int value: '1-18'
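This second error comes from argparse rather than from the LD Score loader. If an option is declared with `type=int` (which the `NUM_CHR` usage message suggests, although the exact declaration in polyfun.py is an assumption here), argparse calls `int()` on the raw string, and `int('1-18')` raises `ValueError`. A minimal stand-alone parser reproduces the behavior:

```python
import argparse

# Stand-alone parser mimicking an integer --num-chr option
# (assumed declaration, not copied from polyfun.py).
parser = argparse.ArgumentParser(prog="polyfun.py")
parser.add_argument("--num-chr", type=int)

# A single count parses fine.
print(parser.parse_args(["--num-chr", "18"]).num_chr)

try:
    parser.parse_args(["--num-chr", "1-18"])
except SystemExit as e:
    # argparse prints "argument --num-chr: invalid int value: '1-18'"
    # to stderr and exits with status 2.
    print("exit status:", e.code)
```

In other words, the option takes a single chromosome count (`--num-chr 18`), which is what the first command already passes.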

[ Input_files]# ../
-bash: ../: Is a directory
[ Input_files]# cd ..
[ polyfun_forPigs]# git pull

warning: Pulling without specifying how to reconcile divergent branches is discouraged. You can squelch this message by running one of the following commands sometime before your next pull:

git config pull.rebase false  # merge (the default strategy)
git config pull.rebase true   # rebase
git config pull.ff only       # fast-forward only

You can replace "git config" with "git config --global" to set a default preference for all repositories. You can also pass --rebase, --no-rebase, or --ff-only on the command line to override the configured default per invocation.

Already up to date.

omerwe commented 3 months ago

@heebokp apologies for the bug. I believe I've fixed it; please git pull and try again (the first command you ran is fine). Also, apologies that I have limited bandwidth to test my solutions in depth. Please let me know how it goes and I'll quickly fix whatever problems you run into.

heebokp commented 3 months ago

Please don't feel sorry about the bug! I completely understand that these things occasionally occur. I sincerely appreciate you putting in the effort to get it fixed.

To help move the fix forward, I would like to report the current status regarding "--num-chr 1-18".

As you suggested, we first ran git pull:

###########################################

[....]# git pull
warning: Pulling without specifying how to reconcile divergent branches is discouraged. You can squelch this message by running one of the following commands sometime before your next pull:

git config pull.rebase false  # merge (the default strategy)
git config pull.rebase true   # rebase
git config pull.ff only       # fast-forward only

You can replace "git config" with "git config --global" to set a default preference for all repositories. You can also pass --rebase, --no-rebase, or --ff-only on the command line to override the configured default per invocation.

remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 5 (delta 4), reused 5 (delta 4), pack-reused 0
Unpacking objects: 100% (5/5), 529 bytes | 529.00 KiB/s, done.

From https://github.com/omerwe/polyfun
   af252e4..05fcd49  master     -> origin/master
Updating af252e4..05fcd49
Fast-forward
 ldsc.py                  | 3 +++
 ldsc_polyfun/sumstats.py | 1 +
 2 files changed, 4 insertions(+)

However, we still got the following messages:

[INFO] Reading reference panel LD Score from annotations.[1-22] ... 82%|==============================================▎ | 18/22 [00:19<00:04, 1.08s/it]

FileNotFoundError: [Errno 2] No such file or directory: 'annotations.19.l2.ldscore.parquet'

###########################################

At this step, we would like to obtain the "snpvar" results from "polyfun.py" for the next step. However, judging from the message below, polyfun.py itself may still need to be updated: the pull changed only ldsc.py and ldsc_polyfun/sumstats.py.

++++++++++++++++++++++++++++++++
From https://github.com/omerwe/polyfun
   af252e4..05fcd49  master     -> origin/master
Updating af252e4..05fcd49
Fast-forward
 ldsc.py                  | 3 +++
 ldsc_polyfun/sumstats.py | 1 +
 2 files changed, 4 insertions(+)
++++++++++++++++++++++++++++++++

omerwe commented 3 months ago

My apologies, I made a stupid mistake. Could you please git pull and try again?

omerwe commented 2 months ago

@heebokp can I close the issue?

heebokp commented 2 months ago

@omerwe, I should have reported my progress on --num-chr earlier; I'm so sorry for the late reply. Thanks to your kind help, we finally obtained the expected PolyFun output for our pig genome data. I would also like to report one thing. When we ran fine-mapping with FINEMAP or SuSiE as stand-alone tools, we did not need to do LD pruning, but PolyFun did not work on our data without LD pruning. After LD pruning, PolyFun finally worked! Thanks so much for all your efforts!!

omerwe commented 2 months ago

@heebokp great to hear it works. What problem did you run into before LD pruning? I should warn that LD-pruning is inherently problematic for fine-mapping, because you might remove the causal SNP that you're searching for.
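To illustrate this warning, here is a toy sketch of greedy pairwise LD pruning. It is a generic r² threshold procedure, not the specific pruning tool used in this thread, and the SNP names, r² values, and 0.8 threshold are made up for illustration. Pruning keeps one SNP from each highly correlated group, so if the causal SNP is the one dropped, fine-mapping on the pruned data can no longer assign it any posterior probability:

```python
def greedy_ld_prune(snps, r2, threshold):
    """Toy pairwise pruning: visit SNPs in the given order and drop any SNP
    whose r^2 with an already-kept SNP exceeds the threshold."""
    kept = []
    for s in snps:
        if all(r2.get(frozenset((s, k)), 0.0) <= threshold for k in kept):
            kept.append(s)
    return kept


# Hypothetical example: 'causal' is in tight LD with 'tag', which is visited first.
snps = ["tag", "causal", "other"]
r2 = {
    frozenset(("tag", "causal")): 0.95,
    frozenset(("tag", "other")): 0.10,
    frozenset(("causal", "other")): 0.12,
}

kept = greedy_ld_prune(snps, r2, threshold=0.8)
print(kept)  # ['tag', 'other']: the causal SNP was pruned away
```

Which SNP survives depends only on visiting order and the threshold, not on which SNP is causal, which is why pruning before fine-mapping can discard the very variant being searched for.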