perslab / CELLECT

CELLECT (CELL-type Expression-specific integration for Complex Traits)
GNU General Public License v3.0

Error Message when running - munge_sumstats.py #69

Closed ckrilow closed 3 years ago

ckrilow commented 3 years ago

I have a few questions regarding the munge_sumstats.py script:

  1. Does the summary statistics file passed via --sumstats need to contain rsIDs? For example, if we pass a column named 'Marker' with values such as chr11:88249377, will the script work?
  2. When executing the following command, I get the error shown below, and I cannot tell whether it is caused by the summary statistics file I am passing or by something else. I have included the command I used, an example of the summary statistics file format, and the output. For what it is worth, I have successfully run the mtag_munge.py script with two other summary statistics files, though those do have rsIDs in them (not all of my data is annotated with rsIDs).

CMD Used: python ldsc/mtag_munge.py --sumstats --snp MarkerName --a1 Allele1 --a2 Allele2 --n-value 45975 --merge-alleles data/ldsc/w_hm3.snplist --keep-pval --p P-value --out /out/gwas

Example of summary statistics file content:

MarkerName Allele1 Allele2 Freq1 FreqSE Effect StdErr P-value Direction
chr11:88249377 t c 0.9913 0.0000 -0.0234 0.0329 0.4771 -
chr15:99906873 t c 0.0710 0.0000 -0.0122 0.0371 0.742 -
chr8:135908647 a g 0.2019 0.0000 0.0014 0.0062 0.8201 +
chr12:3871714 a c 0.9725 0.0000 0.0194 0.0356 0.5858 +
chr11:97895884 c g 0.0668 0.0000 0.0386 0.0127 0.002279 +

ERROR converting summary statistics:

Conversion finished at Wed Apr 21 15:50:42 2021
Traceback (most recent call last):
  File "ldsc/mtag_munge.py", line 753, in munge_sumstats
    file_cnames = read_header(args.sumstats) if args.input_datgen is None else args.cnames # note keys not cleaned
  File "ldsc/mtag_munge.py", line 291, in read_header
    return [x.rstrip('\n') for x in openfunc(fh).readline().split()]
  File "ldsc/mtag_munge.py", line 291, in <listcomp>
    return [x.rstrip('\n') for x in openfunc(fh).readline().split()]
TypeError: a bytes-like object is required, not 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ldsc/mtag_munge.py", line 961, in munge_sumstats
    logging.info(traceback.format_exc(ex))
  File "/Users/krilowcn/anaconda3/envs/BINF/lib/python3.8/traceback.py", line 167, in format_exc
    return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
  File "/Users/krilowcn/anaconda3/envs/BINF/lib/python3.8/traceback.py", line 120, in format_exception
    return list(TracebackException(
  File "/Users/krilowcn/anaconda3/envs/BINF/lib/python3.8/traceback.py", line 508, in __init__
    self.stack = StackSummary.extract(
  File "/Users/krilowcn/anaconda3/envs/BINF/lib/python3.8/traceback.py", line 340, in extract
    if limit >= 0:
TypeError: '>=' not supported between instances of 'TypeError' and 'int'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ldsc/mtag_munge.py", line 969, in <module>
    d = munge_sumstats(parser.parse_args(), write_out=True)
  File "ldsc/mtag_munge.py", line 966, in munge_sumstats
    T=allele_info.sec_to_str(round(time.time() - START_TIME, 2))))
  File "/Users/krilowcn/Desktop/Projects/GSIS/cellect_work/CELLECT/ldsc/lib_mtag_munge/allele_info.py", line 59, in sec_to_str
    [d, h, m, s, n] = reduce(lambda ll, b : divmod(ll[0], b) + ll[1:], [(t, 1), 60, 60, 24])
NameError: name 'reduce' is not defined
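For reference, each of the three exceptions above matches a known behavioral difference between Python 2 and Python 3. The sketch below reproduces them in isolation. It is not taken from mtag_munge.py: the function names and the in-memory gzip input are assumptions made for illustration, and a gzipped input is only one way the first error can arise.

```python
import gzip
import io
import traceback
from functools import reduce  # a builtin in Python 2, lives in functools in Python 3

# Hypothetical stand-in for a gzipped summary statistics file.
RAW = gzip.compress(b"MarkerName Allele1 Allele2\n")


def header_binary():
    """gzip.open defaults to binary mode, so the header fields come back as bytes."""
    with gzip.open(io.BytesIO(RAW)) as fh:
        return fh.readline().split()


def header_text():
    """Text mode ("rt") yields str fields, so str methods such as rstrip work."""
    with gzip.open(io.BytesIO(RAW), "rt") as fh:
        return [x.rstrip("\n") for x in fh.readline().split()]


def bytes_error_message():
    """Trigger the bytes/str TypeError and format it the Python 3 way."""
    try:
        [x.rstrip("\n") for x in header_binary()]
    except TypeError:
        # In Python 3 the first positional parameter of format_exc() is
        # `limit`, so the exception object must not be passed to it.
        return traceback.format_exc()
    return ""


def sec_to_parts(t):
    """Same reduce expression as in the last traceback.

    Returns (days, hours, minutes, seconds, 1).
    """
    return reduce(lambda ll, b: divmod(ll[0], b) + ll[1:], [(t, 1), 60, 60, 24])
```

Calling `bytes_error_message()` returns a formatted traceback ending in the same `TypeError` as the first one above, while `header_text()` returns the column names as ordinary strings.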

Tobi1kenobi commented 3 years ago

Hi,

  1. CELLECT takes rsIDs as input, so you will have to convert your chromosomal coordinates to rsIDs if your data lacks them.
  2. The problem seems to be that you are running the script with Python 3. Our wiki tutorial gives directions for installing a conda environment with Python 2.7 and all the other requirements for this script.
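As a rough illustration of the first point, the sketch below maps `chr:pos` marker names to rsIDs through a lookup table. Everything in it is an assumption: the function names are hypothetical, the rsID in the usage example is a placeholder rather than a real identifier, and in practice the lookup would be built from a reference resource on the same genome build as the GWAS, ideally with allele checks as well.

```python
def parse_marker(marker):
    """Split a marker such as 'chr11:88249377' into ('11', 88249377)."""
    chrom, pos = marker.split(":")
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # drop the 'chr' prefix
    return chrom, int(pos)


def to_rsids(markers, lookup):
    """Map markers to rsIDs using a {(chrom, pos): rsid} dictionary.

    Returns the mapped markers and a list of markers with no match,
    so unmatched variants are reported instead of silently dropped.
    """
    mapped, unmapped = {}, []
    for marker in markers:
        rsid = lookup.get(parse_marker(marker))
        if rsid is None:
            unmapped.append(marker)
        else:
            mapped[marker] = rsid
    return mapped, unmapped


# Usage with a placeholder lookup (the rsID below is made up).
lookup = {("11", 88249377): "rsPLACEHOLDER1"}
mapped, unmapped = to_rsids(["chr11:88249377", "chr15:99906873"], lookup)
```

Returning the unmatched markers separately makes it easy to see how many variants would be lost before the file is passed on for munging.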

Hope this is helpful.

All the best, Tobi

ckrilow commented 3 years ago

Hi,

Thanks for getting back to me, I appreciate the help. I have resolved the problem, so we can close this issue.

-Chad