I have a few questions regarding the munge_sumstats.py script:
1- Does the summary statistics file passed via --sumstats need to contain rsIDs? For example, if we pass a column named 'Marker' with values such as chr11:88249377, will the script work?
2- When executing the following command, I get the error below, and I cannot tell whether it is caused by the summary statistics file I am passing or by something else. I have included the command I used, an example of the summary statistics file format, and the output. For what it is worth, I have successfully run the mtag_munge.py script with two other summary statistics files, though those do have rsIDs in them (my data is not all annotated with rsIDs).
Example of summary statistics file content:
MarkerName Allele1 Allele2 Freq1 FreqSE Effect StdErr P-value Direction
chr11:88249377 t c 0.9913 0.0000 -0.0234 0.0329 0.4771 -
chr15:99906873 t c 0.0710 0.0000 -0.0122 0.0371 0.742 -
chr8:135908647 a g 0.2019 0.0000 0.0014 0.0062 0.8201 +
chr12:3871714 a c 0.9725 0.0000 0.0194 0.0356 0.5858 +
chr11:97895884 c g 0.0668 0.0000 0.0386 0.0127 0.002279 +
ERROR converting summary statistics:
Conversion finished at Wed Apr 21 15:50:42 2021
Traceback (most recent call last):
  File "ldsc/mtag_munge.py", line 753, in munge_sumstats
    file_cnames = read_header(args.sumstats) if args.input_datgen is None else args.cnames # note keys not cleaned
  File "ldsc/mtag_munge.py", line 291, in read_header
    return [x.rstrip('\n') for x in openfunc(fh).readline().split()]
  File "ldsc/mtag_munge.py", line 291, in <listcomp>
    return [x.rstrip('\n') for x in openfunc(fh).readline().split()]
TypeError: a bytes-like object is required, not 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ldsc/mtag_munge.py", line 961, in munge_sumstats
    logging.info(traceback.format_exc(ex))
  File "/Users/krilowcn/anaconda3/envs/BINF/lib/python3.8/traceback.py", line 167, in format_exc
    return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
  File "/Users/krilowcn/anaconda3/envs/BINF/lib/python3.8/traceback.py", line 120, in format_exception
    return list(TracebackException(
  File "/Users/krilowcn/anaconda3/envs/BINF/lib/python3.8/traceback.py", line 508, in __init__
    self.stack = StackSummary.extract(
  File "/Users/krilowcn/anaconda3/envs/BINF/lib/python3.8/traceback.py", line 340, in extract
    if limit >= 0:
TypeError: '>=' not supported between instances of 'TypeError' and 'int'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ldsc/mtag_munge.py", line 969, in <module>
    d = munge_sumstats(parser.parse_args(), write_out=True)
  File "ldsc/mtag_munge.py", line 966, in munge_sumstats
    T=allele_info.sec_to_str(round(time.time() - START_TIME, 2))))
  File "/Users/krilowcn/Desktop/Projects/GSIS/cellect_work/CELLECT/ldsc/lib_mtag_munge/allele_info.py", line 59, in sec_to_str
    [d, h, m, s, n] = reduce(lambda ll, b : divmod(ll[0], b) + ll[1:], [(t, 1), 60, 60, 24])
NameError: name 'reduce' is not defined
CELLECT uses rsIDs as input, so you will have to convert your chromosomal coordinates to rsIDs if your file doesn't have them.
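As a rough illustration of that conversion step, here is a minimal sketch that maps 'chrN:pos' markers to rsIDs before munging. It is not part of CELLECT: the function names (`marker_to_rsid`, `convert_rows`) and the lookup table are hypothetical, and the rsID values in the example are placeholders, not real annotations. In practice the lookup would have to come from an annotation resource on the same genome build as your summary statistics.

```python
import re

# Matches markers of the form "chr11:88249377".
MARKER_RE = re.compile(r"^chr(\w+):(\d+)$")


def marker_to_rsid(marker, lookup):
    """Return the rsID for a 'chrN:pos' marker, or None if it cannot be mapped.

    `lookup` is a hypothetical dict keyed by (chromosome, position).
    """
    if marker.startswith("rs"):
        return marker  # already an rsID, keep it unchanged
    match = MARKER_RE.match(marker)
    if match is None:
        return None  # unrecognised marker format
    chrom, pos = match.group(1), int(match.group(2))
    return lookup.get((chrom, pos))


def convert_rows(rows, lookup, snp_col="MarkerName"):
    """Replace the SNP column with rsIDs; collect markers that have no mapping."""
    converted, dropped = [], []
    for row in rows:
        rsid = marker_to_rsid(row[snp_col], lookup)
        if rsid is None:
            dropped.append(row[snp_col])
        else:
            new_row = dict(row)
            new_row[snp_col] = rsid
            converted.append(new_row)
    return converted, dropped


# Placeholder lookup: the rsID strings below are invented for illustration only.
example_lookup = {("11", 88249377): "rsPLACEHOLDER1"}

example_rows = [
    {"MarkerName": "chr11:88249377", "Allele1": "t", "Allele2": "c"},
    {"MarkerName": "chr15:99906873", "Allele1": "t", "Allele2": "c"},
]

converted, dropped = convert_rows(example_rows, example_lookup)
print(converted)
print(dropped)
```

Markers without a match are reported rather than silently kept, so you can see how many SNPs are lost in the conversion.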
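The problem seems to be that you are running the script with Python 3, while this script expects Python 2.7. Both the "bytes-like object" TypeError and the "name 'reduce' is not defined" NameError in your traceback are typical of Python 2 code being run under Python 3. In our wiki tutorial we provide directions on how to install a conda environment with Python 2.7 and all the other requirements for this script.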
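To make the Python 2 versus 3 point concrete, the sketch below reproduces two of the failures from the traceback in isolation under Python 3. It is only an illustration, not a patch for mtag_munge.py: the helper names (`rstrip_fails_on_bytes`, `split_seconds`) are made up, and I am assuming the header line reaches `rstrip('\n')` as bytes, which is what the error message suggests.

```python
import sys
from functools import reduce  # in Python 3, reduce is no longer a builtin


def rstrip_fails_on_bytes():
    """Return True if bytes.rstrip() rejects a str argument (Python 3 behaviour)."""
    try:
        # Same pattern as x.rstrip('\n') in the traceback, applied to a bytes value.
        b"MarkerName\n".rstrip("\n")
    except TypeError:
        return True
    return False


def split_seconds(t):
    """Split a number of seconds into (days, hours, minutes, seconds).

    Uses the same reduce expression shown in the traceback, which works
    in Python 3 once reduce is imported from functools.
    """
    d, h, m, s, n = reduce(
        lambda ll, b: divmod(ll[0], b) + ll[1:], [(t, 1), 60, 60, 24]
    )
    return d, h, m, s


print("Python major version:", sys.version_info[0])
print("rstrip on bytes fails:", rstrip_fails_on_bytes())
print("3725 seconds ->", split_seconds(3725))
```

The middle TypeError in the traceback has the same root: in Python 3 the first argument of `traceback.format_exc` is `limit`, so passing the exception object there ends up in the `limit >= 0` comparison. Running the script in the Python 2.7 environment avoids all three errors without touching the code.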
CMD Used: python ldsc/mtag_munge.py --sumstats --snp MarkerName --a1 Allele1 --a2 Allele2 --n-value 45975 --merge-alleles data/ldsc/w_hm3.snplist --keep-pval --p P-value --out /out/gwas