Closed rlpowles closed 7 years ago
Does it work when the script is called as main? I think I do the required arg parsing in the main function, which you can adapt for your own main functions if you want.
Now that I think of it, the code in main probably belongs in the pre_function anyway, so I'll move it there in a bit.
Sorry for the inconvenience! I've pushed a fix and things should work out-of-the-box now. My function doesn't take in any args objects; it just reads them straight from the command-line args. This is for convenience. In practice, we might want to do things like have different names for the args, but that's something we can worry about once we get everything together.
An example command to have this run would be:
/usr/bin/python2.7 prep.py sumstats_file_1.txt sumstats_file_1.txt --bimfile bim/eur_chr@_SNPmaf5.bim --N1 54162 --N2 36052
Note the @ in the bimfile name, which denotes a wildcard for chromosome number.
I think the way you have it set up makes calling the module from another script a little tricky. We're going to have to drop all the argparse code from everything but our main function eventually. I think the easiest way to do this while still letting the code run out of the box for testing is to just explicitly define the variables in the main section:
if _name == "main":
args.S1 = "sumstats_file_1.txt" args.S2 = "sumstats_file_2.txt" args.bimfile = etc.
df = pre_function(args)
and then remove all your in-function argparse code. Unfortunately this is also going to require you to manually set all defaults of munge_sumstats (like you do with the args.out variable), but I don't see a better way to do things.
We also don't want the user to be able to access munge_sumstats command line options directly. For example, I think I could supply your script with the --frq argument from munge_sumstats right now and it would get passed to your munge_sumstats call. This doesn't really work because we have no way to specify which arguments apply to which of the two sumstats files we're calling munge_sumstats on. We could probably eventually specify config files for each sumstats file, but for now I think we should just do away with any argparse code in prep.py. Does this make sense?
Sure, makes sense! But how do we accommodate for users that may need some of the optional args of munge_sumstats, like --N-col
and the other args that specify custom names of columns? In any case, I'll push the change for this in a few minutes.
Pushed! Let me know if this is what you're looking for.
Yup, this looks like what I was looking for, thanks! The optional args will definitely be an issue. I think what we might want to do is take in two config files, one for each sumstats folder, that can provide optional munge arguments. Lets just focus on base functionality for now though, we can implement that later.
Sure. Let me know when you push the LDSC part. I'll then add the calculate step and will probably have to have a back-and-forth with Qiongshi about that.
Running this as a module in my code I got this error:
Traceback (most recent call last): File "ldsc_thin.py", line 451, in
df = pr.pre_function(args)
File "/net/zhao/rlp48/ldsc_test/ldsc/custom/prep.py", line 51, in pre_function
dfs.append(munge_sumstats.munge_sumstats(args, p=False))
File "/net/zhao/rlp48/ldsc_test/ldsc/custom/munge_sumstats.py", line 523, in munge_sumstats
if args.no_alleles and args.merge_alleles:
AttributeError: 'Namespace' object has no attribute 'no_alleles'
I think the problem here is the munge_sumstats parser doesn't get called when the script isn't called as main. You'll need to add some code to set any default arg attributes that the munge_sumstats parser sets (starts at line 444 of sumstats.py). You might need to do some more thorough testing of your code to make sure that munge_sumstats is going to run how you expect it to.