pjgreer / ukb-rap-tools

Scripts and workflows for use analyzing UK Biobank data from the DNANexus Research Analysis Platform

Question 16a combine hybrid outfiles #6

Closed (alyssacl closed this issue 1 year ago)

alyssacl commented 1 year ago

Hi Phil, Probably a silly question, but I thought I would ask anyway since I am still getting used to the RAP and dx. In the script for 16a, this file is referenced as an example: ukb22828_AP_c2_v3.AP.glm.logistic.hybrid

I understand I need to change this to reflect my specific file names, which I have done. However, why is it c2 (chromosome 2)? Should I modify it to loop through all the files, using something like c${i} instead?

Thanks, Alyssa

Code:

    dx run swiss-army-knife -iin="/${data_file_dir}/ukb22828_AP_c2_v3.AP.glm.logistic.hybrid" \
      -icmd="${merge_cmd}" --tag="Step1" --instance-type "mem1_ssd1_v2_x16" \
      --destination="${project}:/data/ap_imp37_gwas/" --brief --yes

pjgreer commented 1 year ago

Alyssa,

This is actually a quirk of the "dx run" command. Because of the way it was implemented, it must have at least one -iin file. You don't have to use that file in the script, but it MUST be passed to the dx run command. In truth, I probably should have picked a smaller file (like the phenotype file), but the file itself is unimportant because the command never reads the file we pass in via the -iin flag.

So why is that? In the script, you are using dxfuse to mount the $data_file_dir at "/mnt/project/$data_file_dir" and then looping over every glm.logistic.hybrid file (one for each chromosome) in that directory. If we didn't use dxfuse to mount that project dir, we would have to pass each glm.logistic.hybrid file to the dx run command separately (22+ separate -iin flags). Using dxfuse is much easier, but it leaves a silly junk file that has to be passed to the command.
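To make the looping concrete, here is a minimal sketch of the kind of merge that can run over the mounted directory. It is not the actual contents of ${merge_cmd} from the 16a script. The function name merge_hybrid_results and its two arguments are hypothetical, and it assumes every per-chromosome file starts with the same single header line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the 16a script): merge per-chromosome
# *.glm.logistic.hybrid files found in one directory into a single table.
# Usage: merge_hybrid_results <input_dir> <output_file>
merge_hybrid_results() {
  local in_dir="$1" out_file="$2" first=1 f
  : > "$out_file"                      # start with an empty output file
  for f in "$in_dir"/*.glm.logistic.hybrid; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    if [ "$first" -eq 1 ]; then
      cat "$f" >> "$out_file"          # keep the header from the first file
      first=0
    else
      tail -n +2 "$f" >> "$out_file"   # drop the repeated header line
    fi
  done
}
```

On the RAP the input directory would be something like "/mnt/project/${data_file_dir}", with the output written somewhere outside that read-only mount. Note that the glob expands in lexicographic order (c1, c10, c11, ...), so sort the merged file afterwards if you need chromosomes in numeric order.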

You will find this in a couple of these scripts where I needed to pass an -iin file but am not actually using it. This is normally when we are merging results files.

-Phil

alyssacl commented 1 year ago

Makes sense! Thank you.