Open MattWellie opened 3 days ago
Method `ooh_its_a_common_conditional_analysis_loop` tries to implement the common variant version of what you want to do:
- creates a python job in the batch
- for each gene-celltype combination it makes a new call inside that python job (analogous to creating one bash job and running multiple commands inside)
- for each python job call it:
  - creates the command string for the Step 2 analysis, and runs it using subprocess
    - if that succeeds it copies the result into GCP
  - it also feeds that result into the Step 3 command
    - if that succeeds it copies the result into GCP
  - I haven't nailed this bit yet... here it would read the result as a dataframe and decide whether it needs to add a new condition
    - if there are more significant SNPs, it adds a new entry to `conditions`, increments the round number, and restarts
    - if there are not, it takes the latest results and copies them into GCP as `<path>..._final_conditional_results`
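A minimal sketch of the "add a condition and restart" decision described in the list above, not the code in this PR. Everything named here is an assumption for illustration: `run_round` is a hypothetical stand-in for running steps 2 and 3 and returning a results table as tab-separated text, the column names `MarkerID` and `p.value` and the `5e-8` threshold are placeholders, and `max_rounds` is an added safety guard. It uses the standard `csv` module rather than a dataframe so that it stays dependency-free.

```python
import csv
import io

P_THRESHOLD = 5e-8  # hypothetical significance cut-off, not taken from the PR


def top_significant_snp(results_tsv, already_conditioned, threshold=P_THRESHOLD):
    """Return the most significant SNP below threshold that is not yet a condition, or None."""
    rows = csv.DictReader(io.StringIO(results_tsv), delimiter="\t")
    candidates = [
        (float(row["p.value"]), row["MarkerID"])  # assumed column names
        for row in rows
        if row["MarkerID"] not in already_conditioned
        and float(row["p.value"]) < threshold
    ]
    return min(candidates)[1] if candidates else None


def conditional_loop(run_round, conditions=None, max_rounds=10):
    """Re-run the analysis, adding one condition per round, until nothing new is significant.

    run_round(conditions, round_number) is a hypothetical callable standing in
    for steps 2 and 3; it must return the results table as tab-separated text.
    """
    conditions = list(conditions or [])  # None means start unconditioned
    results = None
    for round_number in range(1, max_rounds + 1):
        results = run_round(conditions, round_number)
        next_snp = top_significant_snp(results, conditions)
        if next_snp is None:
            break  # nothing new to condition on: these are the final results
        conditions.append(next_snp)
    return conditions, results
```

Passing the runner in as a callable keeps the stop/continue decision separate from the subprocess and GCP copying, so that decision can be checked with a fake runner before wiring in the real commands.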
I've assumed that you want to start the analysis unconditioned? Either way, the python_job call will accept a list of conditions, or None.
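As a rough illustration of accepting either a list of conditions or None, the command string could be built so that the conditioning argument only appears when there is something to condition on. This is a sketch under assumptions: the function names are hypothetical, and the script name and the `--SAIGEOutputFile` / `--condition` arguments are placeholders that should be checked against the actual Step 2 script.

```python
import shlex
import subprocess


def build_step2_command(script, output_path, conditions=None):
    """Build a Step 2 command string; the condition argument is added only when needed."""
    args = ["Rscript", script, f"--SAIGEOutputFile={output_path}"]  # assumed flag name
    if conditions:  # None or an empty list both mean "unconditioned"
        args.append("--condition=" + ",".join(conditions))  # assumed flag name
    return shlex.join(args)


def run_command(command):
    """Run a command string with subprocess and report whether it exited with code 0."""
    completed = subprocess.run(shlex.split(command), capture_output=True, text=True)
    return completed.returncode == 0
```

With this shape, the first round can be called with `conditions=None` and later rounds with the growing list, and the copy into GCP can be gated on the boolean that `run_command` returns.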
You might want to change all the naming conventions. IDK.
I've no idea if this model works - it is what I was thinking, and AFAIK this is the only way to run two separate R scripts, then make a code-decision on whether to re-run them an unknown number of times.
I've left your original methods for step 2 and 3, just commented out. I'm hoping that the one method contains all that functionality.
I haven't looked at the code yet, but just a couple of comments:
`saige_assoc.py`, which runs step 1 and then steps 2 and 3 the first time around. That said, as long as this script has a check to see if the outputs already exist before re-running, and as long as the filenames match the outputs of the OG pipeline, it doesn't really matter?
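The existence check mentioned above could look roughly like this. It is a sketch only: `run_if_missing` is a hypothetical helper, and it uses a local `pathlib.Path` as a stand-in, so for outputs stored in GCP a cloud-aware path object with an equivalent `exists()` check would be needed instead.

```python
from pathlib import Path


def run_if_missing(output_path, run):
    """Call run() only when output_path does not exist yet; return True if it ran."""
    if Path(output_path).exists():
        return False  # result already present, e.g. written by the original pipeline
    run()
    return True
```

This only helps if the file names produced here match the names the original pipeline writes, which is the condition raised in the comment.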