Closed: skinnider closed this issue 3 months ago
nn_tc_ever_v_never is divided into two rules, and the first one just computes the nn-tc between the train and test sets in a particular fold. This preliminary rule doesn't depend on any of the generated SMILES, which is why it runs right after create_training_sets, if I'm not mistaken.
The second rule in this step, plot_nn_tc_ever_v_ever, is where the intersection with rank_file comes into play, and I believe it runs only after all the rules in Snakemake_data have finished running.
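To illustrate why that first rule can run so early, here is a minimal sketch of a nearest-neighbor Tanimoto calculation that takes only the train and test sets as input. This is not the pipeline's actual code: the function names `tanimoto` and `nn_tc` are hypothetical, and fingerprints are represented as plain sets of integer feature IDs as a stand-in for real chemical fingerprints.

```python
from typing import FrozenSet, List

# Assumption: a fingerprint is modeled as a set of "on" feature IDs.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: |A and B| / |A or B|."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nn_tc(test: List[Fingerprint], train: List[Fingerprint]) -> List[float]:
    """For each test fingerprint, return its highest Tc to any training fingerprint."""
    if not train:
        raise ValueError("training set is empty")
    return [max(tanimoto(query, ref) for ref in train) for query in test]


# Toy data (hypothetical): note that no generated SMILES are involved.
train = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7})]
test = [frozenset({1, 2, 3}), frozenset({7, 8})]

scores = nn_tc(test, train)
print(scores)  # [0.75, 0.25]
```

The only inputs are the two halves of the fold, so nothing here has to wait for model training or sampling; only the later ever-versus-never comparison needs the generated output.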
Got it, thanks!
Working through a test run of the entire pipeline on a new dataset, and I noticed that immediately after create_training_sets finishes (i.e., before the CLMs themselves are even trained), rule nn_tc_ever_v_never is executed:
The output of the .log file is as follows:
I think that this must reflect a missing input in the rule, because whether or not a SMILES has ever been generated cannot be determined until the CLM training, sampling, and post-processing have all occurred.