Looking at the slurm logs, it seems like this job ran out of time, but because the train0_structures_SMILES_3_carbon.csv.gz file was not empty, the next rule was executed.
Is there a way to also require that the -unique.smi file written at the end of rule add_carbon also exists?
More generally, is it worth building some robustness checks to ensure that the preceding job completed, and not just that it wrote output to a file? For example, looking at sample_molecules_RNN, is it the case that once a single row is written to the CSV file, the next jobs are free to execute, regardless of whether something goes wrong halfway through the job (e.g., the job is killed by slurm)?
I'm going to delete these files and resubmit the jobs with more wall time, but attaching the files and some representative logs here for future reference. AddCarbon files and logs.zip
@skinnider - The input/output files are used to generate a dependency graph before the workflow starts executing. snakemake will never execute a downstream rule unless an upstream rule it depends on (directly or indirectly) has completed with an exit code of 0. So a rule is free to stream to the output file it's supposed to generate. If it runs out of time, or otherwise errors out, or fails to produce the output files that were part of the dependency graph, the rule is marked as having failed (and downstream rules are still considered pending), and any output file(s) it generated (which were part of the dependency graph - it is free to generate other files that snakemake knows nothing about) are deleted by snakemake.
A rule can generate other files that are not part of the dependency graph - these files are not touched by snakemake.
So I suspect there's something else going on here other than the slurm timeout. I'm investigating and will keep you posted.
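On the more general robustness question, one common pattern that does not rely on the workflow manager at all is to write to a temporary file and rename it to the final name only once every row has been written. The following is a minimal sketch of that idea, not what clm currently does; the function name write_rows_atomically and its arguments are hypothetical.

```python
import csv
import gzip
import os
import tempfile


def write_rows_atomically(final_path, header, rows):
    """Write a gzipped CSV under a temporary name, then rename it into place.

    Hypothetical helper for illustration only; it is not part of clm.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Keep the temporary file in the target directory so the final
    # os.replace is a rename within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        with gzip.open(tmp_path, "wt", newline="") as handle:
            writer = csv.writer(handle, lineterminator="\n")
            writer.writerow(header)
            for row in rows:
                writer.writerow(row)
        # Only reached if every row was written without an exception.
        os.replace(tmp_path, final_path)
    finally:
        # Remove the temporary file if the rename did not happen.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

If the process is killed outright (for example by a slurm timeout), the cleanup in the finally block may not run and a stray .tmp file can be left behind, but no file ever appears under the final name, so a downstream step cannot pick up a partial result.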
@skinnider - in the log files you attached here, I see:
(INFO) (__main__.py) (18-Jul-24 23:22:44) CLM vsrc
This tells me that you're using clm without pip installing it first. This is fine, but it makes it tricky to find out exactly which version you used to run the workflow. Would you mind uploading your version of add_carbon.py here? In the current version on master, I'm not seeing how any *carbon.gz file could possibly be generated without a header line (like the ones I see here).
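For anyone wanting to check their own files for the same symptom, here is a minimal sketch that reads only the first row of a gzipped CSV and tests whether it names the expected column. The function name has_expected_header is hypothetical, and the comma delimiter is an assumption about the carbon file format.

```python
import csv
import gzip


def has_expected_header(path, required=("input_smiles",)):
    """Return True if the first row of a gzipped CSV names every required column.

    Hypothetical diagnostic helper; assumes a comma-delimited file.
    """
    with gzip.open(path, "rt", newline="") as handle:
        # An empty file yields no rows, so fall back to an empty header.
        first_row = next(csv.reader(handle), [])
    return all(column in first_row for column in required)
```

A file whose first row is already a data row (or which is empty) makes this return False, which matches the headerless files described in this thread.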
I do have the CLM package pip install'd but was running from within ~/git/CLM. Does it run from source by default like that? Regardless, here's my copy of add_carbon.py: add_carbon.py.zip
As far as I can tell, the only commit between the error (Jul 19 am) and successful re-submission of the same jobs/DAG (Jul 20 or 21) was to increase runtime and memory for rule add_carbon.
So it does look like the add_carbon step is the only place in our codebase that is taking the else branch in write_to_csv_file (i.e. the only place where the input is not a DataFrame). I'll open up a PR on this soon.
btw, still getting this same error with newly-submitted jobs. Providing the filepaths in the log below in case they are useful to test the PR:
[Tue Aug 13 15:54:50 2024]
rule write_structural_prior_CV:
input: /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0.smi, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/test0_structures_SMILES_0.smi, /Genomics/skinniderlab/food-clm/PubChem.tsv, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/samples/structures_SMILES_0_unique_masses.csv.gz, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0_carbon.csv.gz
output: /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_ranks_structure.csv.gz, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_tc.csv.gz
jobid: 0
reason: Forced execution
wildcards: output_dir=/Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3, enum_factor=100, dataset=structures, repr=SMILES, fold=0
resources: mem_mb=64000, mem_mib=61036, disk_mb=12214, disk_mib=11649, tmpdir=/tmp, slurm_partition=main,hoppertest,skinniderlab, runtime=1015
reading NP model ...
model in
(INFO) (__main__.py) (13-Aug-24 15:54:53) CLM vsrc
0%| | 0/1 [00:00<?, ?it/s]/Genomics/argo/users/ms0270/git/CLM/src/clm/functions.py:466: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, pd.DataFrame(chunk_data)], ignore_index=True)
100%|██████████| 1/1 [00:06<00:00, 6.71s/it]
0%| | 0/1 [00:00<?, ?it/s]/Genomics/argo/users/ms0270/git/CLM/src/clm/functions.py:466: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
df = pd.concat([df, pd.DataFrame(chunk_data)], ignore_index=True)
100%|██████████| 1/1 [00:01<00:00, 1.58s/it]
(INFO) (write_structural_prior_CV.py) (13-Aug-24 15:55:04) Reading PubChem file
(INFO) (write_structural_prior_CV.py) (13-Aug-24 15:57:04) Reading sample file from generative model
/Genomics/argo/users/ms0270/git/CLM/src/clm/functions.py:524: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
return pd.read_csv(filename, compression=compression, **kwargs)
Traceback (most recent call last):
File "/Genomics/skinniderlab/PED-generation/env-clm/bin/clm", line 8, in <module>
sys.exit(main())
File "/Genomics/argo/users/ms0270/git/CLM/src/clm/__main__.py", line 76, in main
args.func(args)
File "/Genomics/argo/users/ms0270/git/CLM/src/clm/commands/write_structural_prior_CV.py", line 297, in main
write_structural_prior_CV(
File "/Genomics/argo/users/ms0270/git/CLM/src/clm/commands/write_structural_prior_CV.py", line 253, in write_structural_prior_CV
addcarbon.drop(columns="input_smiles", inplace=True)
File "/Genomics/skinniderlab/PED-generation/env-clm/lib/python3.10/site-packages/pandas/core/frame.py", line 5581, in drop
return super().drop(
File "/Genomics/skinniderlab/PED-generation/env-clm/lib/python3.10/site-packages/pandas/core/generic.py", line 4788, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "/Genomics/skinniderlab/PED-generation/env-clm/lib/python3.10/site-packages/pandas/core/generic.py", line 4830, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "/Genomics/skinniderlab/PED-generation/env-clm/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 7070, in drop
raise KeyError(f"{labels[mask].tolist()} not found in axis")
KeyError: "['input_smiles'] not found in axis"
[Tue Aug 13 15:57:12 2024]
Error in rule write_structural_prior_CV:
jobid: 0
input: /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0.smi, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/test0_structures_SMILES_0.smi, /Genomics/skinniderlab/food-clm/PubChem.tsv, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/samples/structures_SMILES_0_unique_masses.csv.gz, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0_carbon.csv.gz
output: /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_ranks_structure.csv.gz, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_tc.csv.gz
conda-env: clm
shell:
clm write_structural_prior_CV --ranks_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_ranks_structure.csv.gz --tc_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_tc.csv.gz --train_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0.smi --test_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/test0_structures_SMILES_0.smi --pubchem_file /Genomics/skinniderlab/food-clm/PubChem.tsv --sample_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/samples/structures_SMILES_0_unique_masses.csv.gz --err_ppm 10 --seed 42 --carbon_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0_carbon.csv.gz --top_n 30
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
srun: error: argo-28: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=5898275.0
[Tue Aug 13 15:57:15 2024]
Error in rule write_structural_prior_CV:
jobid: 0
input: /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0.smi, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/test0_structures_SMILES_0.smi, /Genomics/skinniderlab/food-clm/PubChem.tsv, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/samples/structures_SMILES_0_unique_masses.csv.gz, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0_carbon.csv.gz
output: /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_ranks_structure.csv.gz, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_tc.csv.gz
conda-env: clm
shell:
clm write_structural_prior_CV --ranks_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_ranks_structure.csv.gz --tc_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/structural_prior/structures_SMILES_0_CV_tc.csv.gz --train_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0.smi --test_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/test0_structures_SMILES_0.smi --pubchem_file /Genomics/skinniderlab/food-clm/PubChem.tsv --sample_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/samples/structures_SMILES_0_unique_masses.csv.gz --err_ppm 10 --seed 42 --carbon_file /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=ecfp4-metric=tc-optimizer=feature_based-rarefaction=0.3/100/prior/inputs/train0_structures_SMILES_0_carbon.csv.gz --top_n 30
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
I've now re-run 4 sets of Snakemake runs (5 enum factors each) that were giving these errors before with no issues so far, so I'm going to close this and #238. Thanks for your help with both @vineetbansal!
write_structural_prior_CV is failing, seemingly because the "input_smiles" column does not exist in the AddCarbon file.
Indeed, /Genomics/skinniderlab/food-clm/clm/database=FooDB-representation=NA-metric=NA-optimizer=NA-rarefaction=1-enum_factor=30/30/prior/inputs/train0_structures_SMILES_3_carbon.csv.gz has no header (which I think write_structural_prior_CV assumes?) and contains only 920 rows, vs. 42236 in the input SMILES file (so AddCarbon should have substantially more than this).
I'm wondering if something went wrong with the AddCarbon step but somehow a file was written anyway?