openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

Nextflow execution_trace.txt reporting exit status `-` for some methods with successful output #228

Closed dburkhardt closed 1 year ago

dburkhardt commented 3 years ago

This issue is coming up on the docker-jupyter-kernels branch. I'm not sure whether it will affect other folks. As of https://github.com/singlecellopenproblems/SingleCellOpenProblems/commit/2787beb4bf76a9665980f97539faf011e5f5b0f1, the parse_nextflow.py script is failing. This script parses the output of the results directory and prepares JSON files for the website. Because we're not updating the website during the jam, this isn't essential, and my plan is to bypass it for now.

The error is:

$ python workflow/parse_nextflow.py
Traceback (most recent call last):
  File "workflow/parse_nextflow.py", line 202, in <module>
    main()
  File "workflow/parse_nextflow.py", line 194, in main
    results = parse_metric_results(results)
  File "workflow/parse_nextflow.py", line 105, in parse_metric_results
    results[task_name][dataset_name][method_name]["metrics"][metric_name] = result
KeyError: 'mnn_log_cpm'
Error: Process completed with exit code 1.

The problem is that some of the entries in results/pipeline_info/execution_trace.txt don't record a 0 exit status and instead show up like this:

task_id hash    native_id   name    status  exit    submit  duration    realtime    %cpu    peak_rss    peak_vmem   rchar   wchar
219 d0/3ce0e1   0522968f-70d1-4e7d-a1ae-e0ef7a0686fe    run_method (multimodal_data_integration:mnn_log_cpm-scicar_cell_lines:openproblems-r-extras)    CACHED  -   2021-03-22 17:29:20.707 -   -   -   -   -   -   -

This means that these entries are filtered out by https://github.com/singlecellopenproblems/SingleCellOpenProblems/blob/2787beb4bf76a9665980f97539faf011e5f5b0f1/workflow/parse_nextflow.py#L62

and when parse_metric_results(results) looks for the corresponding method runtime information, it's missing.
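
A minimal sketch of that failure mode, not the actual parse_nextflow.py code. It assumes the filter keeps only rows whose exit column is "0" and that task names follow the "process (task:method-dataset:image)" layout seen in the trace. The function `method_rows`, the `accept_cached_dash` flag, and the toy method and dataset names are all hypothetical.

```python
import csv
import io

# Toy trace in the tab-separated layout of execution_trace.txt (columns trimmed).
# Method and dataset names here are made up for illustration.
TRACE = (
    "task_id\tname\tstatus\texit\n"
    "1\trun_method (task_a:method_ok-dataset_x:some-image)\tCOMPLETED\t0\n"
    "2\trun_method (task_a:method_cached-dataset_x:some-image)\tCACHED\t-\n"
)


def method_rows(trace_text, accept_cached_dash=False):
    """Map method name -> trace row for run_method rows passing the exit filter."""
    kept = {}
    for row in csv.DictReader(io.StringIO(trace_text), delimiter="\t"):
        if not row["name"].startswith("run_method"):
            continue
        # Assumed filter: only an exit status of "0" counts as success.
        passed = row["exit"] == "0"
        # Hypothetical workaround: also accept CACHED rows with exit "-".
        if accept_cached_dash and row["status"] == "CACHED" and row["exit"] == "-":
            passed = True
        if not passed:
            continue
        # Assumed name layout: "process (task:method-dataset:image)".
        inner = row["name"].split("(", 1)[1].rstrip(")")
        _task, method_dataset, _image = inner.split(":", 2)
        method, _dataset = method_dataset.split("-", 1)
        kept[method] = row
    return kept


strict = method_rows(TRACE)
lenient = method_rows(TRACE, accept_cached_dash=True)

try:
    strict["method_cached"]  # same kind of lookup failure as the KeyError above
    strict_lookup_failed = False
except KeyError:
    strict_lookup_failed = True
```

With the strict filter the cached method never enters the dictionary, so any later lookup by method name fails. Accepting CACHED rows with a "-" exit status would avoid the KeyError, but those rows still carry no runtime information, so whether that is acceptable depends on what the website JSON needs.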

It's not clear why this line in execution_trace.txt is missing the run information. My guess is that this is a bug in the way Nextflow caches the run. A total of three run_method entries (out of 55) are missing info:

60  0b/22e77a   209d8fd6-a280-40ad-bd7f-f42ec7bf2e53    run_method (dimensionality_reduction:phate-citeseq_cbmc:openproblems-python-extras) CACHED  -   2021-03-22 19:13:10.988 -   -   -   -   -   -   -
216 78/469b2a   30cc27a8-63b4-48c9-a709-46b642a73ab7    run_method (multimodal_data_integration:mnn_log_scran_pooling-scicar_cell_lines:openproblems-r-extras)  CACHED  -   2021-03-22 17:29:20.681 -   -   -   -   -   -   -
219 d0/3ce0e1   0522968f-70d1-4e7d-a1ae-e0ef7a0686fe    run_method (multimodal_data_integration:mnn_log_cpm-scicar_cell_lines:openproblems-r-extras)    CACHED  -   2021-03-22 17:29:20.707 -   -   -   -   -   -   -

However, several other entries in the trace are missing info as well. I'm uploading the results directory from that run.
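
One way to list every affected entry in the uploaded trace is sketched below. It assumes the trace file is tab-separated with a header row containing `task_id`, `name`, `status`, and `exit` columns, as in the excerpt above. The helper names `rows_missing_exit` and `report` are made up for this example.

```python
import csv


def rows_missing_exit(lines):
    """Return trace rows whose exit column is "-" instead of a numeric status."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if (row.get("exit") or "").strip() == "-"]


def report(path="results/pipeline_info/execution_trace.txt"):
    """Print task_id, status and name for every row with a missing exit status."""
    with open(path, newline="") as handle:
        for row in rows_missing_exit(handle):
            print(row["task_id"], row["status"], row["name"], sep="\t")
```

Calling `report()` from the directory that contains `results/` should print one line per affected task, which makes it easy to see which processes other than run_method are involved.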

It's important to note that the metrics computed on these runs are also in the results/metrics directory from the run, suggesting that in fact the run_method process completed successfully.

In summary, execution_trace.txt is missing information about runs that ostensibly completed successfully.

Relevant files results.zip

LuckyMD commented 3 years ago

Scott fixed a bug like this in the method outputs in the past couple of days (#209). There was some inconsistency in how method outputs were interpreted. Might a further fix of the same type be needed here?

dburkhardt commented 3 years ago

Well, the good news is that this is currently passing on @olgabot's PR: https://github.com/singlecellopenproblems/SingleCellOpenProblems/actions/runs/684539619

I think I'll pull that one into master and we can address the caching later. Right now, her run-benchmarks job took 1 hr 40 min, which is only marginally longer than with caching.