hillerlab / make_lastz_chains

Portable solution to generate genome alignment chains using lastz
MIT License

Fatal non-specific error if genome names are the same as genome filenames #41

Open ning-y opened 10 months ago

ning-y commented 10 months ago

If target_name/query_name has the same value as the basename of target_genome/query_genome, the pipeline fails with the non-specific "No non-empty files found" error:

### Concatenating Lastz Results (Cat) Step ###

Concatenating LASTZ output from 1 buckets
* skip bucket bucket_ref__chrX_in_0_156040895: nothing to concat
An error occurred while executing cat: Error! No non-empty files found at ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/temp_concat_lastz_output. The failed operation label is: cat_step
Traceback (most recent call last):
  File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/modules/step_manager.py", line 70, in execute_steps
    step_result = step_to_function[step](params, project_paths, step_executables)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/modules/pipeline_steps.py", line 58, in cat_step
    do_cat(params, project_paths, executables)
  File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/steps_implementations/cat_step.py", line 51, in do_cat
    has_non_empty_file(project_paths.cat_out_dirname, "cat_step")
  File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/modules/common.py", line 51, in has_non_empty_file
    raise PipelineFileNotFoundError(err_msg)
modules.error_classes.PipelineFileNotFoundError: Error! No non-empty files found at ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/temp_concat_lastz_output. The failed operation label is: cat_step

In fact, the underlying error seems to come from Lastz. Tracing the chains_joblist commands, I see the command,

~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz_intermediate_layer.py ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit:chrX:0-156040895 ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/query.2bit:chrX:0-50000000 ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/pipeline_parameters.json ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/temp_lastz_psl_output/bucket_ref__chrX_in_0_156040895/chrX_chrX__1.psl ~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py --output_format psl --axt_to_psl ~/repos/toga-pipeline/.snakemake/conda/d1516d7b42018368bb384d7b98ada0c3_/bin/axtToPsl

which, when run, gives a traceback pointing to Lastz:

Traceback (most recent call last):
  File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py", line 197, in call_lastz
    lastz_out = subprocess.check_output(cmd, shell=True, stderr=subprocess.PIPE).decode("utf-8")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/repos/toga-pipeline/.snakemake/conda/d1516d7b42018368bb384d7b98ada0c3_/lib/python3.12/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/repos/toga-pipeline/.snakemake/conda/d1516d7b42018368bb384d7b98ada0c3_/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'lastz "~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit/chrX[1,156040895][multiple]" "~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/query.2bit/chrX[1,50000000][multiple]" Y=9400 H=2000 L=3000 K=2400 --traceback=800.0M --format=axt+' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py", line 336, in <module>
    main()
  File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py", line 317, in main
    lastz_output = call_lastz(cmd)
                   ^^^^^^^^^^^^^^^
  File "~/repos/toga-pipeline/inter/100-install/make_lastz_chains/standalone_scripts/run_lastz.py", line 201, in call_lastz
    raise LastzProcessError(
LastzProcessError: Lastz command failed with exit code 1. Error message: FAILURE: fopen_or_die failed to open "~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit/chrX" for "rb"

When I checked ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit/chrX, I found that ~/repos/toga-pipeline/inter/300-chain/hg38-chrX.2bit/mm10-chrX.2bit/out/target.2bit is a symlink to a file. That is probably why Lastz failed: it seems to treat target.2bit as a directory (?).
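The observation above can be checked with a small sketch. It only illustrates the filesystem side of the hypothesis: a symlink that points to a regular file is not a directory, so opening a child path beneath it fails. It says nothing about how Lastz itself interprets the path. The helper names (`describe_path`, `can_open_child`) and the placeholder file contents are assumptions made for illustration and are not part of make_lastz_chains.

```python
import os
import tempfile


def describe_path(path):
    """Report basic facts about `path` (hypothetical helper, for illustration)."""
    return {
        "is_symlink": os.path.islink(path),  # does not follow the link
        "is_dir": os.path.isdir(path),       # follows the link
        "is_file": os.path.isfile(path),     # follows the link
    }


def can_open_child(path, child):
    """Try to open `path/child` for binary reading, like the failing open for "rb"."""
    try:
        with open(os.path.join(path, child), "rb"):
            return True
    except OSError:
        # Covers NotADirectoryError and FileNotFoundError.
        return False


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the real genome file; the contents are a placeholder.
    genome = os.path.join(tmp, "hg38-chrX.2bit")
    with open(genome, "wb") as fh:
        fh.write(b"placeholder")

    # Stand-in for the out/target.2bit symlink seen in the report.
    link = os.path.join(tmp, "target.2bit")
    os.symlink(genome, link)

    facts = describe_path(link)
    child_ok = can_open_child(link, "chrX")

print(facts)
print("can open target.2bit/chrX as a file:", child_ok)
```

Running the same two checks against the real out/target.2bit path would show whether the symlink resolves to a file and whether the target.2bit/chrX path can be opened directly.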

When I changed target_name/query_name so that they differed from the basenames of the target_genome/query_genome files (and deleted all the intermediate files), the error went away. I regret that I do not have time to investigate further and provide a more detailed report.
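Until the cause is understood, the workaround described above could be enforced with a pre-flight check before launching the pipeline. This is a minimal sketch, not code from make_lastz_chains: the function name `name_clashes_with_genome` and the example paths are hypothetical, and it assumes the clash is an exact match between the name and the genome file's basename, as described in this report.

```python
import os


def name_clashes_with_genome(name, genome_path):
    """Return True if `name` equals the basename of `genome_path`.

    Hypothetical helper: it mirrors the condition described in this report
    and is not part of make_lastz_chains.
    """
    return name == os.path.basename(os.path.normpath(genome_path))


def check_names(target_name, target_genome, query_name, query_genome):
    """Collect human-readable warnings for any clashing name/genome pair."""
    problems = []
    for label, name, genome in (
        ("target", target_name, target_genome),
        ("query", query_name, query_genome),
    ):
        if name_clashes_with_genome(name, genome):
            problems.append(
                f"{label}_name {name!r} equals the basename of {label}_genome {genome!r}"
            )
    return problems


# Illustrative values only; the paths are made up for the example.
clashing = check_names(
    "hg38-chrX.2bit", "/data/hg38-chrX.2bit",
    "mm10-chrX.2bit", "/data/mm10-chrX.2bit",
)
safe = check_names(
    "hg38", "/data/hg38-chrX.2bit",
    "mm10", "/data/mm10-chrX.2bit",
)

for message in clashing:
    print("WARNING:", message)
```

A wrapper script could refuse to start the pipeline when `check_names` returns a non-empty list, which turns the late "No non-empty files found" failure into an early, specific message.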

kirilenkobm commented 10 months ago

Hi @ning-y

Thank you very much for reporting this! Don't worry about the level of detail, it's already super helpful.