PBSpro support - GitHub Issues

mictadlo commented 4 months ago

Hi, our HPC uses PBSpro. Does make_lastz_chains support PBSpro?

Best wishes,

Michal

MichaelHiller commented 4 months ago

Likely not, but my understanding is that we use Nextflow to schedule the jobs. So if Nextflow can communicate with PBSpro, it may work.

mictadlo commented 4 months ago

According to the Nextflow documentation, PBSpro is supported. However, I could not get it to run with the following command:

> ./make_chains.py target query test_data/test_reference.fa test_data/test_query.fa --pd test_out -f --chaining_memory 16 --cluster_executor pbspro --cluster_queue test
# Make Lastz Chains #
Version 2.0.8
Commit: 187e313afc10382fe44c96e47f27c4466d63e114
Branch: main

* found run_lastz.py at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/standalone_scripts/run_lastz.py
* found run_lastz_intermediate_layer.py at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/standalone_scripts/run_lastz_intermediate_layer.py
* found chain_gap_filler.py at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/standalone_scripts/chain_gap_filler.py
* found faToTwoBit at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/faToTwoBit
* found twoBitToFa at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/twoBitToFa
* found pslSortAcc at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/pslSortAcc
* found axtChain at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/axtChain
* found axtToPsl at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/axtToPsl
* found chainAntiRepeat at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/chainAntiRepeat
* found chainMergeSort at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/chainMergeSort
* found chainCleaner at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/chainCleaner
* found chainSort at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/chainSort
* found chainScore at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/chainScore
* found chainNet at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/chainNet
* found chainFilter at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/HL_kent_binaries/chainFilter
* found lastz at /work/waterhouse_team/miniconda2/envs/makeLastzChains/bin/lastz
* found nextflow at /home/lorencm/bin/nextflow
All necessary executables found.
Making chains for test_data/test_reference.fa and test_data/test_query.fa files, saving results to /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out
Pipeline started at 2024-05-10 08:46:21.499906
* Setting up genome sequences for target
genomeID: target
input sequence file: test_data/test_reference.fa
is 2bit: False
planned genome dir location: /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/target.2bit
Initial fasta file test_data/test_reference.fa saved to /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/target.2bit
For target (target) sequence file: /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/target.2bit; chrom sizes saved to: /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/target.chrom.sizes
* Setting up genome sequences for query
genomeID: query
input sequence file: test_data/test_query.fa
is 2bit: False
planned genome dir location: /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/query.2bit
Initial fasta file test_data/test_query.fa saved to /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/query.2bit
For query (query) sequence file: /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/query.2bit; chrom sizes saved to: /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/query.chrom.sizes

### Partition Step ###

# Partitioning for target
Saving partitions and creating 1 buckets for lastz output
In particular, 0 partitions for bigger chromosomes
And 1 buckets for smaller scaffolds
Saving target partitions to: /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/target_partitions.txt
# Partitioning for query
Saving partitions and creating 1 buckets for lastz output
In particular, 0 partitions for bigger chromosomes
And 1 buckets for smaller scaffolds
Saving query partitions to: /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/query_partitions.txt
Num. target partitions: 0
Num. query partitions: 0
Num. lastz jobs: 0

### Lastz Alignment Step ###

LASTZ: making jobs
LASTZ: saved 1 jobs to /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/temp_lastz_run/lastz_joblist.txt
Parallel manager: pushing job /home/lorencm/bin/nextflow /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/parallelization/execute_joblist.nf --joblist /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/temp_lastz_run/lastz_joblist.txt -c /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/temp_lastz_run/lastz_config.nf
N E X T F L O W  ~  version 23.10.1
Launching `/mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/parallelization/execute_joblist.nf` [gigantic_lorenz] DSL2 - revision: 0483b29723
[12/2b01f3] process > execute_jobs (1) [100%] 4 of 4, failed: 4, retries: 3
[1c/6ff42d] NOTE: Error submitting process 'execute_jobs (1)' for execution -- Execution is retried (1)
[46/bab438] NOTE: Error submitting process 'execute_jobs (1)' for execution -- Execution is retried (2)
[4b/caee10] NOTE: Error submitting process 'execute_jobs (1)' for execution -- Execution is retried (3)
ERROR ~ Error executing process > 'execute_jobs (1)'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  qsub -N nf-execute_jobs .command.run

Command exit status:
  159

Command output:
  qsub: Unauthorized Request 

Work dir:
  /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/temp_lastz_run/work/12/2b01f39c7ef951786a32513d22ccc9

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

### Error! The nextflow process lastz crashed!
Please look at the logs in the /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/temp_lastz_run
An error occurred while executing lastz: Jobs for lastz at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/temp_lastz_run/lastz_joblist.txt died
Traceback (most recent call last):
  File "/mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/modules/step_manager.py", line 70, in execute_steps
    step_result = step_to_function[step](params, project_paths, step_executables)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/modules/pipeline_steps.py", line 52, in lastz_step
    do_lastz(params, project_paths,  executables)
  File "/mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/steps_implementations/lastz_step.py", line 99, in do_lastz
    execute_nextflow_step(
  File "/mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/parallelization/nextflow_wrapper.py", line 157, in execute_nextflow_step
    nextflow_manager.check_failed()
  File "/mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/parallelization/nextflow_wrapper.py", line 109, in check_failed
    raise NextflowProcessError(f"Jobs for {self.label} at {self.joblist_path} died")
modules.error_classes.NextflowProcessError: Jobs for lastz at /mnt/hpccs01/work/waterhouse_team/apps/make_lastz_chains/test_out/temp_lastz_run/lastz_joblist.txt died

> less test_out/.nextflow.log
test_out/.nextflow.log: No such file or directory
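
The error message above points to test_out/temp_lastz_run for logs, so the Nextflow log may sit there rather than directly under test_out. A small sketch for locating it, assuming only that Nextflow names its log .nextflow.log (rotated copies get a numeric suffix). The function name find_nextflow_logs is made up for illustration:

```shell
#!/bin/sh
# Sketch: list every Nextflow log file below a make_lastz_chains project directory.
# find_nextflow_logs is a hypothetical helper name, not part of make_lastz_chains.
find_nextflow_logs() {
    project_dir="$1"
    # Match .nextflow.log as well as rotated copies such as .nextflow.log.1
    find "$project_dir" -type f -name '.nextflow.log*'
}

# Only search if the project directory from the run above exists.
if [ -d test_out ]; then
    find_nextflow_logs test_out
fi
```

Re-running the reported command (qsub -N nf-execute_jobs .command.run) by hand from the work dir shown in the error may also reproduce the scheduler's message outside Nextflow, and qstat -Q lists the queues the PBS server knows about. Whether "Unauthorized Request" here relates to the queue name or to the submitting host is a guess until the site's PBS configuration is checked.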

What did I do wrong?

Best wishes,

Michal

MichaelHiller commented 4 months ago

Sorry, I don't know. I have 0 experience with PBSpro.

ohdongha commented 4 months ago

Hi! Sorry for the hitchhiking.

I also had trouble running make_lastz_chains on an HPC that runs PBS, likely due to some internal configuration of the HPC. After some trial and error, I ended up running make_lastz_chains (the original v1.0.0) by submitting the entire job to a single compute node with multiple (N) cores, using --executor local --executor_queuesize $N (--executor local can be omitted since it is the default).

In my case, a node with N=32 was good enough for the alignment of mammalian-size genomes (or any genomes <16Gb), and there are steps where RAM appears to matter more than the number of threads.

If you have some computing nodes with a reasonable number of cores, perhaps this approach would work?
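
A minimal sketch of what such a single-node submission could look like, under several assumptions: the --executor and --executor_queuesize flags are the v1.0.0 ones mentioned above (the v2 command earlier in this thread uses --cluster_executor instead), target.fa and query.fa are placeholder file names, the 72-hour walltime is a placeholder, and the function name make_single_node_job is made up for illustration. The helper only prints a PBS Pro job script and submits nothing:

```shell
#!/bin/sh
# Sketch: print a single-node PBS Pro job script for make_lastz_chains.
# Nothing is submitted here; redirect the output to a file and qsub it yourself.
make_single_node_job() {
    ncpus="$1"    # cores to request on one node
    mem_gb="$2"   # memory to request, in GB
    # Unquoted heredoc: ${ncpus} and ${mem_gb} expand now, \$PBS_O_WORKDIR stays literal.
    cat <<EOF
#!/bin/bash
#PBS -N make_lastz_chains
#PBS -l select=1:ncpus=${ncpus}:mem=${mem_gb}gb
#PBS -l walltime=72:00:00
cd "\$PBS_O_WORKDIR"
# Assumed v1.0.0-style invocation; file names are placeholders.
./make_chains.py target query target.fa query.fa --executor local --executor_queuesize ${ncpus}
EOF
}

# Example: one node, 32 cores, 360 GB.
make_single_node_job 32 360
```

Saving the output with make_single_node_job 32 360 > chains.pbs and then running qsub chains.pbs keeps all Nextflow tasks on the one allocated node, so Nextflow itself never has to call qsub.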

Cheers, Dong-Ha

MichaelHiller commented 4 months ago

Thanks for the feedback. Of course, running it on a single node may work; these days CPUs have 128 or 192 cores. It will take a few days to finish, though.

Maybe @kirilenkobm has insights into PBSpro or how to fix the problem?

mictadlo commented 3 months ago

Hi @ohdongha, How much memory did you need for your mammalian-size genomes? I want to run it on a 3GB allotetraploid plant.

Best wishes,

Michal

ohdongha commented 3 months ago

> Hi @ohdongha, How much memory did you need for your mammalian-size genomes? I want to run it on a 3GB allotetraploid plant.

I typically ask for 360 GB and 32 cores, to be on the safe side. In most cases, max_vmem does not exceed 200 GB. I think the key, which @MichaelHiller also always emphasizes, is to soft-mask the repeats as much as possible.

Cheers, Dong-Ha

hillerlab / make_lastz_chains

PBSpro support #60