hillerlab / make_lastz_chains

Portable solution to generate genome alignment chains using lastz

Lastz error: terminated with an error exit status (1) #70

Open Hannah1746 opened 2 weeks ago

Hannah1746 commented 2 weeks ago

Hey, I have been trying to run make_chains.py locally, but the run keeps failing with "terminated with an error exit status (1)".

(TOGA) hwaterma@cast-bio540ws02:~/Documents/MX_TOGA$ ../make_lastz_chains/make_chains.py MX DR MX_HiC_50chr_debris.fasta Danio.fa --pd test_out --chaining_memory 20 --executor local -f

Make Lastz Chains

Version 2.0.8
Commit: 88f4f390f032015816eef6bf69c574d683046fea
Branch: main

Partition Step

Partitioning for target

Saving partitions and creating 26 buckets for lastz output
In particular, 0 partitions for bigger chromosomes
And 26 buckets for smaller scaffolds
Saving target partitions to: /home/hwaterma/Documents/MX_TOGA/test_out/target_partitions.txt

Partitioning for query

Saving partitions and creating 59 buckets for lastz output
In particular, 40 partitions for bigger chromosomes
And 19 buckets for smaller scaffolds
Saving query partitions to: /home/hwaterma/Documents/MX_TOGA/test_out/query_partitions.txt
Num. target partitions: 0
Num. query partitions: 40
Num. lastz jobs: 0

Lastz Alignment Step

LASTZ: making jobs
LASTZ: saved 1534 jobs to /home/hwaterma/Documents/MX_TOGA/test_out/temp_lastz_run/lastz_joblist.txt
Parallel manager: pushing job /data/krablab/miniconda2/envs/TOGA/bin/nextflow /home/hwaterma/Documents/make_lastz_chains/parallelization/execute_joblist.nf --joblist /home/hwaterma/Documents/MX_TOGA/test_out/temp_lastz_run/lastz_joblist.txt -c /home/hwaterma/Documents/MX_TOGA/test_out/temp_lastz_run/lastz_config.nf

N E X T F L O W ~ version 24.10.0
Launching /home/hwaterma/Documents/make_lastz_chains/parallelization/execute_joblist.nf [deadly_ekeblad] DSL2 - revision: 0432e25129

executor > local (27)
[e2/21114c] execute_jobs (34) [ 2%] 17 of 769, failed: 17, retries: 17
[80/3a5e0e] NOTE: Process execute_jobs (7) terminated with an error exit status (1) -- Execution is retried (1)
executor > local (27)
[70/2f69ec] execute_jobs (12) [ 2%] 18 of 824, failed: 18, retries: 18
[80/3a5e0e] NOTE: Process execute_jobs (7) terminated with an error exit status (1) -- Execution is retried (1)
[cb/61e398] NOTE: Process execute_jobs (26) terminated with an error exit status (1) -- Execution is retried (1)
[0f/1cc334] NOTE: Process execute_jobs (39) terminated with an error exit status (1) -- Execution is retried (1)
[7c/1aa0fe] NOTE: Process execute_jobs (16) terminated with an error exit status (1) -- Execution is retried (1)
[aa/6ec182] NOTE: Process execute_jobs (20) terminated with an error exit status (1) -- Execution is retried (1)
[b5/7cb776] NOTE: Process execute_jobs (21) terminated with an error exit status (1) -- Execution is retried (1)
[71/1845b5] NOTE: Process execute_jobs (27) terminated with an error exit status (1) -- Execution is retried (1)
[34/8005d1] NOTE: Process execute_jobs (25) terminated with an error exit status (1) -- Execution is retried (1)
[df/64091e] NOTE: Process execute_jobs (22) terminated with an error exit status (1) -- Execution is retried (1)
[f6/8f96ed] NOTE: Process execute_jobs (33) terminated with an error exit status (1) -- Execution is retried (1)
[04/06b5b4] NOTE: Process execute_jobs (10) terminated with an error exit status (1) -- Execution is retried (1)
[98/051c7e] NOTE: Process execute_jobs (5) terminated with an error exit status (1) -- Execution is retried (1)
[09/e4de98] NOTE: Process execute_jobs (3) terminated with an error exit status (1) -- Execution is retried (1)
[c2/153283] NOTE: Process execute_jobs (37) terminated with an error exit status (1) -- Execution is retried (1)
[36/f2dec6] NOTE: Process execute_jobs (29) terminated with an error exit status (1) -- Execution is retried (1)
[33/2420be] NOTE: Process execute_jobs (40) terminated with an error exit status (1) -- Execution is retried (1)
[e2/21114c] NOTE: Process execute_jobs (34) terminated with an error exit status (1) -- Execution is retried (1)

When I check run.log to understand the error, the Lastz step only shows:

Lastz Alignment Step

LASTZ: making jobs
LASTZ: saved 1534 jobs to /home/hwaterma/Documents/MX_TOGA/test_out/temp_lastz_run/lastz_joblist.txt
Parallel manager: pushing job /data/krablab/miniconda2/envs/TOGA/bin/nextflow /home/hwaterma/Documents/make_lastz_chains/parallelization/execute_joblist.nf --joblist /home/hwaterma/Documents/MX_TOGA/test_out/temp_lastz_run/lastz_joblist.txt -c /home/hwaterma/Documents/MX_TOGA/test_out/temp_lastz_run/lastz_config.nf

and no .nextflow directory or .nextflow.log file gets created.

I have been trying to troubleshoot this on my own, but I just can't work out where the code is getting stuck.

MichaelHiller commented 2 weeks ago

Could you pls try to run a single lastz job on the command line? Something is systematically wrong. Most likely some input files are not found.
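One way to try a single job by hand is sketched below. It assumes that lastz_joblist.txt holds one complete shell command per line (the job pasted later in this thread was taken from that file); the function name `run_first_job` is made up for this illustration and is not part of make_lastz_chains.

```python
import shlex
import subprocess


def run_first_job(joblist_path, timeout=None):
    """Run the first non-empty line of a joblist and return (exit code, stderr).

    Assumption: every line of the joblist is one self-contained command.
    """
    with open(joblist_path) as handle:
        for line in handle:
            command = line.strip()
            if command:
                break
        else:
            raise ValueError(f"no commands found in {joblist_path}")

    # shlex.split keeps quoted arguments together, like a shell would.
    result = subprocess.run(
        shlex.split(command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    import sys

    code, stderr_text = run_first_job(sys.argv[1])
    print(f"exit status: {code}")
    print(stderr_text[-2000:])  # the tail of stderr usually names the cause
```

A non-zero exit status together with the captured stderr should show whether the job fails on its own or only inside the Nextflow run.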

@kirilenkobm Could you pls have a look? There must be log files that indicate why the lastz jobs die.
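For finding those log files: Nextflow runs each task in its own work directory, and the short hash in the console output (for example `[80/3a5e0e]`) is the beginning of that directory's path below the work root. The task's stderr is stored there as `.command.err`. The sketch below looks such a directory up. Where the work root lives for this pipeline is an assumption (possibly a `work` folder under temp_lastz_run), and both function names are made up for this illustration.

```python
from pathlib import Path


def find_task_dirs(work_root, short_hash):
    """Return task directories under work_root that match a hash like '80/3a5e0e'."""
    prefix, rest = short_hash.strip("[]").split("/")
    matches = Path(work_root).glob(f"{prefix}/{rest}*")
    return sorted(path for path in matches if path.is_dir())


def read_task_stderr(task_dir, max_chars=2000):
    """Return the tail of the task's .command.err, or '' if the file is missing."""
    err_file = Path(task_dir) / ".command.err"
    if not err_file.is_file():
        return ""
    return err_file.read_text(errors="replace")[-max_chars:]


if __name__ == "__main__":
    import sys

    # Usage: python find_task_log.py <work_root> <short_hash>
    for task_dir in find_task_dirs(sys.argv[1], sys.argv[2]):
        print(task_dir)
        print(read_task_stderr(task_dir))
```

The same directory also holds `.command.sh` (the exact script that was run) and `.exitcode`, which can be compared with the exit status reported on the console.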

Hannah1746 commented 2 weeks ago

I ran one of the jobs from lastz_joblist.txt:

/home/hwaterma/Documents/make_lastz_chains/standalone_scripts/run_lastz_intermediate_layer.py BULK_1:/home/hwaterma/Documents/MX_TOGA/test_out/target.2bit:MYX_Chr1 /home/hwaterma/Documents/MX_TOGA/test_out/query.2bit:NC_007112_7:0-50000000 /home/hwaterma/Documents/MX_TOGA/test_out/pipeline_parameters.json /home/hwaterma/Documents/MX_TOGA/test_out/temp_lastz_psl_output/bucket_ref_bulk_1/BULK_1_NC_007112_7__1.psl /home/hwaterma/Documents/make_lastz_chains/standalone_scripts/run_lastz.py --output_format psl --axt_to_psl /home/hwaterma/Documents/make_lastz_chains/HL_kent_binaries/axtToPsl

That has been running for a few hours now with no errors.

MichaelHiller commented 2 weeks ago

Well, that is promising. You are aligning the whole chr1 (how large is that?) against a 50 Mb chunk. If you want the test to run faster, maybe align it against a 5 or 10 Mb chunk.

Hannah1746 commented 2 weeks ago

Our first chromosome is 69,199,620 bp.

The command did run to completion too.

I just do not understand why the larger script is not working.

MichaelHiller commented 2 weeks ago

I have no idea. Could you pls check how much memory the job required and how long it ran? If you run /usr/bin/time -v lastz ...., it reports the run time and the peak memory consumption.

Maybe your jobs get killed by Slurm on the cluster because they don't get enough memory or runtime?

Did you test running lastz on the actual compute nodes that run the Slurm jobs? Maybe something is not correctly configured there.

Do the jobs die immediately or only after running for a few hours?

Also, maybe test splitting the genomes into much smaller chunks to see if these jobs succeed.
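The same two numbers (run time and peak memory) can also be collected from Python. This is a minimal sketch that assumes a Linux system, where `ru_maxrss` is reported in kilobytes; the function name `time_and_peak_memory` is made up for this illustration.

```python
import resource
import subprocess
import time


def time_and_peak_memory(command):
    """Run command (a list of arguments) and return (exit code, seconds, peak KB).

    RUSAGE_CHILDREN reports the largest resident set size among all child
    processes waited for so far, so call this once per fresh Python process
    to get a clean number for a single job.
    """
    start = time.monotonic()
    finished = subprocess.run(command)
    elapsed = time.monotonic() - start
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return finished.returncode, elapsed, peak_kb


if __name__ == "__main__":
    import sys

    # Usage: python measure.py <command> [arguments ...]
    code, seconds, peak = time_and_peak_memory(sys.argv[1:])
    print(f"exit status: {code}, wall time: {seconds:.1f} s, peak memory: {peak} KB")
```

As with /usr/bin/time -v, the full command with all of its arguments has to follow; timing the bare program name only measures how long it takes to print its usage message.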

Hannah1746 commented 1 week ago

hwaterma@cast-bio540ws02:~/Documents/MX_TOGA$ /usr/bin/time -v lastz

You must specify a target file
lastz-- Local Alignment Search Tool, blastZ-like
(version 1.04.15 released 20210827)
usage: lastz target [query] [options]
(common options; use --help for a more extensive list)
target, query specifiers or files, containing sequences to align
(use --help=files for more details)
--seed= set seed pattern (12of19, 14of22, or general pattern)
(default is 1110100110010101111)
--[no]transition allow (or don't) one transition in a seed hit
(by default a transition is allowed)
--[no]chain perform chaining
(by default no chaining is performed)
--[no]gapped perform gapped alignment (instead of gap-free)
(by default gapped alignment is performed)
--step= set step length (default is 1)
--strand=both search both strands
--strand=plus search + strand only (matching strand of query spec)
(by default both strands are searched)
--scores= read substitution and gap scores from a file
--xdrop= set x-drop threshold (default is 10sub[A][A])
--ydrop= set y-drop threshold (default is open+300extend)
--infer[=] infer scores from the sequences, then use them
all inference options are read from the control file
--hspthresh= set threshold for high scoring pairs (default is 3000)
ungapped extensions scoring lower are discarded
can also be a percentage or base count
--gappedthresh= set threshold for gapped alignments
gapped extensions scoring lower are discarded
can also be a percentage or base count
(default is to use same value as --hspthresh)
--include= read command line arguments from a text file
--help list "all" options
(but the online documentation is more complete)
--help=files list information about file specifiers
--help=shortcuts list blastz-compatible shortcuts
--help=defaults list scoring defaults for your current settings
--help=yasra list yasra-specific shortcuts
See the online documentation at http://www.bx.psu.edu/~rsharris/lastz for the most up-to-date information.
Command exited with non-zero status 1
Command being timed: "lastz"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 33%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2240
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 97
Voluntary context switches: 1
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1

The job dies immediately on our local computer. This computer has no time limit. I have gotten it to run locally on our computer cluster, but it does not finish within the 3-day window we are allowed. It does not run on Slurm, but I am working with the cluster IT on that issue.

MichaelHiller commented 1 week ago

/usr/bin/time -v is just the command that tracks time and memory consumption. After it, pls provide the full lastz command with all parameters.

Hannah1746 commented 6 days ago

I don't think I am doing this right. I'm sorry for being a little slow, I am new to this kind of computer work. This is what I ran and what I got:

(TOGA) hwaterma@cast-bio540ws02:~/Documents/MX_TOGA$ /usr/bin/time -v lastz /home/hwaterma/Documents/MX_TOGA/test_out/target.2bit:MYX_Chr1 /home/hwaterma/Documents/MX_TOGA/test_out/query.2bit:NC_007112_7:0-50000000
FAILURE: fopen_or_die failed to open "/home/hwaterma/Documents/MX_TOGA/test_out/target.2bit:MYX_Chr1" for "rb"
Command exited with non-zero status 1
Command being timed: "lastz /home/hwaterma/Documents/MX_TOGA/test_out/target.2bit:MYX_Chr1 /home/hwaterma/Documents/MX_TOGA/test_out/query.2bit:NC_007112_7:0-50000000"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2880
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 227
Voluntary context switches: 1
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1

It seems that it cannot read the target file.

MichaelHiller commented 6 days ago

Your command /usr/bin/time -v lastz etc. is all correct, but it outputs an error:
FAILURE: fopen_or_die failed to open "/home/hwaterma/Documents/MX_TOGA/test_out/target.2bit:MYX_Chr1" for "rb"

which means the lastz command without measuring time and mem consumption (via /usr/bin/time -v ) should also crash immediately.

Is the lastz command exactly the command that was running for a long time?

Maybe try this to see if your /usr/bin/time -v works.

/usr/bin/time -v find . | wc
Command being timed: "find ."
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.53
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3032
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 217
Voluntary context switches: 66
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Hannah1746 commented 6 days ago

When I run that I get:

/usr/bin/time -v find . | wc
Command being timed: "find ."
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.53
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3032
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 217
Voluntary context switches: 66
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
-bash: syntax error near unexpected token `('

MichaelHiller commented 6 days ago

Exit status: 0 means that this test works.

Where is this coming from? -bash: syntax error near unexpected token `('

Hannah1746 commented 6 days ago

When I run

/usr/bin/time -v find . | wc
Command being timed: "find ."
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 1%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.53
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3032
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 217
Voluntary context switches: 66
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

the output is: -bash: syntax error near unexpected token `('

MichaelHiller commented 6 days ago

Really weird. Here is what I get. (image attachment)

MichaelHiller commented 6 days ago

Do you have a proper Linux system?

Hannah1746 commented 6 days ago

Oh, I am sorry, I was running the full command, pasted output included.

Here is just /usr/bin/time -v find . | wc:

(TOGA) hwaterma@cast-bio540ws02:~/Documents/MX_TOGA$ /usr/bin/time -v find . | wc
Command being timed: "find ."
User time (seconds): 0.04
System time (seconds): 0.11
Percent of CPU this job got: 36%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.43
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2560
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 136
Voluntary context switches: 1299
Involuntary context switches: 1
Swaps: 0
File system inputs: 10816
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
4824 4824 316858

MichaelHiller commented 6 days ago

Well, then everything works. Pls fix this error:
FAILURE: fopen_or_die failed to open "/home/hwaterma/Documents/MX_TOGA/test_out/target.2bit:MYX_Chr1" for "rb"
This indicates the file is not readable or accessible. Maybe check the permissions.

Afterwards, the lastz job should also run.
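Note that the error message quotes the whole argument, `:MYX_Chr1` suffix included, as the file it tried to open. A permissions check therefore has to look at the part before that suffix. The sketch below separates the two. It assumes that the `path.2bit:sequence[:start-end]` form seen in the joblist is how the pipeline labels a region; whether plain lastz understands that form is not established in this thread. The names `split_spec` and `check_spec` are made up for this illustration.

```python
import os


def split_spec(spec):
    """Split 'path.2bit:seq[:start-end]' into (path, sequence, range).

    Assumption: the file part ends with '.2bit' and is followed by a colon.
    Anything that does not match is returned unchanged as the path.
    """
    marker = ".2bit:"
    index = spec.find(marker)
    if index == -1:
        return spec, None, None
    path = spec[: index + len(".2bit")]
    sequence, _, region = spec[index + len(marker):].partition(":")
    return path, sequence, region or None


def check_spec(spec):
    """Report whether the file part exists and is readable by this user."""
    path, sequence, region = split_spec(spec)
    return {
        "path": path,
        "sequence": sequence,
        "range": region,
        "file_exists": os.path.isfile(path),
        "file_readable": os.access(path, os.R_OK),
        # True only if a file literally named like the whole spec exists.
        "literal_spec_exists": os.path.exists(spec),
    }


if __name__ == "__main__":
    import sys

    for argument in sys.argv[1:]:
        print(check_spec(argument))
```

If `file_readable` is true while `literal_spec_exists` is false, permissions are fine and the program was asked to open a name that is not an actual file on disk.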

Hannah1746 commented 6 days ago

(TOGA) hwaterma@cast-bio540ws02:~/Documents/MX_TOGA/test_out$ ls -lh
total 1.1G
-rw-rw-r-- 1 hwaterma biograd 1.5K Nov 8 11:28 pipeline_parameters.json
-rw-rw-r-- 1 hwaterma biograd 483M Nov 8 11:28 query.2bit
-rw-rw-r-- 1 hwaterma biograd 41K Nov 8 11:28 query.chrom.sizes
-rw-rw-r-- 1 hwaterma biograd 32K Nov 8 11:28 query_partitions.txt
-rw-rw-r-- 1 hwaterma biograd 3.9K Nov 8 11:28 run.log
-rw-rw-r-- 1 hwaterma biograd 217 Nov 8 11:28 steps.json
-rw-rw-r-- 1 hwaterma biograd 559M Nov 8 11:28 target.2bit
-rw-rw-r-- 1 hwaterma biograd 6.4K Nov 8 11:28 target.chrom.sizes
-rw-rw-r-- 1 hwaterma biograd 5.6K Nov 8 11:28 target_partitions.txt
drwxrwxr-x 5 hwaterma biograd 4.0K Nov 8 11:28 temp_chain_run
drwxrwxr-x 2 hwaterma biograd 4.0K Nov 8 11:28 temp_concat_lastz_output
drwxrwxr-x 4 hwaterma biograd 4.0K Nov 8 11:28 temp_fill_chain
drwxrwxr-x 2 hwaterma biograd 4.0K Nov 8 11:28 temp_kent
drwxrwxr-x 28 hwaterma biograd 4.0K Nov 8 11:28 temp_lastz_psl_output
drwxrwxr-x 4 hwaterma biograd 4.0K Nov 8 11:29 temp_lastz_run

The file is created by the pipeline and is readable and writable. I am not sure how to fix that.

MichaelHiller commented 6 days ago

No idea. Pls gzip this folder and put it somewhere I can download it, and I'll run this lastz job locally on my system. Let's see if that works.

Hannah1746 commented 6 days ago

I tarred the files I am using. Could I get an email address to send them to?

Thank you so much again for your time on this!

MichaelHiller commented 6 days ago

The 2bit files will be too large for email. Pls also gzip the archive. Then maybe put it on Google Drive for me to download.

Hannah1746 commented 6 days ago

I uploaded it to my Google Drive.
https://drive.google.com/file/d/1xBSjjERhK_b6zAp1VXCihHd4K8lJMBin/view?usp=sharing
https://drive.google.com/file/d/1MQ5WHl9fim5bydc1yhqmQbb4K0Fhc961/view?usp=sharing

MichaelHiller commented 6 days ago

pls use my senckenberg email https://tbg.senckenberg.de/hillerlab/contact-2/

MichaelHiller commented 3 days ago

I downloaded the files, but I would need the exact command you are running. The files are named Danio and not target.2bit.

Also lastz takes parameters.

MichaelHiller commented 3 days ago

I think I found the problem. While 48.4% of the danio assembly is lower case = softmasked, you don't have any masking for the query.
twoBitToFa MX.2bit stdout | faSize stdin
2340801162 bases (65000 N's 2340736162 real 2340736162 upper 0 lower) in 384 sequences in 1 files

Pls run RepeatModeler2 on it, and use the resulting library for repeat masking. Then the lastz pipe will likely work.

Not (properly) masking is the #1 issue that people have when the pipe is not running smoothly :-)
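A quick way to check an assembly for soft-masking before starting the pipeline is to count upper-case and lower-case bases, which is the figure the faSize output above reports. The sketch below does this for FASTA text. It is a plain Python stand-in for faSize, not part of make_lastz_chains, and the function name `masking_stats` is made up for this illustration. It walks the sequence character by character, so on a genome of more than 2 Gb it will be much slower than faSize.

```python
def masking_stats(fasta_lines):
    """Count upper-case, lower-case and N bases in an iterable of FASTA lines.

    N and n are counted separately and left out of the upper/lower totals,
    which mirrors the 'real', 'upper' and 'lower' columns printed by faSize.
    """
    upper = lower = n_bases = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # header line, not sequence
        sequence = line.strip()
        n_upper = sequence.count("N")
        n_lower = sequence.count("n")
        n_bases += n_upper + n_lower
        upper += sum(1 for base in sequence if base.isupper()) - n_upper
        lower += sum(1 for base in sequence if base.islower()) - n_lower
    real = upper + lower
    return {
        "upper": upper,
        "lower": lower,
        "n": n_bases,
        "lower_fraction": lower / real if real else 0.0,
    }


if __name__ == "__main__":
    import sys

    # Usage: python masking_stats.py genome.fa
    with open(sys.argv[1]) as handle:
        stats = masking_stats(handle)
    print(stats)
    if stats["lower"] == 0:
        print("No lower-case bases: the assembly does not look soft-masked.")
```

A `lower_fraction` of 0.0, as in the MX numbers above (0 lower), means no repeats are soft-masked at all.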