marbl / verkko

Telomere-to-telomere assembly of accurate long reads (PacBio HiFi, Oxford Nanopore Duplex, HERRO-corrected Oxford Nanopore Simplex) and Oxford Nanopore ultra-long reads.

Not running final consensus since no rukki paths are available! #283

Closed. JohnUrban closed this issue 2 months ago.

JohnUrban commented 2 months ago

Hi all,

Thanks for all the tools and support over the years.

I recently installed Verkko 2.2 with conda.

I am trying to get it to run in both --grid and --local modes on a SLURM cluster.

So far, neither has finished.

The local mode was throwing this error:

verkko --local --local-memory 275 -d asm --hifi ${HIFI} --nano ${ULONT} --hic1 ${HICR1} --hic2 ${HICR2}

Launching release v2.2
Using snakemake 7.32.4.
Building DAG of jobs...
LockException:
Error: Directory cannot be locked. Please make sure that no other Snakemake process is trying to create the same files in the following directory:
/central/groups/carnegie_poc/jurban/data/hydra/assemblies/verkko/bortvin-only/01-default/nogrid/asm
If you are sure that no other instances of snakemake are running on this directory, the remaining lock was likely caused by a kill signal or a power loss. It can be removed with the --unlock argument.
ERROR!, HiC/Pore-C phasing failed, look above for error message.
Not running final consensus since no rukki paths are available!
Verkko logging for failed commands: 

I then found out about --snakeopts "--unlock" on a different issue thread, so I ran it with that and got this:

verkko --snakeopts "--unlock" --local --local-memory 275 -d asm --hifi ${HIFI} --nano ${ULONT} --hic1 ${HICR1} --hic2 ${HICR2}

Launching release v2.2
Using snakemake 7.32.4.
Unlocking working directory.
Not running final consensus since no rukki paths are available!
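Before reaching for --unlock, it can help to see what Snakemake actually left behind. Below is a minimal Python sketch. It assumes Snakemake keeps its lock files under <workdir>/.snakemake/locks, which is the layout recent Snakemake releases appear to use, so verify it on your install. The helper name stale_locks is hypothetical. An empty or non-empty listing does not prove whether another Snakemake process is running, so check the SLURM queue as well.

```python
import os
from typing import List


def stale_locks(workdir: str) -> List[str]:
    """List lock files in <workdir>/.snakemake/locks (assumed Snakemake layout).

    Returns an empty list if the locks directory does not exist.
    """
    lockdir = os.path.join(workdir, ".snakemake", "locks")
    if not os.path.isdir(lockdir):
        return []
    return sorted(os.listdir(lockdir))
```

For example, `stale_locks("asm")` run from the directory that contains the -d folder.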

The grid mode was failing with the same message about rukki. I tried relaunching a few times, most recently with --snakeopts touch, which I read about on another issue thread, but it seems to throw the same errors. I am attaching the first two SLURM output files in case they are helpful (they are too big to copy and paste): 01-slurm-43666930.out.txt and 02-slurm-43684559.out.txt.

Notably the rukki message is in both:

01-slurm-43666930.out.txt:  Not running final consensus since no rukki paths are available!
02-slurm-43684559.out.txt:  Not running final consensus since no rukki paths are available!

But there are also a lot of errors like these:

Error in rule buildGraph:
...
...
Error executing rule buildGraph on cluster (jobid: 7, external: 43674725, jobscript: /central/groups/carnegie_poc/jurban/data/hydra/assemblies/verkko/bortvin-only/01-default/grid/asm/.snakemake/tmp.30kjyq3h/verkko.buildGraph.7.sh). For error details see the cluster log and the log files of the involved rule(s).
...
Error in rule buildPackages:
...
Error executing rule buildPackages on cluster (jobid: 20, external: 43676600, jobscript: /central/groups/carnegie_poc/jurban/data/hydra/assemblies/verkko/bortvin-only/01-default/grid/asm/.snakemake/tmp.30kjyq3h/verkko.buildPackages.20.sh). For error details see the cluster log and the log files of the involved rule(s).
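With SLURM logs this large, a quick way to see which rules failed is to scan for Snakemake's "Error in rule <name>:" lines, like the ones excerpted above. Below is a minimal Python sketch. failed_rules is a hypothetical helper that counts each failing rule once per matching line. Pass it the text of a file such as 01-slurm-43666930.out.txt.

```python
import re
from collections import Counter

# Matches lines such as "Error in rule buildGraph:" (as seen in the log excerpts).
ERROR_RE = re.compile(r"^Error in rule (\w+):")


def failed_rules(log_text: str) -> Counter:
    """Count how often each rule is reported as failed in a Snakemake/SLURM log."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

Lines like "Error executing rule ... on cluster" are deliberately not counted, so each failure is tallied once.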

Any guidance on how to overcome would be appreciated.

Is the rukki problem as simple as installing rukki (https://github.com/marbl/rukki)?

I can confirm that there is no "rukki" in the conda environment (e.g., tab completion does not find it).

And I could not find any files with rukki in their name in /path/to/anaconda3/envs/verkko/.

Very naive question -- how does one install rukki?

Many thanks as always.

Best,

John

EDITED: added "no" to "there is rukki in the conda environment" --> "there is no rukki in the conda environment"

skoren commented 2 months ago

There isn't a problem with rukki in this run. It's reporting that it can't generate the final consensus because the initial scaffolding/consensus steps failed. I expect rukki is installed in verkko/lib/bin, which isn't on your PATH, but verkko knows where to find it. I'm not sure how the folder ended up locked. That usually only happens if you kill the Snakemake process or the node it's running on fails; otherwise it cleans up after a failure. Either that, or the local run was launched while the SLURM run was still going.

It looks like buildPackages (which loads all the reads into memory) is getting Killed on your cluster. Requesting more memory explicitly should help. Try adding --par-run 1 100 24, which should request 100 GB for that job, then re-launch and see if it completes.

JohnUrban commented 2 months ago

Ok - just getting back to this today.

skoren commented 2 months ago

Sounds good. Re: "I had both the grid run and local run going at the same time, but they were on different nodes...": do you mean different assemblies? Two runs can't share the same -d folder, but otherwise it should be fine to run several at once.

JohnUrban commented 2 months ago

Different assemblies, yes; not the same -d folder. So then that wouldn't be why the lock was happening. Still a mystery I guess.

JohnUrban commented 2 months ago

Ok. I launched it yesterday. Today it seems done, but I am looking for confirmation that it finished correctly, because the end of the SLURM output had a "cp" error. To help answer that, I am also sharing the entire SLURM output, the contents of the -d asm directory, and some size statistics for the various assembly*.fasta files in it.

This was the end of the SLURM output file:

[Thu Sep 12 02:24:54 2024]
Finished job 0.
30 of 30 steps (100%) done
Complete log: .snakemake/log/2024-09-12T005303.958071.snakemake.log
cp: cannot stat '*.bam': No such file or directory

Here is the whole thing in case it helps: slurm-43744733.out.txt

This is what the -d asm directory looks like:

ls -lh asm/
total 17G
drwxr-sr-x. 6 jurban hpc_carnegie_poc 4.0K Sep 11 11:45 0-correction
drwxr-sr-x. 2 jurban hpc_carnegie_poc 4.0K Sep 11 19:31 1-buildGraph
drwxr-sr-x. 2 jurban hpc_carnegie_poc 4.0K Sep 11 21:25 2-processGraph
drwxr-sr-x. 3 jurban hpc_carnegie_poc  16K Sep 11 20:14 3-align
drwxr-sr-x. 3 jurban hpc_carnegie_poc  16K Sep 11 21:05 3-alignTips
drwxr-sr-x. 2 jurban hpc_carnegie_poc  16K Sep 11 21:25 4-processONT
drwxr-sr-x. 2 jurban hpc_carnegie_poc 4.0K Sep 11 21:26 5-untip
drwxr-sr-x. 2 jurban hpc_carnegie_poc 4.0K Sep 11 21:37 6-layoutContigs
drwxr-sr-x. 3 jurban hpc_carnegie_poc 4.0K Sep 11 22:48 7-consensus
drwxr-sr-x. 4 jurban hpc_carnegie_poc  16K Sep 12 00:52 8-hicPipeline
-rw-r--r--. 1 jurban hpc_carnegie_poc 5.0K Sep 12 02:25 assembly.colors.csv
-rw-r--r--. 1 jurban hpc_carnegie_poc  32M Sep 12 02:24 assembly.disconnected.fasta
-rw-r--r--. 1 jurban hpc_carnegie_poc 596M Sep 12 02:24 assembly.fasta
-rw-r--r--. 1 jurban hpc_carnegie_poc 260M Sep 12 02:24 assembly.haplotype1.fasta
-rw-r--r--. 1 jurban hpc_carnegie_poc 276M Sep 12 02:25 assembly.haplotype2.fasta
-rw-r--r--. 1 jurban hpc_carnegie_poc 384M Sep 12 02:25 assembly.homopolymer-compressed.gfa
-rw-r--r--. 1 jurban hpc_carnegie_poc 336M Sep 12 02:25 assembly.homopolymer-compressed.layout
-rw-r--r--. 1 jurban hpc_carnegie_poc 364K Sep 11 22:48 assembly.homopolymer-compressed.noseq.gfa
-rw-r--r--. 1 jurban hpc_carnegie_poc 134K Sep 12 02:25 assembly.paths.tsv
-rw-r--r--. 1 jurban hpc_carnegie_poc 194K Sep 12 02:25 assembly.scfmap
-rw-r--r--. 1 jurban hpc_carnegie_poc  61M Sep 12 02:25 assembly.unassigned.fasta
drwxr-sr-x. 2 jurban hpc_carnegie_poc  64K Sep 12 00:35 batch-scripts
-rw-r--r--. 1 jurban hpc_carnegie_poc    0 Sep 11 07:33 emptyfile
-rw-r--r--. 1 jurban hpc_carnegie_poc  15G Sep 11 11:45 hifi-corrected.fasta.gz
-rwxr-xr-x. 1 jurban hpc_carnegie_poc  602 Sep 11 07:33 snakemake.sh
-rw-r--r--. 1 jurban hpc_carnegie_poc 5.0K Sep 11 07:33 verkko.yml

assembly.fasta stats:

Number contigs: 1584
Assembly size: 624039213.0
Max contig size: 28843277.0
Min contig size: 6640.0
Mean contig size: 393964.1496212121
Median contig size: 36883.0
Contig N50.0    17684262
Contig L50.0    15
E size (G=624039213) = 15634696

assembly.haplotype1.fasta stats:

Number contigs: 229
Assembly size: 272082509.0
Max contig size: 25414194.0
Min contig size: 7728.0
Mean contig size: 1188133.227074236
Median contig size: 34347.0
Contig N50.0    18070157
Contig L50.0    7
E size (G=272082509) = 16627626

assembly.haplotype2.fasta stats:

Number contigs: 319
Assembly size: 288543443.0
Max contig size: 28843277.0
Min contig size: 6640.0
Mean contig size: 904524.8996865203
Median contig size: 37499.0
Contig N50.0    17637255
Contig L50.0    7
E size (G=288543443) = 18073854

assembly.disconnected.fasta stats:

Number contigs: 1558
Assembly size: 32946615.0
Max contig size: 94225.0
Min contig size: 6249.0
Mean contig size: 21146.73620025674
Median contig size: 20172.5
Contig N50.0    23127
Contig L50.0    556
E size (G=32946615) = 24244

assembly.unassigned.fasta stats:

Number contigs: 1036
Assembly size: 63413261.0
Max contig size: 2924467.0
Min contig size: 6799.0
Mean contig size: 61209.71138996139
Median contig size: 37037.5
Contig N50.0    85678
Contig L50.0    158
E size (G=63413261) = 275735
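The N50, L50 and E-size figures above can be reproduced from contig lengths alone. Below is a minimal Python sketch, assuming the usual definitions:

- N50 is the length at which contigs sorted longest-first first cover half the total size.
- L50 is how many contigs that takes.
- E-size is the sum of squared contig lengths divided by G.

By default G is the assembly size, matching the "G=" values printed above. contig_stats is a hypothetical helper, not the tool that produced these numbers.

```python
from typing import Iterable, Optional, Tuple


def contig_stats(lengths: Iterable[int],
                 genome_size: Optional[int] = None) -> Tuple[int, int, float]:
    """Return (N50, L50, E-size) for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)  # longest contig first
    if not ordered:
        return 0, 0, 0.0
    total = sum(ordered)
    g = genome_size if genome_size is not None else total
    n50 = l50 = 0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:  # reached at least half of the assembly
            n50, l50 = length, count
            break
    # Expected length of the contig containing a randomly chosen base.
    e_size = sum(x * x for x in ordered) / g
    return n50, l50, e_size
```

For example, lengths of 10, 20, 30 and 40 give N50 = 30, L50 = 2 and E-size = 3000 / 100 = 30.0.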

Btw - I am not only consistently impressed with your assembly tools over the years, but also with your "customer service/support". It is top notch. Thank you a thousand times.

skoren commented 2 months ago

Yes, this run looks like it completed successfully. The cp error occurs because you weren't generating assembly alignments, so it won't affect your result (though it shouldn't report an error; I'll try to reproduce that locally).
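For context on the message itself: bash does not enable the nullglob option by default. In that case a glob such as *.bam that matches nothing is passed to cp literally, so cp looks for a file named '*.bam' and fails with "cannot stat". Below is a minimal Python sketch of a copy step that skips quietly when nothing matches. copy_matching is a hypothetical helper, not verkko's code.

```python
import glob
import os
import shutil
from typing import List


def copy_matching(pattern: str, dest_dir: str) -> List[str]:
    """Copy every file matching `pattern` into `dest_dir`; return the copied basenames.

    glob.glob() returns an empty list when nothing matches, so unlike an
    unmatched shell glob there is no literal "*.bam" for the copy to fail on.
    """
    copied = []
    for path in sorted(glob.glob(pattern)):
        shutil.copy(path, dest_dir)
        copied.append(os.path.basename(path))
    return copied
```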

JohnUrban commented 2 months ago

Thanks for getting back so fast. Okay - great that it is done.

I will also note that I was told to expect a 300 Mb genome across 15 chromosomes.

If I were to use something like YaHS to further Hi-C scaffold, would you recommend using the haplotype assemblies separately?

And is it default to expect "haplotype 1" to be better in some way (e.g. higher BUSCO completeness) than haplotype 2? (I guess I am still thinking about it as "primary" vs "associated" like in the days of old.)

I am assuming for now that when the researchers inevitably ask, "So which file should we use as our reference genome?", I could tell them to use assembly.haplotype1.fasta for most standard needs, e.g., ChIP-seq analyses. Is that acceptable? The forward-looking approach would seem to be some GFA-based reference, but for now they will be using traditional single-reference-genome approaches.

skoren commented 2 months ago

The latest version of verkko already does Hi-C scaffolding (though we've mostly tested it on humans), so I am not sure YaHS would add much beyond what you have. Your L50 is already 7 for 15 chromosomes. There's also no guarantee that everything in hap1 comes from the same parental haplotype across different chromosomes (e.g., chr1 can be from one haplotype and chr2 from the other). So I don't think you want to scaffold further.

As for one being "better": no, the sorting into haplotype 1 or 2 with Hi-C is random. What you could do is look at each chromosome in both haplotypes and select the better one (fewer gaps, higher QV, etc.) to make a primary assembly that is as continuous as possible.
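Below is a minimal Python sketch of the "pick the better copy per chromosome" idea. It scores each copy only by gap count, treating each run of N as one gap, and breaks ties by total non-N length. It assumes you already know which hap1 and hap2 sequences represent the same chromosome. The `pairs` mapping is hypothetical and could, for example, be built from alignments. The sketch does not estimate QV, which would need a separate tool.

```python
import re
from typing import Dict, Tuple

GAP_RE = re.compile(r"[Nn]+")  # assumption: scaffold gaps are runs of N


def count_gaps(seq: str) -> int:
    """Number of gaps, counting each run of N/n as one gap."""
    return len(GAP_RE.findall(seq))


def _score(seq: str) -> Tuple[int, int]:
    # Fewer gaps first, then more non-N bases (negated so smaller is better).
    return count_gaps(seq), -(len(seq) - seq.upper().count("N"))


def pick_better(hap1_seq: str, hap2_seq: str) -> str:
    """Return 'hap1' or 'hap2' for whichever copy scores better (hap1 wins ties)."""
    return "hap1" if _score(hap1_seq) <= _score(hap2_seq) else "hap2"


def build_primary(pairs: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """pairs: hypothetical {chromosome: (hap1_seq, hap2_seq)} for matched chromosomes."""
    return {chrom: (h1 if pick_better(h1, h2) == "hap1" else h2)
            for chrom, (h1, h2) in pairs.items()}
```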

JohnUrban commented 2 months ago

Thanks.

What should I do with assembly.unassigned.fasta?

It looks like assembly.fasta (1584) = assembly.haplotype1.fasta (229) + assembly.haplotype2.fasta (319) + assembly.unassigned.fasta (1036).
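The reported sizes add up the same way: 272,082,509 + 288,543,443 + 63,413,261 = 624,039,213, the size of assembly.fasta. (assembly.disconnected.fasta is not part of this sum.) Below is a minimal Python sketch that checks both record counts and base totals. fasta_summary and parts_add_up are hypothetical helpers, and they assume uncompressed FASTA input.

```python
from typing import List, Tuple


def fasta_summary(path: str) -> Tuple[int, int]:
    """Return (number_of_records, total_bases) for an uncompressed FASTA file."""
    records = bases = 0
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                records += 1
            else:
                bases += len(line.strip())
    return records, bases


def parts_add_up(whole: Tuple[int, int], parts: List[Tuple[int, int]]) -> bool:
    """True if the parts' record counts and base totals both sum to the whole's."""
    return (sum(r for r, _ in parts) == whole[0]
            and sum(b for _, b in parts) == whole[1])
```

For example: `parts_add_up(fasta_summary("asm/assembly.fasta"), [fasta_summary(f"asm/assembly.{x}.fasta") for x in ("haplotype1", "haplotype2", "unassigned")])`.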

Those I assume could not be assigned to a haplotype, but may be best included with either haplotype, or included with the "primary" after picking the best representatives from each haplotype (as you suggested).

Is it common to include the unassigned sequences in a "final" reference? The longest scaffold there is 2.9 Mb, so it seems non-trivial.

skoren commented 2 months ago

They're usually shorter or repetitive regions that couldn't be assigned to a haplotype. Adding them to the primary haplotype may introduce some gene redundancy, since it could be hap2 sequence that you've now added to hap1, for example. I'd probably exclude anything shorter than some minimum like 100 kb or 500 kb. You could also look at the assembly graph plus assembly.colors.csv, and use the assembly.scfmap and assembly.paths.tsv files to translate the names to graph nodes. From there, see whether you can confirm that a sequence belongs to one haplotype, the other, or both (by coverage, or by whether its location in the graph connects to only one color or both).
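Below is a minimal Python sketch of the length cutoff suggested above, as applied to assembly.unassigned.fasta. The 100 kb default mirrors the lower of the two thresholds mentioned. filter_fasta_by_length is a hypothetical helper, and kept sequences are rewrapped at 80 columns. The sketch only applies the length cutoff; it does not perform the graph/colors check.

```python
from typing import Iterator, List, Tuple


def iter_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header_line, sequence) pairs from an uncompressed FASTA file."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta_by_length(in_path: str, out_path: str,
                           min_len: int = 100_000, width: int = 80) -> List[str]:
    """Write only records with at least `min_len` bases; return the kept names."""
    kept = []
    with open(out_path, "w") as out:
        for header, seq in iter_fasta(in_path):
            if len(seq) >= min_len:
                out.write(header + "\n")
                for i in range(0, len(seq), width):
                    out.write(seq[i:i + width] + "\n")
                kept.append(header[1:].split()[0])  # name = first word after ">"
    return kept
```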

skoren commented 2 months ago

Given that you have an assembly, I think this is resolved? Please re-open if I'm incorrect.

JohnUrban commented 2 months ago

You're right. The original issue is for sure resolved. Thanks again.