marbl / verkko

Telomere-to-telomere assembly of accurate long reads (PacBio HiFi, Oxford Nanopore Duplex, HERRO corrected Oxford Nanopore Simplex) and Oxford Nanopore ultra-long reads.

Not running final consensus since no rukki paths provided! #162

Closed by chunlinxiao 1 year ago

chunlinxiao commented 1 year ago

Just installed the latest v1.4 using conda (with no installation issues) and ran verkko with the --hic1/--hic2 options, but encountered an error like the one below:

5-untip/nodecov_hifi_fix.csv
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-06-30T082326.215214.snakemake.log
ERROR!
Not running final consensus since no rukki paths provided!

I added rukki (~/miniconda3/lib/verkko/bin/rukki) to PATH and re-ran, but it still failed with the same error.

>verkko --version
bioconda verkko bioconda 1.4

>rukki
extraction of paths from assembly graphs
Usage: rukki <COMMAND>

Any suggestions?

Thanks

skoren commented 1 year ago

Can you post the error log from 5-untip/untip.err?

chunlinxiao commented 1 year ago

Here it is:

cat 5-untip/untip.err
UntipRelative
Unitigify 3
Combine mappings
Combine edges
Find lengths
Fix coverage
Pop bubbles based on coverage
Unroll simple loops
Unitigify 3b
Traceback (most recent call last):
  File ".../miniconda3/lib/verkko/scripts/unitigify.py", line 101, in <module>
    if node not in belongs_to_unitig: start_unitig(">" + node, unitigs, belongs_to_unitig, edges)
  File ".../miniconda3/lib/verkko/scripts/unitigify.py", line 33, in start_unitig
    while len(edges[new_unitig[-1]]) == 1 and getone(edges[new_unitig[-1]])[1:] != new_unitig[-1][1:]:
KeyError: '>utig3a-98885'

skoren commented 1 year ago

Haven't seen that before. Are you able to share the full 5-untip folder? You can see the FAQ here on how to send us data: https://canu.readthedocs.io/en/latest/faq.html#how-can-i-send-data-to-you. I may need some other folders, but let's start with that one first.
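
For reference, one way to package that folder for sharing is a single compressed tarball (a minimal sketch; the archive name here just matches the file mentioned later in this thread, and the actual upload follows the canu FAQ above):

# Bundle the whole 5-untip folder into one archive for upload.
tar -czf 5-untip.tar.gz 5-untip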

chunlinxiao commented 1 year ago

the file "5-untip.tar.gz" was uploaded as suggested - thanks!

Zero-Sun commented 1 year ago

I also encountered the same error. I just installed rukki, looking forward to a solution!

skoren commented 1 year ago

This should be fixed by the commit (which auto-closed the issue). You should be able to just replace the script in your conda install and re-run. However, your assembly graph looked quite fragmented. I doubt the Hi-C phasing, at least with default parameters, will work well here. What type of input data do you have for your genome?
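
For anyone hitting the same KeyError, a minimal sketch of applying the fix by hand (the install path is taken from the traceback above; /path/to/fixed/unitigify.py is a placeholder for wherever you saved the patched script from the linked commit):

# Back up the installed script, then drop in the patched copy.
cp ~/miniconda3/lib/verkko/scripts/unitigify.py ~/miniconda3/lib/verkko/scripts/unitigify.py.bak
cp /path/to/fixed/unitigify.py ~/miniconda3/lib/verkko/scripts/unitigify.py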

chunlinxiao commented 1 year ago

Thanks @skoren - the data used for verkko testing (hifi + ONT + HiC) were all from GIAB HG002.

Zero-Sun commented 1 year ago

After I reinstalled rukki, the error was exactly the same as before. This is my 5-untip/untip.err output - is there any better solution?

UntipRelative
Unitigify 3
Combine mappings
Combine edges
Find lengths
Fix coverage
Pop bubbles based on coverage
Unroll simple loops
Unitigify 3b
Combine mappings
Combine edges
Find lengths
Fix coverage
Unroll simple loops round 2
Unitigify 4

skoren commented 1 year ago

The error doesn't look the same, there's no error message in the untip.err file. How did you re-install? The fix isn't part of a release, you just need to patch the one python script. What was the full log of the run that didn't finish?

chunlinxiao commented 1 year ago

For my latest test, the previous 5-untip error was gone, but it now stopped at 8-hicPipeline.

Looks like this is related to run_mashmap.err:

    log: 8-hicPipeline/transform_bwa.err
    jobid: 150
    reason: Missing output files: 8-hicPipeline/hic_mapping.byread.output; Input files updated by another job: 8-hicPipeline/hic_to_assembly.sorted_by_read.bam
    threads: 8
    resources: tmpdir=/tmp, job_id=1, n_cpus=8, mem_gb=16, time_h=24

[Tue Jul  4 21:20:36 2023]
Finished job 150.
61 of 64 steps (95%) done
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-07-03T095104.968992.snakemake.log
ERROR!
Not running final consensus since no rukki paths provided!
8-hicPipeline/run_mashmap.err
mashmap: error while loading shared libraries: libmkl_rt.so.2: cannot open shared object file: No such file or directory

skoren commented 1 year ago

Yes, that's a mashmap issue with a missing math library. Is mashmap also installed from conda? Could you check mashmap --version?

chunlinxiao commented 1 year ago

I believe it was installed through the conda verkko installation (I did not install mashmap specifically).

>which mashmap
~/miniconda3/bin/mashmap

>mashmap --version
mashmap: error while loading shared libraries: libmkl_rt.so.2: cannot open shared object file: No such file or directory

skoren commented 1 year ago

Conda has been a pain recently; in this case it seems to have installed an invalid build of mashmap. I'm not even sure why it has that dependency, that's not a library mashmap includes. It's not something we can fix within verkko since we're just relying on conda to solve the environment correctly and install working tools. Try uninstalling or updating to the latest mashmap version, 3.0.5, and see if it runs then.
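
A minimal sketch of that update, assuming the bioconda channel is available in the environment:

# Pull a newer mashmap build into the same conda environment, then confirm it starts.
conda install -c bioconda "mashmap>=3.0.5"
mashmap --version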

skoren commented 1 year ago

I confirmed (see linked issue) that this shouldn't be a dependency with mashmap but may have been an issue with mashmap v3.0.4 installations in conda. So updating as I suggested above should fix your issue.

chunlinxiao commented 1 year ago

Just installed mashmap (3.0.6) and re-ran, but got the following errors:

[Mon Jul 10 14:42:15 2023]
Error in rule hicPhasing:
    jobid: 177
    input: 8-hicPipeline/unitigs.matches, 8-hicPipeline/hic_mapping.byread.output, 8-hicPipeline/unitigs.hpc.noseq.gfa
    output: 8-hicPipeline/hic.byread.compressed, 8-hicPipeline/hicverkko.colors.tsv
    log: 8-hicPipeline/hic_phasing.err (check log file(s) for error details)
    shell:

cd 8-hicPipeline

cat > ./hic_phasing.sh <<EOF
#!/bin/sh
set -e
 ~/miniconda3/lib/verkko/scripts/hicverkko.py False False .
EOF

chmod +x ./hic_phasing.sh

./hic_phasing.sh > ../8-hicPipeline/hic_phasing.err 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-07-10T143726.119298.snakemake.log
ERROR!
tail 8-hicPipeline/hic_phasing.err
Traceback (most recent call last):
  File "~/miniconda3/lib/verkko/scripts/hicverkko.py", line 8, in <module>
    import cluster
  File "~/miniconda3/lib/verkko/scripts/cluster.py", line 3, in <module>
    import networkx as nx
ModuleNotFoundError: No module named 'networkx'

skoren commented 1 year ago

Ah, that is a missing dependency in the conda package; if you install networkx with conda, it should get past this error. I'll add that to the next verkko build.

chunlinxiao commented 1 year ago

"conda install networkx" should install it, but after I did, for some reason, verkko still complained that networkx could not be found - kind weird !

Finally I used "pip install networkx --user" to install it - now it is running again (at the step with output: 6-layoutContigs/unitig-popped.layout, 6-layoutContigs/unitig-popped.layout.scfmap, 6-layoutContigs/gaps.txt).

Hopefully the process will go all the way through this time!
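
(Side note: a quick way to see whether the conda install went to a different interpreter than the one verkko's scripts use is to check the script's shebang and try the import with that exact python; a sketch, with the miniconda path taken from the tracebacks above:)

# Which interpreter does hicverkko.py request, and does networkx import there?
head -1 ~/miniconda3/lib/verkko/scripts/hicverkko.py
~/miniconda3/bin/python -c "import networkx; print(networkx.__version__)"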

chunlinxiao commented 1 year ago

Again, verkko stopped with an error (below):

The log mentioned in the message was actually NOT there (.snakemake/log/2023-07-10T163056.780339.snakemake.log).

[Wed Jul 12 11:57:00 2023]
Error in rule generateConsensus:
    jobid: 36
    input: 7-consensus/packages/part016.cnspack, 7-consensus/packages.tigName_to_ID.map, 7-consensus/packages.report
    output: 7-consensus/packages/part016.fasta
    log: 7-consensus/packages/part016.err (check log file(s) for error details)
    shell:

cd 7-consensus

mkdir -p packages

cat > ./packages/part016.sh <<EOF
#!/bin/sh
set -e

~/miniconda3/lib/verkko/bin/utgcns \\
    -V -V -V \\
    -threads 8 \\
    -import ../7-consensus/packages/part016.cnspack \\
    -A ../7-consensus/packages/part016.fasta.WORKING \\
    -C 2 -norealign \\
    -maxcoverage 50 \\
    -e  0.05 \\
    -em 0.20 \\
    -EM 0 \\
    -l 3000 \\
    -edlib \\
&& \\
mv ../7-consensus/packages/part016.fasta.WORKING ../7-consensus/packages/part016.fasta \\
&& \\
exit 0

echo ""
echo "Consensus did not finish successfully, exit code \$?."

echo ""
echo "Files in current directory:"
ls -ltr

echo ""
echo "Files in packages/:"
ls -ltr packages

exit 1
EOF

chmod +x ./packages/part016.sh

./packages/part016.sh > ../7-consensus/packages/part016.err 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Wed Jul 12 13:31:08 2023]
Finished job 31.
29 of 34 steps (85%) done

[Wed Jul 12 18:23:17 2023]
Finished job 30.
30 of 34 steps (88%) done
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-07-10T163056.780339.snakemake.log
cp: cannot stat '*.fasta': No such file or directory
cp: cannot stat '*.layout': No such file or directory

skoren commented 1 year ago

Are you running this on a cluster? I'd guess this job ran out of time on the cluster. Can you post the end of part016.err?

chunlinxiao commented 1 year ago

Just locally (not on a cluster).

>tail 7-consensus/packages/part016.err

generatePBDAG()--    read alignment: 0 failed, 97 passed.
Constructing graph
Merging graph
Calling consensus

Bye.
18996      46        45    4.59x        0    0.00x         1    1.00x
  83090    124298      77        15    1.01x        0    0.00x        62    4.63x
  83120    167647      61        58    4.88x        0    0.00x         3    1.55x
  83140     97573      97        25    2.10x        0    0.00x        72    6.74x

skoren commented 1 year ago

That log file has no error, so I have no idea why snakemake thinks it failed. It might have been an intermittent I/O issue on the system. What's in the 7-consensus/packages folder? You can try running verkko with --snakeopts --dry-run to see where it will resume.
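
For completeness, a hedged example of that invocation (the -d directory and input files are placeholders; re-use whatever the original verkko command line was):

# Same verkko command as the original run, with snakemake's --dry-run passed through.
verkko -d asm --hifi hifi.fastq.gz --nano ont.fastq.gz \
    --hic1 hic_R1.fastq.gz --hic2 hic_R2.fastq.gz \
    --snakeopts "--dry-run"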

chunlinxiao commented 1 year ago

The following is from a dry run ("--snakeopts --dry-run"): does this mean both part012 and part016 failed?

Launching bioconda verkko bioconda 1.4
Using snakemake 7.30.1.
Building DAG of jobs...
Nothing to be done (all requested files are present and up to date).
Launching bioconda verkko bioconda 1.4
Using snakemake 7.30.1.
Building DAG of jobs...
Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
cnspath                  1              1              1
combineConsensus         1              1              1
generateConsensus        2              8              8
total                    4              1              8

[Thu Jul 13 11:41:20 2023]
rule generateConsensus:
    input: 7-consensus/packages/part016.cnspack, 7-consensus/packages.tigName_to_ID.map, 7-consensus/packages.report
    output: 7-consensus/packages/part016.fasta
    log: 7-consensus/packages/part016.err
    jobid: 29
    reason: Missing output files: 7-consensus/packages/part016.fasta
    wildcards: nnnn=016
    threads: 8
    resources: tmpdir=/tmp, job_id=16, n_cpus=8, mem_gb=6, time_h=24

[Thu Jul 13 11:41:20 2023]
rule generateConsensus:
    input: 7-consensus/packages/part012.cnspack, 7-consensus/packages.tigName_to_ID.map, 7-consensus/packages.report
    output: 7-consensus/packages/part012.fasta
    log: 7-consensus/packages/part012.err
    jobid: 25
    reason: Missing output files: 7-consensus/packages/part012.fasta
    wildcards: nnnn=012
    threads: 8
    resources: tmpdir=/tmp, job_id=12, n_cpus=8, mem_gb=7, time_h=24

[Thu Jul 13 11:41:20 2023]
rule combineConsensus:
    input: 7-consensus/packages/part001.fasta, 7-consensus/packages/part002.fasta, 7-consensus/packages/part003.fasta, 7-consensus/packages/part004.fasta, 7-consensus/packages/part005.fasta, 7-consensus/packages/part006.fasta, 7-consensus/packages/part007.fasta, 7-consensus/packages/part008.fasta, 7-consensus/packages/part009.fasta, 7-consensus/packages/part010.fasta, 7-consensus/packages/part011.fasta, 7-consensus/packages/part012.fasta, 7-consensus/packages/part013.fasta, 7-consensus/packages/part014.fasta, 7-consensus/packages/part015.fasta, 7-consensus/packages/part016.fasta, 7-consensus/packages/part017.fasta, 7-consensus/packages/part018.fasta, 7-consensus/packages/part019.fasta, 7-consensus/packages/part020.fasta, 7-consensus/packages/part021.fasta, 7-consensus/packages/part022.fasta, 7-consensus/packages/part023.fasta, 7-consensus/packages/part024.fasta, 7-consensus/packages/part025.fasta, 7-consensus/packages/part026.fasta, 7-consensus/packages/part027.fasta, 7-consensus/packages/part028.fasta, 7-consensus/packages/part029.fasta, 7-consensus/packages.tigName_to_ID.map, 6-layoutContigs/unitig-popped.layout.scfmap, 5-untip/unitig-unrolled-unitig-unrolled-popped-unitig-normal-connected-tip.hifi-coverage.csv, 7-consensus/packages.finished, emptyfile, 5-untip/unitig-unrolled-unitig-unrolled-popped-unitig-normal-connected-tip.gfa
    output: 7-consensus/unitig-popped.fasta, 7-consensus/unitig-popped.haplotype1.fasta, 7-consensus/unitig-popped.haplotype2.fasta, 7-consensus/unitig-popped.unassigned.fasta
    log: 7-consensus/combineConsensus.out, 7-consensus/combineConsensus.err
    jobid: 11
    reason: Missing output files: 7-consensus/unitig-popped.unassigned.fasta, 7-consensus/unitig-popped.haplotype1.fasta, 7-consensus/unitig-popped.haplotype2.fasta, 7-consensus/unitig-popped.fasta; Input files updated by another job: 7-consensus/packages/part016.fasta, 7-consensus/packages/part012.fasta
    resources: tmpdir=/tmp, job_id=1, n_cpus=1, mem_gb=7, time_h=4

[Thu Jul 13 11:41:20 2023]
localrule cnspath:
    input: 6-layoutContigs/unitig-popped.layout, 6-layoutContigs/unitig-popped.layout.scfmap, 7-consensus/unitig-popped.fasta, 7-consensus/unitig-popped.haplotype1.fasta, 7-consensus/unitig-popped.haplotype2.fasta, 7-consensus/unitig-popped.unassigned.fasta
    output: assembly.homopolymer-compressed.layout, assembly.fasta
    jobid: 0
    reason: Missing output files: assembly.homopolymer-compressed.layout, assembly.fasta; Input files updated by another job: 7-consensus/unitig-popped.fasta, 7-consensus/unitig-popped.unassigned.fasta, 7-consensus/unitig-popped.haplotype1.fasta, 7-consensus/unitig-popped.haplotype2.fasta
    resources: tmpdir=/tmp

Job stats:
job                  count    min threads    max threads
-----------------  -------  -------------  -------------
cnspath                  1              1              1
combineConsensus         1              1              1
generateConsensus        2              8              8
total                    4              1              8

Reasons:
    (check individual jobs above for details)
    input files updated by another job:
        cnspath, combineConsensus
    missing output files:
        cnspath, combineConsensus, generateConsensus

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
cp: cannot stat '*.fasta': No such file or directory
cp: cannot stat '*.layout': No such file or directory

skoren commented 1 year ago

Either part012 failed or it didn't run. Does it have an error log in packages? If so, post the end of that log as well as the contents of the packages folder.

chunlinxiao commented 1 year ago

part012.err does not seem very helpful.

>tail 7-consensus/packages/part012.err
  81398    251335      83        74    5.53x        0    0.00x         9    2.43x
  81454    129664     156        60    3.93x        0    0.00x        96    6.86x
  81517    276679      72        70    5.17x        0    0.00x         2    1.76x
  81682    290560      74        45    2.48x        0    0.00x        29    2.20x
  81929    663293      33        31    1.98x        0    0.00x         2    1.32x
  81942    168156     124        32    1.60x        0    0.00x        92    5.06x
  82534    182164     109        29    1.34x        0    0.00x        80    4.09x
  82535    182024     109        28    1.33x        0    0.00x        81    4.17x
  82600    290022      71        68    2.95x        0    0.00x         3    1.61x
  82789    591765      35        32    0.49x        0    0.00x         3    1.65x

Now I'm re-running the job (without the dry run) to see if it can run through this time...

skoren commented 1 year ago

Neither one of those reported an error. I would have suggested just renaming the .fasta.WORKING files to .fasta yourself and resuming from there. I don't think anything failed during those jobs.
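
A sketch of that manual rename (file names taken from the logs above; only do this if the corresponding .err log shows the consensus finished cleanly):

cd 7-consensus/packages
# Promote the completed-but-unrenamed outputs so snakemake treats them as done.
mv part012.fasta.WORKING part012.fasta
mv part016.fasta.WORKING part016.fasta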

chunlinxiao commented 1 year ago

No luck running to the end of the pipeline, with NO obvious error messages in the following 3 error files (part012.err, part016.err, part017.err):

./packages/part017.sh > ../7-consensus/packages/part017.err 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

.....
./packages/part016.sh > ../7-consensus/packages/part016.err 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

....
./packages/part012.sh > ../7-consensus/packages/part012.err 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-07-22T122822.430471.snakemake.log

The 3 fasta files were apparently generated, but the pipeline still complained and terminated.

>ls -l 7-consensus/packages/part012.fasta
-rw-rw-r-- 1 xiao2 varpipe 361420690 Jul 18 18:30 7-consensus/packages/part012.fasta
>ls -l 7-consensus/packages/part016.fasta
-rw-rw-r-- 1 xiao2 varpipe 378075971 Jul 18 17:06 7-consensus/packages/part016.fasta
>ls -l 7-consensus/packages/part017.fasta
-rw-rw-r-- 1 xiao2 varpipe 384098989 Jul 18 16:48 7-consensus/packages/part017.fasta

skoren commented 1 year ago

It's strange that no error is being reported and the job is being marked as failed by snakemake when the fasta file is generated. The fasta is only renamed from WORKING to the final name if the consensus command completed with no error and the next thing right after that is exit 0:

&& mv ../{output.consensus}.WORKING ../{output.consensus} && exit 0

Also, I'm not clear on why this is running three consensus jobs; your previous failure and dry run reported that only two jobs had failed. This seems like some kind of snakemake weirdness/bug (it wouldn't be the first time). You should be able to run verkko with --snakeopts --touch and then --snakeopts --dry-run, which should force it to use the outputs it generated and then resume the run.
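
A sketch of that sequence (the original verkko arguments are abbreviated to a placeholder here):

# Mark existing outputs as up to date, then preview what snakemake would still run.
verkko -d asm <original options> --snakeopts "--touch"
verkko -d asm <original options> --snakeopts "--dry-run"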

skoren commented 1 year ago

I would think it's related to #166, but it's strange there is no final error or failure reported. In addition to trying the above, can you also share one of the package files (e.g. packages/part012.*)?

chunlinxiao commented 1 year ago

The latest run with your suggestion failed again (with part012/016/017) - the verkko Hi-C integration seems to have some unknown issue that caused this run to terminate prematurely.

Just sent package-part12.tar.gz to you as before.

Thanks for looking into this.

skoren commented 1 year ago

I was able to run the partition without error and it returned 0. The number of sequences output seems to match what you got so I think your file is complete as well. I have no idea why snakemake is detecting these jobs as failed but would guess this is a snakemake bug. Have you tried the suggestion to run touch and dry-run to see if it will continue on and use your existing outputs?
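
(For reference, one way to reproduce that check locally, using the per-package script shown in the logs above, is to run the partition by hand and inspect its exit code; writing to a separate log avoids clobbering the existing one:)

cd 7-consensus
./packages/part012.sh > packages/part012.rerun.err 2>&1
echo "part012 exit code: $?"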

chunlinxiao commented 1 year ago

Thanks @skoren.

I actually tried that before, but not for the latest run - it seemed that verkko could not get past those jobs, as it considered them "failed" - basically it just re-ran and then failed again! I'm testing touch and dry-run now and see that it is still regenerating part012/016/017.fasta (even though they actually finished).

Also, I just noticed that 3 core dumps are sitting under the "7-consensus" folder (below) - not sure if this is of any help:

7-consensus>ls
assembly.disconnected.fasta  combined.fasta.lengths  packages.finished
assembly.disconnected.ids    core.142176             packages.readName_to_ID.map
assembly.fasta               core.183134             packages.report
assembly.ids                 core.47416              packages.tigName_to_ID.map
buildPackages.err            extractONT.err          screen-assembly.err
buildPackages.sh             extractONT.sh           screen-assembly.out
combineConsensus.err         ont_subset.extract      unitig-popped.fasta
combineConsensus.out         ont_subset.fasta.gz     unitig-popped.haplotype1.fasta
combineConsensus.sh          ont_subset.id           unitig-popped.haplotype2.fasta
combined.fasta               packages                unitig-popped.unassigned.fasta

>ls -l 7-consensus/core*

Jul 17 11:41 7-consensus/core.142176
Jul 17 03:22 7-consensus/core.183134
Jul 17 03:13 7-consensus/core.47416
>gdb --core=7-consensus/core.142176 |more
...
Core was generated by `~/miniconda3/lib/verkko/bin/utgcns -V -V -V -threads 8 -import ../7-c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (LWP 142182)]
>gdb --core=7-consensus/core.47416 |more

...
Core was generated by `~/miniconda3/lib/verkko/bin/utgcns -V -V -V -threads 8 -import ../7-c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (LWP 50067)]
>gdb --core=7-consensus/core.183134 |more

....
Core was generated by `~/miniconda3/lib/verkko/bin/utgcns -V -V -V -threads 8 -import ../7-c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (LWP 183499)]
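
(Side note: gdb shows "?? ()" frames when only the core file is loaded; pointing it at the utgcns binary as well, using the path from the core's command line above, should give a named backtrace:)

gdb ~/miniconda3/lib/verkko/bin/utgcns --core=7-consensus/core.142176 -ex "bt" -ex "quit"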

skoren commented 1 year ago

Core dumps would imply the jobs failed, at least at some point, but I would expect that to show up in the logs. In your original consensus error the -EM parameter was set to 0, but the scripts you shared were correct, so that could have been the source of the original core dump. Your run looks very strange, as the folder also has the files combined.fasta, assembly.fasta, and even haplotype-split results, which shouldn't be created unless the assembly finished consensus and moved on to the next step.

Is this a run that finished and that you then re-started with changed parameters? Is this the 7-consensus folder under the top-level folder, or under 8-hicPipeline/final_contigs/7-consensus (assuming you're running with Hi-C)? Something is very off with this folder if that is the case because, like I said, many of these files shouldn't exist if the consensus partitions didn't finish. I think the multiple restarts have caused some kind of snakemake tracking issue, and I would suggest making a new folder, copying over all the [1-8]- folders to it (but not the 8-hicPipeline/final_contigs/7-consensus folder or any files named `assembly.`), and seeing if that will run properly.
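
A sketch of that cleanup under the stated assumptions (the run directory is called verkko_out as in the paths above; the new directory name is arbitrary):

mkdir verkko_clean
cp -r verkko_out/[1-8]-* verkko_clean/
# Drop the nested consensus folder and any assembly.* files before resuming verkko in the new folder.
rm -rf verkko_clean/8-hicPipeline/final_contigs/7-consensus
find verkko_clean -name 'assembly.*' -type f -delete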

chunlinxiao commented 1 year ago

some update:

  1. I did not make any changes to the parameters myself (except those touch/dry-run).

  2. The previously shared data was from the top level (verkko_out/7-consensus/), not verkko_out/8-hicPipeline/final_contigs/7-consensus/, which I only just noticed from your comment.

  3. Now I do see some differences in the scripts of those 3 failed jobs between the top level (verkko_out/7-consensus/) and verkko_out/8-hicPipeline/final_contigs/7-consensus:

eg.

In verkko_out/7-consensus/packages/part012.sh (also in part016.sh, part017.sh)

-EM 241302 \

but in verkko_out/8-hicPipeline/final_contigs/7-consensus/packages/part012.sh (in fact, all 29 jobs same here)

-EM 0 \

  4. The 3 jobs from the latest test with your most recent suggestion (above) were still failing with core dumps (with "-EM 0" under verkko_out/8-hicPipeline/final_contigs/7-consensus).

But after I renamed (mv) .fasta.WORKING to .fasta for those 3 jobs (under verkko_out/8-hicPipeline/final_contigs/7-consensus) and re-ran the pipeline, it finally did run to the end.

skoren commented 1 year ago

Ah OK, that explains why the output showed no error. The HiC folder layout is a bit confusing, and we plan to standardize it to match the trio/other runs in the near future. I am pretty sure this is the same as #166, since the -EM was not 0 in the initial consensus and became 0 later. I would suggest updating the -EM to match what it was in the top-level 7-consensus folder and re-running the partitions in 8-hicPipeline/final_contigs/7-consensus/packages. After that, snakemake should redo the last steps; you may have to remove the assembly.* files. This change will improve the consensus quality of your final assembly and should avoid the segfault.
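
A sketch of that edit (the value 241302 is the one reported above from the top-level part012/016/017 scripts; check your own top-level scripts before applying):

cd verkko_out/8-hicPipeline/final_contigs/7-consensus
# Restore the -EM value from the top-level scripts, then re-run the affected partitions.
sed -i 's/-EM 0/-EM 241302/' packages/part012.sh packages/part016.sh packages/part017.sh
./packages/part012.sh > packages/part012.err 2>&1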

skoren commented 1 year ago

These issues should all be addressed by the v1.4.1 release. I've also made a pull request to update conda: https://github.com/bioconda/bioconda-recipes/pull/42411