marbl / verkko

Telomere-to-telomere assembly of accurate long reads (PacBio HiFi, Oxford Nanopore Duplex, HERRO corrected Oxford Nanopore Simplex) and Oxford Nanopore ultra-long reads.

MBG: Assertion `lessTrim > 0' failed #72

Closed oushujun closed 2 years ago

oushujun commented 2 years ago

Hello,

I successfully ran verkko under local mode on one Arabidopsis sample but failed on another sample. My versions:

$ verkko --version
bioconda verkko bioconda 1.0

$ MBG --version
MBG bioconda 1.0.10
Version: bioconda 1.0.10

$ GraphAligner --version
GraphAligner bioconda 1.0.16-
GraphAligner bioconda 1.0.16-
Version bioconda 1.0.16-

Error in 1-buildGraph/buildGraph.err:

try resolve k=2349, replaced 28 nodes with 111 nodes, unitigified 171 nodes to 84 nodes
try resolve k=2350, replaced 32 nodes with 133 nodes, unitigified 198 nodes to 98 nodes
try resolve k=2351, replaced 32 nodes with 121 nodes, unitigified 183 nodes to 89 nodes
try resolve k=2352, replaced 29 nodes with 125 nodes, unitigified 192 nodes to 96 nodes
try resolve k=2353, replaced 27 nodes with 101 nodes, unitigified 148 nodes to 74 nodes
removed trim problem node 12607
MBG: src/UnitigResolver.cpp:2257: void maybeTrim(ResolvableUnitigGraph&, std::vector<PathGroup>&, size_t, std::pair<long unsigned int, bool>, size_t): Assertion `lessTrim > 0' failed.
./buildGraph.sh: line 39: 89915 Aborted                 (core dumped) /home/sou6/bin/miniconda3/envs/asm/bin/MBG $iopt -t 4 -k 1001 -r 15000 -R 4000 -w 100 --kmer-abundance 1 --unitig-abundance 2 --error-masking=collapse-msat --output-sequence-paths ../1-buildGraph/paths.gaf --out ../1-buildGraph/hifi-resolved.gfa

Thanks, Shujun

skoren commented 2 years ago

Try running the latest tip of MBG that you were using before; you can just edit the buildGraph.sh script to point to the other version of MBG and run it by hand. The tip fixes a few bugs that are present in 1.0.10 (but it has its own bugs, as you've discovered, which is why it hasn't been tagged for a release yet).

If that doesn't work, can you post the hifi-corrected.fasta reads somewhere we can access them to debug?
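A minimal sketch of the edit described above, assuming the generated buildGraph.sh calls MBG by a literal path (as the abort message in the first comment suggests). The `swap_mbg` helper is hypothetical, not part of verkko; it writes a modified copy instead of changing the script in place.

```shell
# swap_mbg SCRIPT OLD_MBG NEW_MBG OUT  (hypothetical helper, not part of verkko)
# Copies SCRIPT to OUT, replacing every literal occurrence of OLD_MBG with NEW_MBG.
swap_mbg() {
  # Refuse an empty search string and a path that never occurs in the script.
  [ -n "$2" ] || { echo "empty path to replace" >&2; return 1; }
  grep -qF -- "$2" "$1" || { echo "path not found in $1: $2" >&2; return 1; }
  # index()/substr() give a literal replacement, so '.' and '/' in the
  # paths are not treated as regular-expression characters.
  awk -v old="$2" -v new="$3" '{
    out = ""
    while ((i = index($0, old)) > 0) {
      out = out substr($0, 1, i - 1) new
      $0 = substr($0, i + length(old))
    }
    print out $0
  }' "$1" > "$4"
}
```

For example, `swap_mbg buildGraph.sh /home/sou6/bin/miniconda3/envs/asm/bin/MBG /path/to/other/MBG buildGraph.tip.sh` (the second path is a placeholder), after which the copy can be run by hand with `bash buildGraph.tip.sh` from the same directory.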

oushujun commented 2 years ago

@skoren Which version of MBG should I try?

skoren commented 2 years ago

The one you were using in #67

oushujun commented 2 years ago

OK sure I will do that.

oushujun commented 2 years ago

@skoren I tried the buildGraph.sh script with the previous MBG version and it works without errors.

MBG Branch master commit de2b7e859e9710259f199830ec6a18643d1435ae 2022-04-27 12:02:30 +0200
Parameters: k=1001,w=100,a=1,u=2,t=8,r=15000,R=4000,hpcvariantcov=0,errormasking=collapse-msat,endkmers=no,blunt=no,keepgaps=no,guesswork=no,cache=no
Collecting selected k-mers
Reading sequences from ../hifi-corrected.fasta
130537679 total selected k-mers in reads
6557554 distinct selected k-mers in reads
Unitigifying
Filtering by unitig coverage
102326 distinct selected k-mers in unitigs after filtering
Getting read paths
Reading sequences from ../hifi-corrected.fasta
Resolving unitigs
25093 unitigs before resolving
1008989 raw read paths
1012073 raw read paths
156419 read paths
removed 8827 tips
...
try resolve k=14978, replaced 1 nodes with 3 nodes, unitigified 4 nodes to 2 nodes
10208 unitigs after resolving
Building unitig sequences
Reading sequences from ../hifi-corrected.fasta
Writing graph to ../1-buildGraph/hifi-resolved.gfa
Writing paths to ../1-buildGraph/paths.gaf
selecting k-mers and building graph topology took 219730,860 s
unitigifying took 3,716 s
filtering unitigs took 0,985 s
getting read paths took 69559,859 s
resolving unitigs took 7917,595 s
building unitig sequences took 500,157 s
forcing edge consistency took 5,461 s
writing the graph and calculating stats took 7,819 s
writing sequence paths took 49,258 s
nodes: 10208
edges: 8313
assembly size 169526730 bp, N50 209093
approximate number of k-mers ~ 159308522

skoren commented 2 years ago

You can then resume the assembly with this MBG output. Try running with --snakeopts "dry-run" to make sure it will continue from the next step.

There have been fixes to the MBG repo so when you get a chance, can you also re-try the latest version on your failed dataset (from issue #67)?

oushujun commented 2 years ago

The latest version of MBG (ad2934f) works, but the graph is less contiguous compared to the one generated by MBG bioconda 1.0.8:

MBG bioconda 1.0.8
Parameters: k=1001,w=100,a=1,u=2,t=90,r=15000,errormasking=collapse-msat,endkmers=no,blunt=no
...
nodes: 9353
edges: 5074
assembly size 120021075 bp, N50 1381966
approximate number of k-mers ~ 110658722

MBG Branch master commit de2b7e859e9710259f199830ec6a18643d1435ae 2022-04-27 12:02:30 +0200
Parameters: k=1001,w=100,a=1,u=2,t=8,r=15000,R=4000,hpcvariantcov=0,errormasking=collapse-msat,endkmers=no,blunt=no,keepgaps=no,guesswork=no,cache=no
...
nodes: 10208
edges: 8313
assembly size 169526730 bp, N50 209093
approximate number of k-mers ~ 159308522

MBG Branch master commit ad2934f0dacaaf7ad21e3c0bd46da67f8a4e85ed 2022-05-10 17:13:58 +0200
Parameters: k=1001,w=100,a=1,u=2,t=10,r=15000,R=4000,hpcvariantcov=0,errormasking=collapse-msat,endkmers=no,blunt=no,keepgaps=no,guesswork=no,cache=no
...
nodes: 10208
edges: 8313
assembly size 169526730 bp, N50 209093
approximate number of k-mers ~ 159308522
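To put runs like these side by side, the summary lines can be pulled out of each saved MBG log. This is a rough sketch that assumes the log ends with the `nodes:`, `edges:` and `assembly size ... N50 ...` lines in the format quoted in this thread; the `mbg_summary` name is made up for illustration.

```shell
# mbg_summary LOG  (hypothetical helper)
# Prints one tab-separated row: nodes, edges, assembly size (bp), N50.
mbg_summary() {
  awk '
    $1 == "nodes:" { nodes = $2 }
    $1 == "edges:" { edges = $2 }
    # Line format assumed: "assembly size <bp> bp, N50 <n50>"
    $1 == "assembly" && $2 == "size" { size = $3; n50 = $NF }
    END { printf "%s\t%s\t%s\t%s\n", nodes, edges, size, n50 }
  ' "$1"
}
```

Calling it once per log file gives rows that are easy to compare or paste into a table.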

skoren commented 2 years ago

Version 1.0.8 is pretty old, so I wouldn't be surprised that the graphs are different; it didn't support the -R option, for example. I'd expect version 1.0.9 to be more similar, so you could try that one.

oushujun commented 2 years ago

Based on the N50, the MBG step does not seem to be working...

MBG bioconda 1.0.10
Parameters: k=1001,w=100,a=1,u=2,t=4,r=15000,R=4000,errormasking=collapse-msat,endkmers=no,blunt=no,keepgaps=no,cache=no
...
nodes: 19748
edges: 16324
assembly size 231665828 bp, N50 18988
approximate number of k-mers ~ 211898080

skoren commented 2 years ago

You shouldn't be judging the assembly based on the MBG graph. The MBG graph is just a large-k de Bruijn graph which is completely phased. You can't make very large phased nodes from HiFi-only data. There's a lot of resolution of this graph afterwards.
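For context on what that number measures: an N50 over a graph is computed from individual node lengths, not from final contigs. A minimal sketch, assuming a GFA1 file whose S lines carry the full sequence in column 3 (the `n50_from_gfa` name is hypothetical, and MBG's own calculation may differ in detail):

```shell
# n50_from_gfa GFA  (hypothetical helper)
# Prints the node length L such that nodes of length >= L hold at least
# half of the total sequence in the graph.
n50_from_gfa() {
  # Column 3 of each S (segment) line is the node sequence; emit its length.
  awk -F'\t' '$1 == "S" { print length($3) }' "$1" |
    sort -rn |
    awk '{ len[NR] = $1; total += $1 }
         END {
           half = total / 2
           # Walk from the longest node down until half the total is covered.
           for (i = 1; i <= NR; i++) {
             acc += len[i]
             if (acc >= half) { print len[i]; exit }
           }
         }'
}
```

Run on hifi-resolved.gfa, this summarizes node lengths only, so a small value there does not by itself say how contiguous the finished assembly will be.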

oushujun commented 2 years ago

I see. I will try to finish the remaining steps. Thanks for the advice.

skoren commented 2 years ago

You should continue the pipeline with the MBG run you had previously, the one from tip, not 1.0.10. I thought 1.0.10 was breaking on this sample, which is why you had to switch? How did you run 1.0.10 now? Or is this a different sample?

oushujun commented 2 years ago

My apologies. The one on 1.0.10 is a different sample. I will continue the run finished by MBG ad2934f.

skoren commented 2 years ago

Ah OK, I prefer to keep different samples in different issues. Since the new MBG resolves your original crash, I'll close this one. Feel free to open a new issue if you have issues with this new assembly.

oushujun commented 2 years ago

An update here: Verkko bioconda 1.0 resolved one more chromosome of Col-0. Only chr2 broke into two big contigs.
