maickrau / ribotin

MIT License

Assertion `coreNodes.size() >= 1' failed. #13

Open diego-rt opened 2 weeks ago

diego-rt commented 2 weeks ago

Hello,

I'm trying out Ribotin to assemble some tangles. They are probably not rRNA, but I assume that doesn't really matter. The morph size appears to be around 3 kbp.

ribotin-ref --approx-morphsize 3000 -r morphs.fa -i ../reads/hifi_reads.fastq.gz --nano ../reads/ont_reads.fastq.gz -o output_folder5 -t 16

However, I am getting the following error after clustering:

ribotin-ref version bioconda 1.3
checking for MBG
/users/diego.terrones/miniforge3/envs/ribotin/bin/MBG
checking for GraphAligner
/users/diego.terrones/miniforge3/envs/ribotin/bin/GraphAligner
using reference from morphs.fa
output folder: output_folder5
extracting HiFi/duplex reads
running
running MBG
MBG command:
MBG -o output_folder5/graph.gfa -i output_folder5/hifi_reads.fa -k 101 -w 71 -a 2 -u 3 -r 600 -R 4000 --error-masking=msat --output-sequence-paths output_folder5/paths.gaf --only-local-resolve 1> output_folder5/mbg_stdout.txt 2> output_folder5/mbg_stderr.txt
reading graph
getting consensus
consensus length 3212bp
writing consensus
reading read paths
getting variants
147 variants
writing variants
writing variant graph
writing allele graph
writing variant vcf
extracting ultralong ONT reads
consensus length 3212, using 1606 as minimum ONT match length
start ultralong ONT analysis
aligning ultralong ONT reads to allele graph
GraphAligner command:
GraphAligner -g output_folder5/allele-graph.gfa -f output_folder5/ont_reads.fa -a output_folder5/ont-alns.gaf -t 16 --seeds-mxm-length 30 --seeds-mem-count 10000 --bandwidth 15 --multimap-score-fraction 0.99 --precise-clipping 0.85 --min-alignment-score 5000 --discard-cigar --clip-ambiguous-ends 100 --overlap-incompatible-cutoff 0.15 --mem-index-no-wavelet-tree --max-trace-count 5 1> output_folder5/graphaligner_stdout.txt 2> output_folder5/graphaligner_stderr.txt
reading allele graph
reading consensus
extract corrected ultralong paths
consensus path length 3212, using 1606 as minimum morph length
961 corrected paths
extract loops from ONTs
15114 loops in ONTs
cluster loops roughly
max clustering edit distance 200
aligning morph path pair 0 / 114208941
aligning morph path pair 1000000 / 114208941
[...]
aligning morph path pair 114000000 / 114208941
2 rough clusters
cluster loops by density
edit distance peak at 0
recluster with max edit distance 5, min points 5
cluster 0 with 8 reads reclustered to 1 clusters, sizes: 7
cluster 1 with 15106 reads reclustered to 14 clusters, sizes: 47 54 209 41 22 1230 536 78 18 11396 205 57 77 1130
15 density clusters
phase clusters
15 phased clusters
getting morph consensuses
ribotin-ref: src/ClusterHandler.cpp:3137: std::vector<Node> getConsensusPath(const std::vector<OntLoop>&, const GfaGraph&): Assertion `coreNodes.size() >= 1' failed.
Aborted

Many thanks in advance!

maickrau commented 2 weeks ago

Hi, could you share the output data files output_folder5/hifi_reads.fa and output_folder5/ont_reads.fa?

diego-rt commented 2 weeks ago

Hey there,

Thanks a lot for the quick reply! Here are the input files, along with the morph file. It is a small targeted assembly job.

If you have any opinion on whether this tangle is solvable I would love to hear it.

Thank you!

maickrau commented 1 week ago

This seems to have been caused by the repeat unit being so short that the HiFi reads already resolved a few morphs in the initial MBG graph, so the graph does not have the topology that ribotin expects. The consensus sequence output_folder5/consensus.fa should match the most abundant morph, which seems to be a bit over 2/3 of the sequence.
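
For anyone wanting to cross-check the numbers in the log above, here is a minimal Python sketch. It only re-reads four lines copied from the posted ribotin-ref output; the helper names (`cluster_sizes`, `loop_count`) are made up for illustration and are not part of ribotin. It assumes that density-cluster sizes roughly track morph abundance, which nothing in this thread confirms.

```python
import re

# Lines copied verbatim from the ribotin-ref log posted in this issue.
LOG = """\
15114 loops in ONTs
aligning morph path pair 114000000 / 114208941
cluster 0 with 8 reads reclustered to 1 clusters, sizes: 7
cluster 1 with 15106 reads reclustered to 14 clusters, sizes: 47 54 209 41 22 1230 536 78 18 11396 205 57 77 1130
"""


def cluster_sizes(log):
    """Collect every number listed after 'sizes:' on the recluster lines."""
    sizes = []
    for line in log.splitlines():
        match = re.search(r"sizes:\s*([\d ]+)$", line)
        if match:
            sizes.extend(int(tok) for tok in match.group(1).split())
    return sizes


def loop_count(log):
    """Read the total from the '<N> loops in ONTs' line."""
    match = re.search(r"^(\d+) loops in ONTs$", log, flags=re.MULTILINE)
    return int(match.group(1))


sizes = cluster_sizes(LOG)
n_loops = loop_count(LOG)
largest = max(sizes)

print(len(sizes), "density clusters")
print(f"largest cluster: {largest} of {n_loops} loops ({largest / n_loops:.1%})")
# Number of unordered loop pairs, n * (n - 1) / 2, for comparison with the
# total shown on the "aligning morph path pair" lines.
print("all-vs-all pairs:", n_loops * (n_loops - 1) // 2)
```

The pair count comes out equal to the 114208941 total in the "aligning morph path pair" lines, and the largest density cluster holds roughly three quarters of the extracted loops, which is at least in line with one morph dominating the tangle.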