lh3 / minigraph

Sequence-to-graph mapper and graph generator
https://lh3.github.io/minigraph
MIT License

Assertion failure when mapping barley genome #76

Closed glennhickey closed 2 years ago

glennhickey commented 2 years ago

This is courtesy of @LindaMilne via https://github.com/ComparativeGenomicsToolkit/cactus/issues/800 and reproduces with the minigraph-0.19_x64-linux release binary:

wget https://ics.hutton.ac.uk/barleyrtd/data/barley_two_old_genomes.gfa.gz
wget https://ics.hutton.ac.uk/barleyrtd/data/morex_1H_old.fasta.gz
minigraph barley_two_old_genomes.gfa.gz morex_1H_old.fasta.gz -xasm -c -t 32 -o morex.gaf
[M::main::7.941*1.00] loaded the graph from "barley_two_old_genomes.gfa.gz"
[M::mg_index::59.948*1.27] indexed the graph
[M::mg_opt_update::62.273*1.26] occ_max1=100; lc_max_occ=2
minigraph: galign.c:133: mg_gchain_cigar: Assertion `l == gc->qe - gc->qs && gc->p->aplen == gc->pe - gc->ps' failed.
Aborted

glennhickey commented 2 years ago

@lh3 this issue seems to have arisen again in cow genomes: https://github.com/ComparativeGenomicsToolkit/cactus/issues/832#issuecomment-1308851997

RuntimeError: Command /usr/bin/time -v bash -c 'set -eo pipefail && minigraph /tmp/664b8033018b57ed8eeaf89049092aea/35c4/4b7d/tmpdsu3dzqo/mg.gfa /tmp/664b8033018b57ed8eeaf89049092aea/35c4/4b7d/tmpdsu3dzqo/Duroc.fa -o /tmp/664b8033018b57ed8eeaf89049092aea/35c4/4b7d/tmpdsu3dzqo/Duroc.gaf -c -xasm -t 30' exited 134: stdout=None, stderr=[M::main::0.464*1.00] loaded the graph from "/tmp/664b8033018b57ed8eeaf89049092aea/35c4/4b7d/tmpdsu3dzqo/mg.gfa"
[M::mg_index::7.917*1.58] indexed the graph
[M::mg_opt_update::8.448*1.55] occ_max1=100; lc_max_occ=2
minigraph: galign.c:133: mg_gchain_cigar: Assertion `l == gc->qe - gc->qs && gc->p->aplen == gc->pe - gc->ps' failed.
Command terminated by signal 6
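
For anyone comparing the two reports: the `exited 134` in the wrapper message and the `signal 6` line describe the same event. A failed C `assert` calls `abort()`, which raises SIGABRT (signal 6 on Linux), and shells conventionally report a process killed by signal N as exit status 128 + N. A minimal sketch of that decoding follows; the helper name `signal_from_exit_status` is hypothetical and is not part of minigraph or cactus.

```python
import signal
from typing import Optional


def signal_from_exit_status(status: int) -> Optional[signal.Signals]:
    """Map a shell-style exit status to the signal that killed the process.

    Hypothetical helper for illustration only. Assumes the common shell
    convention that death by signal N is reported as status 128 + N.
    Returns None when the status does not encode a known signal.
    """
    if status <= 128:
        return None  # ordinary exit code, not a signal death
    try:
        return signal.Signals(status - 128)
    except ValueError:
        return None  # number does not correspond to a signal on this platform


if __name__ == "__main__":
    # 134 is the status reported for the failing minigraph command above.
    sig = signal_from_exit_status(134)
    print(sig.name if sig is not None else "not a signal exit")
```

On a Linux host, a status of 134 therefore points at an abort rather than, say, an out-of-memory kill, which would normally show up as SIGKILL (status 137).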

lh3 commented 2 years ago

I can see a download link in https://github.com/ComparativeGenomicsToolkit/cactus/issues/800. What's the command line in use?

glennhickey commented 2 years ago

For the barley genome, the command line is above:

wget https://ics.hutton.ac.uk/barleyrtd/data/barley_two_old_genomes.gfa.gz
wget https://ics.hutton.ac.uk/barleyrtd/data/morex_1H_old.fasta.gz
minigraph barley_two_old_genomes.gfa.gz morex_1H_old.fasta.gz -xasm -c -t 32 -o morex.gaf

If I remember correctly, it takes about an hour (and maybe around 40 GB of RAM) to crash. The GFA would have been constructed with minigraph -cxggs for those genomes.

Thanks!

lh3 commented 2 years ago

PS: just cut release v0.20 for this and a few other bug fixes. Results should remain the same.

glennhickey commented 2 years ago

Awesome, thanks!!