RenzoTale88 closed this issue 2 years ago.
That's a common warning, and it might be best to just hide it from users.
The dynamic programming implementation we use can't align reads to arbitrarily large subgraphs. Even if it could, we would still abandon the attempt beyond some threshold, because it would require too much time for a single potential mapping of a single read.
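To show why a cap is needed: a full dynamic programming alignment fills one matrix cell per (read base, target base) pair, so the work grows with read length times the amount of sequence in the subgraph. The sketch below is not Giraffe's code. It aligns a read against a single linear sequence (a real graph aligner works over the graph's nodes and edges) and gives up when the matrix would exceed a hypothetical `max_cells` limit. The scoring values are illustrative only.

```python
def local_align_score(read, target, max_cells=1_000_000,
                      match=1, mismatch=-4, gap=-6):
    """Smith-Waterman local alignment score with a linear gap penalty.

    Returns None instead of a score when len(read) * len(target) exceeds
    max_cells (a hypothetical cap), mimicking "abandon the attempt".
    """
    if len(read) * len(target) > max_cells:
        return None  # too much work for one candidate mapping of one read
    prev = [0] * (len(target) + 1)  # previous DP row
    best = 0
    for i in range(1, len(read) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == target[j - 1] else mismatch)
            # Local alignment: scores never drop below 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `local_align_score("ACGT", "TTACGTTT")` returns 4, while a 150 bp read against 20,000 bp of sequence (3,000,000 cells) returns `None` under the default cap.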
Giraffe tries to use dynamic programming when a mapping looks promising enough but it can't extend any nearby seed to an alignment without gaps. If the relevant subgraph is too large and complex, Giraffe abandons the attempt. This happens more often in pair rescue, where the graph region is typically 500-1000 bp (but may contain tens of kilobases of sequence), than in the alignment phase, where the region is usually 200 bp or less.
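A quick back-of-the-envelope comparison shows why rescue hits the limit more often than the alignment phase. This assumes a 150 bp short read (the thread does not state the read length) and takes 20 kb as one example of a rescue region that "may contain tens of kilobases of sequence".

```python
READ_LEN = 150  # assumed short-read length, not stated in the thread

# Amount of sequence the DP has to cover in each situation.
regions = {
    "alignment phase (~200 bp)": 200,
    "pair rescue, simple region (1 kb)": 1_000,
    "pair rescue, complex region (20 kb of sequence)": 20_000,
}


def dp_cells(read_len, subgraph_bp):
    """Number of DP matrix cells: one per (read base, graph base) pair."""
    return read_len * subgraph_bp


for name, bp in regions.items():
    print(f"{name}: {dp_cells(READ_LEN, bp):,} cells")
```

A complex rescue region needs about 100 times as many cells as a typical alignment-phase region, which is why the size limit is reached mostly during rescue.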
If the underlying cause is an indel error in the read, other reads should align fine to that region. If the sequenced genome contains an indel in that region but the indel is not present in the graph, the issue could affect other reads containing the indel. It might be possible to avoid that by using graphs where complex regions are less collapsed and contain more duplicated sequence, but I don't think anyone has investigated that option.
@jltsiren thank you for the explanation! I'll close this thread now then.
Andrea
Hello, I've generated a graph genome from five small assemblies (~220-250 Mb genome size) using the cactus pangenome workflow. The workflow completes successfully and generates the gg, gbwt and xg graph files, which I then used to build the minimizer and distance indexes. When I try to run Giraffe on the graph, I get the following warning:
My question is: does the warning mean that some of the alignments to the 14 kb region will be lost? If so, how can I prevent it?
Thank you in advance, Andrea