maickrau / GraphAligner

MIT License

Aborted (core dumped) #18

Closed HaploKit closed 2 years ago

HaploKit commented 4 years ago

Hi, I used GraphAligner to align long reads to a POA graph generated by spoa (https://github.com/rvaser/spoa), but got an error:

GraphAligner -g xx.gfa -f reads.fa -a reads.gaf -x vg

GraphAligner bioconda 1.0.11-
GraphAligner bioconda 1.0.11-
Load graph from xx.gfa
Build alignment graph
69304 original nodes
69304 split nodes
0 ambiguous split nodes
233414 edges
56013 nodes with in-degree >= 2
Build minimizer seeder from the graph
Signal 11. Read: xx.gfaessing. Seed: 0+,0,0,0
Aborted (core dumped)

The input GFA file looks like this (each node is a single base):

H VN:Z:1.0
S 0 G
S 1 G
S 2 T
S 3 C
S 4 T
S 5 C
S 6 T
...
S 34650 C
S 34651 A
S 34652 C
L 0 + 1 + 0M
L 1 + 2 + 0M
L 2 + 3 + 0M
L 3 + 4 + 0M
...
L 34650 + 5823 + 0M
L 34651 + 2052 + 0M
L 34652 + 8258 + 0M

Any help would be appreciated.

maickrau commented 4 years ago

Hi,

This no longer crashes as of 7ffb2a5, but it still won't work on the graph as is. The reason is that the seeding module cannot handle nodes that are split into individual base pairs. You can merge nodes with vg (https://github.com/vgteam/vg) using the following command, and GraphAligner should work after that:

vg view -Fv graph.gfa | vg mod -u - | vg view - > graph_unchopped.gfa
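To check whether a graph is still chopped into single bases, before or after merging, it can help to summarise the segment lengths in the GFA. The following is a minimal Python sketch, not part of GraphAligner or vg. The function name `segment_length_summary` is made up for illustration, and it assumes plain GFA 1.0 `S` lines with the sequence in the third field.

```python
def segment_length_summary(gfa_lines):
    """Summarise sequence lengths of S (segment) lines in GFA text.

    Hypothetical helper for illustration; assumes GFA 1.0 S lines of the
    form: S <name> <sequence> [optional tags].
    """
    lengths = []
    for line in gfa_lines:
        fields = line.split()  # handles tab- or space-separated input
        if len(fields) >= 3 and fields[0] == "S":
            if fields[2] == "*":
                continue  # sequence not stored in the file; length unknown
            lengths.append(len(fields[2]))
    total = len(lengths)
    single = sum(1 for n in lengths if n == 1)
    mean = sum(lengths) / total if total else 0.0
    return {"segments": total, "single_base": single, "mean_length": mean}


# Tiny example in the style of the graph posted above.
example = """H VN:Z:1.0
S 0 G
S 1 G
S 2 T
L 0 + 1 + 0M
L 1 + 2 + 0M""".splitlines()

print(segment_length_summary(example))
# {'segments': 3, 'single_base': 3, 'mean_length': 1.0}
```

On a real file you would pass an open file handle, for example `segment_length_summary(open("graph_unchopped.gfa"))`. If nearly every segment is a single base, the graph has most likely not been merged yet.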

HaploKit commented 4 years ago

Many thanks. It does work now. I have run into another problem, though. Because the POA graph is constructed from noisy long reads, it contains many 'incorrect nodes' (probably due to sequencing errors). When I use GraphAligner to align noisy long reads to the unchopped graph, only a small fraction of the reads (say 2067/14546) can be aligned. Is GraphAligner suitable for handling alignment in this case? Thanks in advance.
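For reference, a fraction like the one quoted above can be computed by comparing the read names in the FASTA input with the query names in the GAF output. This is only a rough Python sketch with hypothetical helper names. It assumes that the first column of each GAF line holds the read name and that it matches the first whitespace-delimited token of the FASTA header, which may not hold for every setup.

```python
def fasta_read_names(fasta_lines):
    """Collect read names (first token after '>') from FASTA text."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gaf_query_names(gaf_lines):
    """Collect query names from column 1 of GAF text (assumed tab-separated)."""
    names = set()
    for line in gaf_lines:
        tokens = line.rstrip("\n").split("\t")[0].split()
        if tokens:
            names.add(tokens[0])
    return names


def aligned_fraction(fasta_lines, gaf_lines):
    """Return (aligned reads, total reads, fraction aligned)."""
    reads = fasta_read_names(fasta_lines)
    # A read with several alignment lines is counted once.
    aligned = gaf_query_names(gaf_lines) & reads
    fraction = len(aligned) / len(reads) if reads else 0.0
    return len(aligned), len(reads), fraction


# Toy input: three reads, one of which has two alignment lines.
fasta = [">r1 desc", "ACGT", ">r2", "GGCA", ">r3", "TTAG"]
gaf = ["r1\t4\t0\t4\t+\t>1>2\t8\t0\t4\t4\t4\t60",
       "r1\t4\t0\t4\t+\t>1>3\t8\t0\t4\t3\t4\t20"]
n_aligned, n_reads, frac = aligned_fraction(fasta, gaf)
print(n_aligned, n_reads)  # 1 3

# The numbers quoted in the comment above correspond to:
print(f"{2067 / 14546:.1%}")  # 14.2%
```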

ekg commented 4 years ago

Maybe the odgi prune step could be used to remove the nodes with low coverage. I've been experimenting with this.

How are you constructing the graph from spoa? I'm going to write a GFA export function for it unless you have something ready.

HaploKit commented 4 years ago

Hi Erik, Many thanks for your suggestions, I will try.

The graph basically looks good, at least for my current test data. It would be great if you could write a GFA export function. For now, I have just written a simple Python script that converts the dot file generated by spoa to GFA.

HaploKit commented 4 years ago

Hi Erik, I tried "odgi prune" to remove tips in the POA graph (nodes in an unbranching path are compressed), but it did not work. Could you please have a look at what is going on here? Thanks. Here are the command and the error message:

odgi prune -i reads.sorted.poa.compressed.gfa -T 2 -k 9

terminate called after throwing an instance of 'std::runtime_error'
what(): error: Serialized handle graph does not match deserialzation type.
Aborted (core dumped)

By the way, what does '-T' mean? Is it the maximum number of nodes that will be treated as tips?

ekg commented 4 years ago

I'll have a PR merged in today that improves the tip pruning in odgi.

It removes the tips optionally based on path coverage and trims paths rather than removing all paths from the graph.

You would use it like this:

odgi build -g x.gfa -o - \
    | odgi prune -i - -o - -T -m 2 \
    | odgi view -i - -g >x.prune.gfa

That will trim tips covered by fewer than 2 paths.
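To make "tips covered by fewer than 2 paths" concrete, here is a small Python sketch that lists such nodes in a GFA file. It only illustrates the idea and is not how odgi implements tip pruning. The function name `low_coverage_tips` is hypothetical. The sketch assumes forward-orientation links only (as in the POA graph above) and that read paths are stored as GFA 1.0 `P` lines.

```python
from collections import defaultdict


def low_coverage_tips(gfa_lines, min_depth=2):
    """List tip nodes visited by fewer than min_depth distinct paths.

    Illustration only (hypothetical helper). A tip is taken to be a node
    with no incoming or no outgoing link; all links are assumed to be in
    forward (+/+) orientation.
    """
    nodes, has_in, has_out = set(), set(), set()
    paths_through = defaultdict(set)  # node name -> names of paths visiting it
    for line in gfa_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "S" and len(fields) >= 2:
            nodes.add(fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            has_out.add(fields[1])  # source node has an outgoing link
            has_in.add(fields[3])   # target node has an incoming link
        elif fields[0] == "P" and len(fields) >= 3:
            for step in fields[2].split(","):
                paths_through[step.rstrip("+-")].add(fields[1])
    tips = [n for n in nodes if n not in has_in or n not in has_out]
    return sorted(n for n in tips if len(paths_through[n]) < min_depth)


# Node 3 is a dead end supported by a single read path (r3).
toy = """S 1 ACGT
S 2 T
S 3 G
S 4 CCA
L 1 + 2 + 0M
L 1 + 3 + 0M
L 2 + 4 + 0M
P r1 1+,2+,4+ *
P r2 1+,2+,4+ *
P r3 1+,3+ *""".splitlines()

print(low_coverage_tips(toy, min_depth=2))  # ['3']
```

Nodes 1 and 4 are also tips in this toy graph (the start and end of the alignment), but they are kept because at least two paths cover them.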

ekg commented 4 years ago

Also there is a PR open on spoa which includes a GFA export function. It should be merged in pending another PR.

HaploKit commented 4 years ago

These help a lot, thanks.

maickrau commented 2 years ago

This seems to be fixed now. If there are more errors, please open a new issue.