pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

Odgi unchop performance for large graphs #584

Open sivico26 opened 4 months ago

sivico26 commented 4 months ago

Hello again,

I realized that even though my issue affects smoothxg, it has more to do with an odgi algorithm, so I am also posting it here for future reference.

AndreaGuarracino commented 2 months ago

Hi @sivico26 , I can't make promises, but if you can share one graph, I could look at where the bottleneck might be!

sivico26 commented 2 months ago

Hi @AndreaGuarracino.

I can do that. Where should I send it? To your UTHSC email address?

By the way, the job just passed 2200 hours. At this scale, odgi unchop is definitely the bottleneck of smoothxg, taking more than 80% of the runtime (and counting). If we find a way to address this, people working on crops who want to include wild ancestors in their pangenomes will surely appreciate it.

My cluster's admins speculated that odgi unchop is $O(n^2)$ (in the number of nodes, I imagine). Do you know whether that is the case? Knowing this would help us decide whether to wait for the job or to proceed with the input graph for our work.

AndreaGuarracino commented 2 months ago

My UTHSC address is fine.

As a short answer, I would skip the unchopping step in smoothxg so that you can work with a smoothed graph. I've never done a formal complexity analysis of the unchop algorithm, but the issue has a quadratic smell!
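
One rough way to test the quadratic suspicion empirically, in the absence of a formal analysis, is to time the step on a few progressively larger subgraphs and fit a power law $t \approx c \cdot n^k$ on a log-log scale. The sketch below is only an illustration: the function names are hypothetical, it does not call odgi, and the timings in it are synthetic values generated from an exact quadratic so that the fit can be checked. Real measurements would have to replace them.

```python
import math


def fit_scaling_exponent(sizes, seconds):
    """Least-squares fit of log(t) = log(c) + k * log(n); returns (k, c)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                      # slope = empirical exponent
    c = math.exp(mean_y - k * mean_x)  # intercept = constant factor
    return k, c


def extrapolate_seconds(n, k, c):
    """Predicted runtime for n nodes under the fitted power law."""
    return c * n ** k


# Synthetic example only: node counts and timings invented for illustration,
# generated from t = 1e-9 * n^2 (NOT measured from odgi unchop).
example_sizes = [100_000, 200_000, 400_000, 800_000]
example_seconds = [1e-9 * n ** 2 for n in example_sizes]

k, c = fit_scaling_exponent(example_sizes, example_seconds)
predicted = extrapolate_seconds(10_000_000, k, c)
print(f"fitted exponent k = {k:.3f}")
print(f"predicted seconds at 10M nodes = {predicted:.1f}")
```

A fitted exponent near 2 on real timings would support the quadratic guess, and the extrapolation gives a crude idea of whether waiting for the full job is realistic. Memory effects and graph structure can bend the curve, so this is an estimate at best.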


sivico26 commented 2 months ago

Hi @AndreaGuarracino,

My cluster's admins recently informed me that they need to shut down the server where my job is running. My job will therefore be killed after running for ~2600 hours.

Is there a way to resume smoothxg processing somewhere by copying the current temporary files? These are the files currently in the folder:

[sivico26@urga1 ~]$ ls /scratch/sivico26/job_6857522.cerit-pbs.cerit-sc.cz/results_uv/tmp/temp-27WbAw/ -lh 
total 108G
-rw-------. 1 sivico26 meta  98G jun 15 14:41 0LXmlE
-rw-------. 1 sivico26 meta 4,9G jun 10 14:38 E9L5T3
-rw-------. 1 sivico26 meta 4,9G jun 10 14:36 Gsu5sk
-rw-------. 1 sivico26 meta  12K jun 15 14:42 hAFKNe

What do you think, would they be of any use? Thank you in advance.