pangenome / smoothxg

linearize and simplify variation graphs using blocked partial order alignment

Performance when unchopping large graphs #209

Open sivico26 opened 2 months ago

sivico26 commented 2 months ago

Hello there,

I am currently running smoothxg on a gigantic graph, which I expected to take several weeks to process. I have used smoothxg with large graphs before, but this one is the biggest so far.

While monitoring my previous runs, I realized that a big chunk of the processing time is spent in "unchopping [the] smoothed graph". In practice, at least for the graphs I have worked with, this step runs single-threaded, which led me to believe there is room to improve its performance. However, after a glance at the code, I see that threads are indeed used (or at least specified). I know little to nothing about C++, so I might be mistaken.

Anyway, I would like to ask how the unchopping algorithm uses threads, why it is mostly serial in practice (again, at least in my hands, with large graphs), and whether there is room for improvement.

This question comes partly from the frustration of seeing the unchopping of the graph take more time than the alignment of the POA blocks and the smoothing itself. To give you an idea, even though I don't have precise numbers, my current job has just surpassed 800 hours, of which the last 400 have been spent unchopping the smoothed graph.

Thank you for your attention.