graspologic-org / graspologic

Python package for graph statistics
https://graspologic-org.github.io/graspologic/
MIT License
774 stars 143 forks source link

Improve scaling on `simulations.sbm` #713

Open loftusa opened 3 years ago

loftusa commented 3 years ago

It's currently taking me ~20 seconds to make an SBM with 1500 nodes with

simulations.sbm[500, 500, 500], B, return_labels=True

where B is a block probability matrix.
This scales poorly, especially for experiments in which we need to generate lots of large SBMs.
Speeding up this simulation to scale better would be super nice. Looks like the issue is in the loop here.

I wonder if it's possible to vectorize to speed this loop up? or possibly do some calculations differently, or even (if other options don't work) parralelize?

another option is to use cython here, but not sure how worth it that is... sklearn uses the crap out of cython to speed up their stuff though, and I'm sure it'd be useful in other places besides just this function? thoughts?

EDIT: Now it's going way faster, so maybe the problem was just on my machine. Maybe close this issue?

bdpedigo commented 3 years ago

a few thoughts:

rajpratyush commented 3 years ago

@loftusa does this issue needs work?

loftusa commented 3 years ago

@rajpratyush Maybe - I think the issue was primarily just with my machine at the time, but I'm sure we could still find a way to speed simulation generation up. I'll leave it to @bdpedigo to decide if it's worth pursuing. #734 might be a fun project for you too, but again, not sure at the moment whether or not it's a priority -- @bdpedigo thoughts?

rajpratyush commented 3 years ago

@loftusa I was going through the file that you mentioned few days back and my first impression was it is evidently will be slow in execution because of too many for loops. It definitely needs to optimized.

bdpedigo commented 3 years ago

@rajpratyush Maybe - I think the issue was primarily just with my machine at the time, but I'm sure we could still find a way to speed simulation generation up. I'll leave it to @bdpedigo to decide if it's worth pursuing. #734 might be a fun project for you too, but again, not sure at the moment whether or not it's a priority -- @bdpedigo thoughts?

If someone can generate convincing evidence that another sampling method is faster and the results are still accurate, then I'm happy to have it, though I can't commit to reviewing anything right now.