graspologic-org / graspologic

Python package for graph statistics
https://graspologic-org.github.io/graspologic/
MIT License
772 stars 143 forks source link

Refactor embedding CLT estimation into user-facing code #378

Open bdpedigo opened 4 years ago

bdpedigo commented 4 years ago

@alyakin314 is adding estimation of covariance for ASE central limit theorem as part of #366.

It would be nice to have this code as a function in GraSPy for plotting and doing theoretical investigations/simulations. And even as just something to use in the tutorials for explanation.

alyakin314 commented 4 years ago

i completely agree, and this was one of the reasons why i made this into a standalone function rather than a method already. not a proper user-facing interface, but should be easier to use than if it belonged to an object. we can discuss into where that should go at some point.

bdpedigo commented 4 years ago

some of this may be addressed in #419. I doubt tutorial notebook will be, though, so I'm guessing there will be at least some work left to do.

alyakin314 commented 4 years ago

@bdpedigo do you have any thoughts on what it should be? i actually kind of like .fit .predict here, where .fit does what's in the current outside function and .predict does the inside one?

bdpedigo commented 4 years ago

I kinda agree w fit predict here but we could talk more.

Are you doing this one or want me to find someone that could be interested?

alyakin314 commented 4 years ago

i'll claim this one for now (and that's my only task left other than what's in #419 i believe?).

where do we want this to be? utils, but separate file?

bdpedigo commented 3 years ago

@alyakin314 utils but file is fine. do you have tests in mind for this function/class?

alyakin314 commented 3 years ago

@alyakin314 utils but file is fine.

okay. i won't be doing that as a part of #419 though, if you don't mind. that also was just a suggestion. depending on what the philosophy behind new models+simulations combined module, i can see that being a better place for it, or not.

do you have tests in mind for this function/class?

nope. but of the top of my head: we know simple closed forms for er and 1-d sbm, so presumably those estimated using the rdpg formula should be within some tolerance of it?

bdpedigo commented 3 years ago

okay. i won't be doing that as a part of #419 though

that's fine, can you just let me know when you plan on doing it, since #442 depends on it and i just wanna know for NDD purposes

that also was just a suggestion.

I don't see a better place for it for now. maybe models someday but I dont think it belongs there ATM.

alyakin314 commented 3 years ago

okay. i won't be doing that as a part of #419 though

that's fine, can you just let me know when you plan on doing it, since #442 depends on it and i just wanna know for NDD purposes

happy to do right after align saga is over, so i guess this week? doesn't seem like something that should take that much time.

that also was just a suggestion.

I don't see a better place for it for now. maybe models someday but I dont think it belongs there ATM.

cool.

alyakin314 commented 3 years ago

this is now in progress. ETA by end of the month. (for #442 purposes).

bdpedigo commented 3 years ago

Are you doing #442 ? just curious

Also didn't you already pull the CLT stuff out in last PR?

alyakin314 commented 3 years ago

not planning on doing #442 atm. i thought that's ndd issue, but was not sure if claimed.

i did move it out of LDT to utils, but it still has a function that returns a function api, which is cute but not really user-friendly. i thought we want to have it .fit_predict?

bdpedigo commented 3 years ago

was not claimed but someone may be interested in the future.

you may be right, not sure.

alyakin314 commented 3 years ago

Let's discuss this though, because is important:

my plan is: put dis into fit, store whatever variables necessary (not user facing, obviously): https://github.com/microsoft/graspologic/blob/bbbc68a24ca9e2097575e7f92f809916ea3eeb46/graspologic/utils/utils.py#L667-L672 put dis into predict: https://github.com/microsoft/graspologic/blob/bbbc68a24ca9e2097575e7f92f809916ea3eeb46/graspologic/utils/utils.py#L690-L701

boom.

alyakin314 commented 3 years ago

in which case:

fit takes the embeddings to compute the moments over (generally this would be an embedding of the whole graph).

predict takes the points to evaluate the variance at. this can be all points in the same graph, a list of them, only one, or even some random points that you for some reason want to know variances at, if there existed a vertex with a latent position there.

bdpedigo commented 3 years ago

at a glance, seems like the right approach to me!

bdpedigo commented 3 years ago

Tasks: