code for data manipulation/MaggotGraph class

bdpedigo commented 3 years ago

[ ] lcc: reduce to the largest connected component or union thereof
[x] bisect: split the graph into left-left, left-right, etc. subgraphs
[ ] sort meta/adj by class, hemisphere, etc.
[ ] preprocessing for embedding, e.g. pass_to_rank, augment_diagonal, etc.
[ ] binarize graph
[ ] remove 'unqualified' nodes, such as low-degree or partially differentiated ones
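For reference, here is a minimal sketch of how several checklist items (sort, bisect, lcc, binarize, remove low degree) could look if an adjacency matrix is stored alongside the node metadata dataframe. Everything here is an assumption for illustration: the `MaggotGraph` layout, the method names, and a `"hemisphere"` column holding `"L"`/`"R"`. It is not the implementation in `test_graph.py`. `lcc` returns only the single largest *weakly* connected component (not a union), and "low degree" is taken to mean total unweighted in + out degree.

```python
import numpy as np
import pandas as pd


class MaggotGraph:
    """Hypothetical sketch: an adjacency matrix kept in sync with node metadata.

    Assumes adj[i, j] is the weight of the edge from meta.index[i] to
    meta.index[j], and that meta has a "hemisphere" column ("L" / "R").
    """

    def __init__(self, adj, meta):
        adj = np.asarray(adj)
        if adj.shape != (len(meta), len(meta)):
            raise ValueError("adj must be n x n, where n = len(meta)")
        self.adj = adj
        self.meta = meta

    def _subset(self, positions):
        # Reorder/subset rows and columns of adj together with the rows of meta.
        positions = np.asarray(positions, dtype=int)
        return MaggotGraph(
            self.adj[np.ix_(positions, positions)], self.meta.iloc[positions]
        )

    def sort(self, by):
        # Stable sort of the metadata (e.g. by="hemisphere" or
        # ["hemisphere", "class"]); the adjacency is permuted to match.
        order = self.meta.reset_index(drop=True).sort_values(by, kind="mergesort")
        return self._subset(order.index.to_numpy())

    def bisect(self, source, target, col="hemisphere"):
        # Block of edges from nodes with meta[col] == source to nodes with
        # meta[col] == target, e.g. bisect("L", "R") for left -> right.
        values = self.meta[col].to_numpy()
        rows = np.flatnonzero(values == source)
        cols = np.flatnonzero(values == target)
        return self.adj[np.ix_(rows, cols)]

    def binarize(self):
        return MaggotGraph((self.adj != 0).astype(int), self.meta)

    def remove_low_degree(self, min_degree=1):
        # Keep nodes whose total (in + out) unweighted degree is >= min_degree.
        binary = self.adj != 0
        degree = binary.sum(axis=0) + binary.sum(axis=1)
        return self._subset(np.flatnonzero(degree >= min_degree))

    def lcc(self):
        # Largest weakly connected component: depth-first search on the
        # symmetrized, unweighted graph, labeling each component.
        n = len(self.meta)
        sym = (self.adj != 0) | (self.adj.T != 0)
        labels = np.full(n, -1)
        n_components = 0
        for start in range(n):
            if labels[start] >= 0:
                continue
            labels[start] = n_components
            stack = [start]
            while stack:
                node = stack.pop()
                for nbr in np.flatnonzero(sym[node]):
                    if labels[nbr] < 0:
                        labels[nbr] = n_components
                        stack.append(nbr)
            n_components += 1
        largest = np.argmax(np.bincount(labels))
        return self._subset(np.flatnonzero(labels == largest))


# Toy example: n0 -> n1 (weight 2), n1 -> n2 (weight 1), n3 isolated.
meta = pd.DataFrame(
    {"hemisphere": ["R", "L", "L", "R"]}, index=["n0", "n1", "n2", "n3"]
)
adj = np.array([[0, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
g = MaggotGraph(adj, meta)
left_to_right = g.bisect("L", "R")
```

Every operation returns a new object, so the adjacency and metadata can never drift out of sync with each other.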

tliu68 commented 3 years ago

I know some of them are in your MetaGraph class. Are you planning to build MaggotGraph on it?

tliu68 commented 3 years ago

Can we group the functions by something like necessary vs. optional for graph analysis and/or by the intended analysis procedures?

bdpedigo commented 3 years ago

> I know some of them are in your MetaGraph class. Are you planning to build MaggotGraph on it?

I kinda hate a lot of that code, so not necessarily. I was playing around with a new implementation here: https://github.com/neurodata/maggot_connectome/blob/main/sandbox/test_graph.py

I am pretty torn on how to implement this stuff. I think any option will need to include a node metadata dataframe, because I find those manipulations essential to most of what I do. The question is how to store the graph itself:

1. Could be an adjacency, or a set of adjacencies for different edge types, that we manipulate concurrently with the dataframe.
   - When we sort the node metadata, the adjacency(s) get manipulated too.
   - Does not really keep track of edge attributes. We haven't done much with that info thus far besides edge type, but someday we may want to.
   - Easy to manipulate in terms of edge weights (binarize, remove loops, etc.).
2. Could just store a networkx graph, and whenever we call for the adjacency attribute, just get it out of the networkx object according to the indexing of the node metadata.
   - Has the advantage that all node/edge metadata are still stored in the networkx object.
   - Don't have to write much code to sort or otherwise wrangle the adjacency a bunch of times, just have to spit it out.
3. Something more complicated, like storing a node table and an edge table and similarly spitting out an adjacency or networkx graph when requested.
   - Basically means making our own graph object, which is probably dumb.
   - Advantage that we can filter edges and nodes explicitly using familiar pandas operations.

The code linked above is basically option (1), but I'm not sure what the smartest way to continue is. Part of me wants to reuse as much already-written code as possible (networkx), and the other part of me just wants to write exactly the class we want for this project, based on my experience thus far.
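To make option (3) more concrete, here is a rough sketch in which a node table and an edge table are the only stored state and an adjacency is built on request, ordered like the current node table. The `TableGraph` name, its methods, and the `"source"`/`"target"`/`"weight"`/`"edge_type"` edge columns are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


class TableGraph:
    """Hypothetical sketch of option (3): a node table plus an edge table.

    Assumes nodes is indexed by node id and edges has "source", "target",
    "weight" and "edge_type" columns referring to those ids.
    """

    def __init__(self, nodes, edges):
        self.nodes = nodes
        self.edges = edges

    def filter_nodes(self, mask):
        # Plain pandas boolean filtering; edges touching dropped nodes are
        # ignored later, when the adjacency is built.
        return TableGraph(self.nodes[mask], self.edges)

    def filter_edges(self, mask):
        return TableGraph(self.nodes, self.edges[mask])

    def to_adjacency(self, edge_type=None):
        # Build an adjacency whose row/column order matches self.nodes.
        edges = self.edges
        if edge_type is not None:
            edges = edges[edges["edge_type"] == edge_type]
        position = pd.Series(np.arange(len(self.nodes)), index=self.nodes.index)
        keep = edges["source"].isin(position.index) & edges["target"].isin(
            position.index
        )
        edges = edges[keep]
        rows = edges["source"].map(position).to_numpy(dtype=int)
        cols = edges["target"].map(position).to_numpy(dtype=int)
        adj = np.zeros((len(self.nodes), len(self.nodes)))
        # np.add.at sums weights when a (source, target) pair appears more
        # than once, e.g. once per edge type.
        np.add.at(adj, (rows, cols), edges["weight"].to_numpy(dtype=float))
        return adj


nodes = pd.DataFrame({"hemisphere": ["L", "L", "R"]}, index=["a", "b", "c"])
edges = pd.DataFrame(
    {
        "source": ["a", "b", "a"],
        "target": ["b", "c", "b"],
        "weight": [1, 3, 2],
        "edge_type": ["aa", "ad", "ad"],
    }
)
g = TableGraph(nodes, edges)
left = g.filter_nodes(g.nodes["hemisphere"] == "L")
```

Because the adjacency is derived rather than stored, sorting or filtering `nodes` never requires touching a matrix; the trade-off is rebuilding the adjacency on every call.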

bdpedigo commented 3 years ago

> Can we group the functions by something like necessary vs. optional for graph analysis and/or by the intended analysis procedures?

I'm not sure what you mean here. We can always have functions that call other functions, such as .standard_preprocessing(), that could do all of the stuff we normally think makes sense for certain analyses.
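One way to get a `.standard_preprocessing()` that just calls other functions, sketched under assumptions: every step is a hypothetical function taking and returning `(adj, meta)`, and the default step list below is a placeholder rather than an agreed-upon set.

```python
import numpy as np
import pandas as pd


# Hypothetical step functions: each takes and returns (adj, meta), so any
# subset of them can be chained in any order.
def remove_loops(adj, meta):
    adj = adj.copy()
    np.fill_diagonal(adj, 0)
    return adj, meta


def binarize(adj, meta):
    return (adj != 0).astype(int), meta


def remove_isolates(adj, meta):
    # Drop nodes with no in- or out-edges, subsetting meta to match.
    binary = adj != 0
    degree = binary.sum(axis=0) + binary.sum(axis=1)
    keep = np.flatnonzero(degree > 0)
    return adj[np.ix_(keep, keep)], meta.iloc[keep]


# Assumed default; which steps count as "standard" is still an open question.
STANDARD_STEPS = (remove_loops, remove_isolates)


def standard_preprocessing(adj, meta, steps=STANDARD_STEPS):
    # Order matters: removing loops first means a node whose only edge is a
    # self-loop becomes an isolate and is then dropped.
    for step in steps:
        adj, meta = step(adj, meta)
    return adj, meta


meta = pd.DataFrame({"hemisphere": ["L", "R", "L"]}, index=["n0", "n1", "n2"])
adj = np.array([[0, 4, 0], [0, 0, 0], [0, 0, 7]])  # n2 only has a self-loop
clean_adj, clean_meta = standard_preprocessing(adj, meta)
```

The same chaining works as a method on a graph class; keeping the steps as a tuple makes the default easy to change and lets a caller pass a different sequence for a particular analysis.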

bdpedigo commented 3 years ago

Note that for this project I'm not trying to worry much about whether any of this code generalizes to other projects. That is just too much overengineering, and I think it would slow us down.

tliu68 commented 3 years ago

> > Can we group the functions by something like necessary vs. optional for graph analysis and/or by the intended analysis procedures?

> I'm not sure what you mean here. We can always have functions that call other functions, such as .standard_preprocessing(), that could do all of the stuff we normally think makes sense for certain analyses.

Yes, that's basically what I meant. We were listing functions for individual processing steps, and I was thinking we could have something like the .standard_preprocessing() you mentioned, which calls, for example, all the steps necessary for basic graph construction so we don't have to call them one by one. Also, if some closely related functions are needed for the same analysis procedure, say embedding, then we can have another function like .preprocessing_for_embedding() to "group" them together.
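A sketch of a grouping function like `.preprocessing_for_embedding()`, written here as a plain function. The two steps are minimal stand-ins for pass-to-ranks and diagonal augmentation, written out so the example is self-contained; graspologic ships its own `pass_to_ranks` and `augment_diagonal` in `graspologic.utils`, whose details may differ. Applying pass-to-ranks before augmenting the diagonal is an assumed order.

```python
import numpy as np
import pandas as pd


def pass_to_ranks(adj):
    # Replace each nonzero weight by its rank among the nonzero weights
    # (ties get the average rank), divided by (number of nonzeros + 1).
    adj = np.asarray(adj, dtype=float).copy()
    nonzero = adj != 0
    ranks = pd.Series(adj[nonzero]).rank(method="average").to_numpy()
    adj[nonzero] = ranks / (nonzero.sum() + 1)
    return adj


def augment_diagonal(adj):
    # Replace the diagonal with each node's mean of weighted in- and
    # out-degree, divided by n - 1.
    adj = np.asarray(adj, dtype=float).copy()
    np.fill_diagonal(adj, 0)
    n = adj.shape[0]
    degree = (adj.sum(axis=0) + adj.sum(axis=1)) / 2
    np.fill_diagonal(adj, degree / (n - 1))
    return adj


def preprocessing_for_embedding(adj):
    # Groups the embedding-specific steps so callers make a single call.
    return augment_diagonal(pass_to_ranks(adj))


adj = np.array([[0, 5, 1], [0, 0, 5], [0, 0, 0]])
embedding_input = preprocessing_for_embedding(adj)
```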

bdpedigo commented 3 years ago

Yup, I definitely like that idea!

neurodata / maggot_connectome

code for data manipulation/MaggotGraph class #1