I'd keep at least a couple of biggish networks in the workflow. Often, some phenomena/properties can only be observed when the network is large enough.
I'm okay with this.
I've been using networks with about 4k-5k nodes and fewer than 50k edges for my exploration work (on a separate branch). Switching to them is just a single-line change to DATA_LIST in the workflow file, roughly like the sketch below.
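For reference, a minimal sketch of what that change might look like, assuming DATA_LIST is a plain Python list in the workflow file (the network names below are placeholders, not actual datasets from the repo):

```python
# Hypothetical sketch: restrict the workflow to small networks for exploration.
# Only DATA_LIST itself is referenced in this thread; the names are placeholders.
SMALL_NETWORKS = ["net_small_1", "net_small_2", "net_small_3"]  # ~4k-5k nodes, <50k edges each

# DATA_LIST = ALL_NETWORKS    # full list used on the main branch
DATA_LIST = SMALL_NETWORKS    # single-line swap on the exploration branch
```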
In any case, I don't think the optimal stacking pipeline (which at the moment runs more or less independently of the embedding- and network-based pipelines) can scale to networks with more than 50k edges unless we drop some features (#15).
OK. So let's keep the main workflow as it is. I'll expand the test data under the "test_data" folder for testing purposes, increasing the number of networks from two to ten, with sizes up to 5k nodes.
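For concreteness, a rough sketch of how the expanded test set could be generated, assuming the test networks are stored as edge lists under "test_data" (the generator model, file names, and exact sizes are my assumptions, not decisions from this thread):

```python
# Hypothetical sketch: generate ten synthetic test networks with up to 5k nodes
# and write them as edge lists under test_data/. Generator and naming are assumptions.
import os
import networkx as nx

os.makedirs("test_data", exist_ok=True)
sizes = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
for i, n in enumerate(sizes):
    G = nx.barabasi_albert_graph(n=n, m=5, seed=i)  # ~5 edges attached per new node
    nx.write_edgelist(G, f"test_data/net_{i:02d}.edgelist", data=False)
```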
Since we have more networks and algorithms, I feel that the workflow is getting too big for exploration & bug fixes.
Shall we focus on small networks (less than 3000 nodes?) for the time being? Once we finish the analysis & bug fixes, we can include larger networks.