I'd keep at least a couple of biggish networks in the workflow. Often, some phenomena/properties can only be observed when the network is large enough.
I'm okay with this.
I've been using networks with about 4k-5k nodes and fewer than 50k edges for my exploration work (on a separate branch). Switching to them is just a single-line change to DATA_LIST in the workflow file, roughly like the sketch below.
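For reference, a minimal sketch of what that change might look like, assuming DATA_LIST is a plain Python list in the workflow file (the network names below are placeholders, not actual datasets from the repo):

```python
# Hypothetical sketch: restrict the workflow to small networks for exploration.
# Only DATA_LIST itself is referenced in this thread; the names are placeholders.
SMALL_NETWORKS = ["net_small_1", "net_small_2", "net_small_3"]  # ~4k-5k nodes, <50k edges each

# DATA_LIST = ALL_NETWORKS    # full list used on the main branch
DATA_LIST = SMALL_NETWORKS    # single-line swap on the exploration branch
```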
In any case, I don't think the optimal stacking pipeline (which at the moment runs more or less independently of the embedding- and network-based pipelines) can scale to networks with more than 50k edges unless we drop some features (#15).
OK. So let's keep the main workflow as it is. I'll expand the test data under the "test_data" folder for testing purposes, increasing the number of networks from two to ten, with sizes up to 5k nodes.
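For concreteness, a rough sketch of how the expanded test set could be generated, assuming the test networks are stored as edge lists under "test_data" (the generator model, file names, and exact sizes are my assumptions, not decisions from this thread):

```python
# Hypothetical sketch: generate ten synthetic test networks with up to 5k nodes
# and write them as edge lists under test_data/. Generator and naming are assumptions.
import os
import networkx as nx

os.makedirs("test_data", exist_ok=True)
sizes = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
for i, n in enumerate(sizes):
    G = nx.barabasi_albert_graph(n=n, m=5, seed=i)  # ~5 edges attached per new node
    nx.write_edgelist(G, f"test_data/net_{i:02d}.edgelist", data=False)
```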
Since we have more networks and algorithms, I feel that the workflow is getting too big for exploration & bug fixes.
Shall we focus on small networks (less than 3000 nodes?) for the time being? Once we finish the analysis & bug fixes, we can include larger networks.