Closed pauleve closed 1 year ago
Sounds like a good fit :) So far, we have mostly considered "real world" networks, but we definitely want to include random ones as well (we also have some random benchmarks to include :D). There are just two things that need to be done on my part:
I have to check whether the metadata we compute (for example here) can scale to models of that size. It shouldn't be a problem, but the SCC decomposition of the regulatory graph currently uses a very naive algorithm, which I will probably need to improve for networks with 100K+ variables.
I'll let you know about this next week :)
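For reference, a linear-time SCC decomposition that should cope with 100K+ variables can be sketched as follows. This is an illustrative iterative Tarjan (iterative to avoid Python's recursion limit on deep regulatory graphs), not the repository's actual code:

```python
def tarjan_scc(graph):
    """Strongly connected components of graph: dict node -> iterable of successors.

    Runs in O(V + E), so it scales linearly with the number of variables
    and regulations, unlike a naive pairwise-reachability decomposition.
    """
    index, lowlink = {}, {}      # discovery index and low-link per node
    stack, on_stack = [], set()  # Tarjan's component stack
    sccs, counter = [], [0]

    for root in graph:
        if root in index:
            continue
        # Explicit DFS stack of (node, successor iterator) frames.
        work = [(root, iter(graph.get(root, ())))]
        index[root] = lowlink[root] = counter[0]; counter[0] += 1
        stack.append(root); on_stack.add(root)
        while work:
            node, it = work[-1]
            advanced = False
            for succ in it:
                if succ not in index:        # tree edge: descend into succ
                    index[succ] = lowlink[succ] = counter[0]; counter[0] += 1
                    stack.append(succ); on_stack.add(succ)
                    work.append((succ, iter(graph.get(succ, ()))))
                    advanced = True
                    break
                elif succ in on_stack:       # back edge inside the current SCC
                    lowlink[node] = min(lowlink[node], index[succ])
            if advanced:
                continue
            work.pop()                       # node's successors are exhausted
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]: # node is the root of an SCC
                comp = []
                while True:
                    w = stack.pop(); on_stack.discard(w)
                    comp.append(w)
                    if w == node:
                        break
                sccs.append(comp)
    return sccs
```

For example, `tarjan_scc({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c", "e"], "e": ["d"]})` yields the two components `{a, b, c}` and `{d, e}`.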
Returning to this issue... for now, I decided against including randomly generated networks. This is mostly because it is not always clear how representative they are of the "real world" cases. Also, the parameters for network generation are still a bit unclear to me (How many networks should be included if the supply is unlimited? What are the upper/lower bounds on network size? Which generator parameters do we prefer?)
However, I do see the benefit of having a collection of larger random networks for scalability and compliance testing. For now, I have included a reference to your dataset in the project README for anyone else interested in random networks.
For benchmarking the computation of attractors under the Most Permissive updating mode with mpbn [1], I created a set of (random) very large BNs (1,000 to 100,000 nodes), with in-degrees of up to 1,400.
The generated models are available at https://zenodo.org/record/3714876 and stored in the textual .bnet format.
[1] https://nbviewer.jupyter.org/urls/zenodo.org/record/3936123/files/Scalability%20on%20large%20random%20BNs.ipynb
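In case it helps when evaluating the dataset, here is a minimal sketch of reading BoolNet-style .bnet text (a header row `targets, factors`, then one `target, expression` line per variable) and computing the in-degree statistics mentioned above. These are hypothetical helpers for illustration, not part of mpbn's API:

```python
import re

def parse_bnet(text):
    """Return {target: boolean expression string} from .bnet-formatted text."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):      # skip blanks and comments
            continue
        target, _, expr = line.partition(",")
        if target.strip().lower() == "targets":   # skip the header row
            continue
        rules[target.strip()] = expr.strip()
    return rules

def in_degrees(rules):
    """Number of distinct regulators appearing in each rule."""
    return {t: len(set(re.findall(r"[A-Za-z_]\w*", e)))
            for t, e in rules.items()}
```

For instance, a three-node model `A, B & !C` / `B, A` / `C, A | B` gives in-degrees `{"A": 2, "B": 1, "C": 2}`.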
Let me know if I should work on a PR, or if this is out of the scope of this repository :smiley: