sybila / biodivine-boolean-models

A collection of 230+ Boolean networks from various sources useful for benchmarking or testing.

Add index for discrete models that were turned boolean. #48

Closed by gmagannaDevelop 1 year ago

gmagannaDevelop commented 2 years ago

Hi @daemontus, I hope you are doing well.

I am currently working on describing and characterizing "real world" boolean models in order to produce "realistic" synthetic ones.

For the first part of my work we would like to exclude any multi-valued network that has been translated into a boolean one.

I was wondering if you have any information on which models were originally Boolean and which ones were adapted, so that I don't have to filter them by hand.

Thanks in advance.

daemontus commented 2 years ago

Hi Gustavo! Thanks for reaching out!

So far, I haven't included any translation from multi-valued networks in this repository. Hence, as far as I know, everything here should be originally published as a Boolean network.

There might be some cases where authors started with a multi-valued network but eventually somehow transformed it into a Boolean one before publishing, but I don't really have any hard data on this. I would probably have to go through all the papers associated with the models to find out. Nevertheless, from what I have seen, I think it is safe to say that this would be at least very uncommon. I can't recall any specific network that was constructed like this, I just think that it's a possibility.

There is a desire to eventually incorporate automatic translation of multi-valued networks, but I haven't really had time to finish this. I'd like to first publish a detailed (cite-able :D) report about the benchmark collection and then start adding new functionality (I am doing a substantial clean-up and reorg in one of the branches). So at least for a while, there won't be multi-valued networks here. Once they are supported, I think every model will have "tags" so that you can easily filter out human-curated/synthetic/multi-valued networks and so on.

From the perspective of your inquiry, I think an important thing to consider is that some models here are human-made and some are automatic translations using tools like CaSQ (but these usually still have at least some level of human sanity check as far as I know). This may perhaps influence their properties slightly (I haven't quantified this, but it seems to me that the automatically translated ones tend to be more sparse).

Similarly, I haven't really checked for duplicates very thoroughly. No two networks should be syntactical duplicates, but some may be very similar. For example, there are some models that are both in CellCollective and GINsim databases but with slightly different update functions or with some extra/missing input/output nodes. I don't think this is a very common issue either, just keep in mind that you may run into some very similar networks.
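To make the "syntactical duplicates" notion concrete, here is a minimal sketch of how such a check could look. It is not the procedure used for this repository: the model names and the toy text format below are made up, and the only normalization is collapsing whitespace before hashing.

```python
import hashlib
from collections import defaultdict


def normalized_digest(text):
    # Collapse all runs of whitespace so formatting differences do not matter.
    canonical = " ".join(text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_syntactic_duplicates(sources):
    # Group model names whose normalized source text is identical.
    groups = defaultdict(list)
    for name, text in sources.items():
        groups[normalized_digest(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical models in an invented "variable, update function" format.
sources = {
    "net1": "a, b & !c\nb, a",
    "net2": "a,  b & !c\n\nb, a",  # same as net1 up to whitespace
    "net3": "a, b | !c\nb, a",     # different update function for a
}

duplicates = group_syntactic_duplicates(sources)
print(duplicates)
```

A check like this only catches textual duplicates. The "very similar" networks mentioned above (slightly different update functions, extra or missing input/output nodes) would hash differently and need a structural comparison instead.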

daemontus commented 1 year ago

Hi!

For anyone still interested in this, I have just "released" a revised version of the dataset. Now it also includes models Booleanized using GINsim (which increased the model count significantly). Additionally, I tried to eliminate any obvious duplicates by cross-referencing the associated publications. The result may still contain some very similar models, but at the very least there shouldn't be multiple versions of a single model from the same publication.

A new feature is that each model now has an associated list of "keywords". If the network is known to be based on a multi-valued model, the keywords list should contain a "multi-valued" entry. Using the bundle.py script, you can then filter the networks based on these keywords to avoid the Booleanized models if you need to.
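For illustration, the keyword-based filtering could be sketched roughly as follows. This is not the bundle.py interface: the dictionary layout, the model names, and every keyword other than "multi-valued" are assumptions made for the example, so check the repository for the actual metadata format and script options.

```python
def exclude_by_keyword(models, banned):
    # Keep only models whose keyword list shares nothing with `banned`.
    banned = {k.lower() for k in banned}
    return {
        name: keywords
        for name, keywords in models.items()
        if not banned & {k.lower() for k in keywords}
    }


# Hypothetical metadata: model name -> list of keywords.
models = {
    "model-a": ["multi-valued", "ginsim"],
    "model-b": ["cell-collective"],
    "model-c": [],
}

originally_boolean = exclude_by_keyword(models, ["multi-valued"])
print(sorted(originally_boolean))
```

In this toy input, only "model-a" carries the "multi-valued" keyword, so it is the one dropped.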

With this update, I would consider this as resolved :)

gmagannaDevelop commented 1 year ago

This is indeed resolved, thank you very much Samuel.

I am looking forward to analysing these models.