microsoft / causica


Parameter tuning for applying DECI to large graphs #60

fred887 opened this issue 1 year ago (status: Open)

fred887 commented 1 year ago

Hello,

Do you have any advice for tuning the parameters of the DECI method when it is applied to large graphs for causal discovery?

I have tried to apply the DECI method to datasets simulated from graphs with 10, 20, 50 and 100 nodes (with the number of edges equal to either 1x or 4x the number of nodes) and different types of nonlinear SEMs (all with additive Gaussian noise).

For all datasets, the training seems to go correctly (the loss curves decrease as expected and there are no numerical warnings), so DECI appears to be converging. For all but the 100-node datasets (and some 50-node graphs) I obtain valid graph estimates, more or less accurate depending on the situation. For all 100-node datasets (and some 50-node graphs) I obtain invalid "empty" graphs (i.e. graphs whose adjacency matrix contains only zeros).

Could you please help me make DECI work for these 100-node graphs?

Here is my setting:

  1. Using the following snippet with the current gcastle package (v1.0.3), I have simulated several datasets of 3000 samples each, generated from graphs with 100 nodes (and 100 or 400 edges) and different nonlinear SEMs (gp, quadratic and mlp):

    # gcastle v1.0.3: DAG and IIDSimulation are provided by castle.datasets
    from castle.datasets import DAG, IIDSimulation
    # e.g. n_nodes=100, n_edges=400, n=3000, method_type="nonlinear", sem_type="gp"
    weighted_random_dag = DAG.erdos_renyi(n_nodes=n_nodes, n_edges=n_edges, weight_range=(0.5, 2.0), seed=seed)
    dataset = IIDSimulation(W=weighted_random_dag, n=n, method=method_type, sem_type=sem_type)
    true_dag, X = dataset.B, dataset.X
  2. I have adapted the source code from examples/multi_investment_sales_attribution.ipynb to process my own datasets, so I am using the default parameters plus those specified in that example. The only change is the batch size, which I reduced from 1024 to 128 to better fit my datasets of 3000 samples (a minimal sketch of this change is shown after this list).
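Concretely, the batch-size change amounts to something like the following (generic PyTorch for illustration only, not the notebook's exact code; X is the sample matrix produced by the gcastle snippet in step 1):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # 3000 samples x 100 variables from the gcastle simulation above
    train_data = TensorDataset(torch.as_tensor(X, dtype=torch.float32))
    # batch size reduced from the example's 1024 to 128 to suit 3000 samples
    train_loader = DataLoader(train_data, batch_size=128, shuffle=True)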

Thank you very much for your help,

LaurantChao commented 1 year ago

Hi Fred,

Thanks for the detailed description of your question. My suspicion is that when the graph is relatively large, the scale of the dagness penalty gets larger and the updates to rho and alpha can blow up quickly; the optimization then focuses only on producing a DAG (which can be achieved trivially with the empty graph) and ignores fitting the data. I would suggest to:

  1. cap or slow down the growth of rho and alpha (the safety_rho / safety_alpha settings), or temporarily remove the dagness penalty altogether to check whether the model can fit your data at all;
  2. adjust the sparsity prior (prior_sparsity_lambda) so that the empty graph is not so strongly favoured.
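For intuition, here is a rough sketch of why the penalty scale grows with the number of nodes, assuming the NOTEARS-style constraint h(A) = tr(e^{A∘A}) − d and the augmented-Lagrangian weighting alpha·h(A) + (rho/2)·h(A)²; this is illustrative only, not causica's exact code:

    import numpy as np
    from scipy.linalg import expm

    def aug_lag_penalty(soft_adj, alpha, rho):
        # NOTEARS-style dagness term: zero iff the (non-negative) soft adjacency is acyclic
        d = soft_adj.shape[0]
        h = np.trace(expm(soft_adj * soft_adj)) - d
        return alpha * h + 0.5 * rho * h ** 2

    rng = np.random.default_rng(0)
    for d in (10, 50, 100):
        # "soft" edge probabilities early in training, before the graph has sparsified
        soft_adj = rng.uniform(0.0, 0.5, size=(d, d))
        print(f"d={d}: penalty ~ {aug_lag_penalty(soft_adj, alpha=1.0, rho=1.0):.3g}")

For a non-sparse soft adjacency the dagness term grows roughly exponentially with the number of nodes, so with the same rho/alpha schedule a 100-node run is pushed far harder towards the (trivially acyclic) empty graph than a 10- or 20-node run.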

jiayang97 commented 12 months ago

I have encountered a similar issue when running the algorithm. I tested it on 20 nodes with 10k data points, batch size 100 and max epochs 1000. After a few epochs, alpha and rho increase drastically and I eventually get a 'nan no real values found' error. I have tried setting safety_alpha=0.0, safety_rho=0.0 and prior_sparsity_lambda=0.1, but the problem persists. There are also cases where, at the end of training, a non-valid DAG is generated. Do you have any recommendations on how I can troubleshoot this? I'm running causica 0.2.0 because I'm on Python 3.9, and I followed the code in the csuite example. Also, is it possible to know what the time complexity of the DECI algorithm is? Thank you so much!

fred887 commented 12 months ago

Hello,

First of all, thank you very much LaurantChao for your suggestions, and sorry for my late answer. I applied the modifications you proposed: by completely removing the dagness penalty I could obtain non-empty graphs (though not valid DAGs, as expected), while changing the sparsity lambda had no impact on my initial problem. So there is something with the dagness penalty term that I have to investigate more deeply. (By the way, in my DECI version the safety_rho and safety_alpha parameters belong to the AugLagLRConfig class...)

Next, jiayang97: my issue seems a little different from yours (my training runs complete without any error messages), but if you want to completely remove the dagness penalty like I did, you also need to set the parameter init_rho to 0. I hope this helps.
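For reference, the combination that removed the penalty for me looks roughly like this (a sketch only; the exact import path and field names may differ between causica versions):

    # Sketch, not the exact API: in my causica version safety_rho / safety_alpha
    # sit on AugLagLRConfig, while the initial penalty weights are passed
    # separately as init_alpha / init_rho; check your version for the exact names.
    from causica.training.auglag import AugLagLRConfig, AugLagLossCalculator

    auglag_config = AugLagLRConfig(
        safety_rho=0.0,    # cap rho so the scheduler cannot grow it
        safety_alpha=0.0,  # cap alpha likewise
    )
    # init_rho=0 (together with init_alpha=0) is what actually removes the dagness penalty
    loss_calculator = AugLagLossCalculator(init_alpha=0.0, init_rho=0.0)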