A Python 3 package for learning Bayesian Networks (DAGs) from data. Official implementation of the paper "DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization"
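For reference in the thread below, a typical linear fit with an explicit `T` (the number of central-path iterations) looks roughly like this. This is a sketch based on the README-style usage; the data path is hypothetical and defaults may differ between versions:

```python
import numpy as np
from dagma.linear import DagmaLinear

X = np.loadtxt("data.csv", delimiter=",")  # hypothetical data file

model = DagmaLinear(loss_type='l2')  # linear model with least-squares loss
# T controls how many central-path iterations are run; larger T decreases
# the path parameter further before the final thresholding step.
W_est = model.fit(X, lambda1=0.02, T=5)
```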
Hi there, thank you for this efficient algorithm!

I applied both the linear and nonlinear models to my real-world dataset, which has about 0.4M rows and 23 columns. I have the following questions about the selection of T for the linear and nonlinear models.

1) For the linear model, I set T larger than the default and the algorithm stopped early. Can I treat the early-stopped matrix as the final result? I ask because it differs slightly from the matrix produced with a smaller T.

2) For the nonlinear model, the entries of the weight matrix are quite small at T=4, on the order of 1e-5. When I increased T they got even smaller, down to about 1e-19 at T=15. Does that mean the nonlinear model is not suitable for this dataset, i.e., that the values converge to 0? Note that I adapted the nonlinear algorithm to run on a GPU for a shorter running time, so I slightly modified the code; the modification does not change the main steps of the algorithm.

Kind Regards, Weikang
The algorithm should not stop early: it should run through all T iterations of the central path, and each of those iterations may terminate once the loss stops decreasing significantly. If the algorithm terminates before completing all T iterations, the likely cause is that the solution matrix stepped outside the domain of M-matrices.
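If you want to verify this on your end, here is a minimal sketch of the domain check, assuming the paper's characterization that W is feasible iff s exceeds the spectral radius of the Hadamard square W∘W (equivalently, sI − W∘W is a nonsingular M-matrix). `check_domain_and_h` is a hypothetical helper, not part of the package:

```python
import numpy as np

def check_domain_and_h(W, s=1.0):
    """Hypothetical helper: is W inside the M-matrix domain, and what is
    the log-det acyclicity value h^s(W)?"""
    d = W.shape[0]
    A = W * W  # Hadamard square W∘W
    # Feasible iff s > spectral radius of W∘W,
    # i.e. sI - W∘W is a nonsingular M-matrix.
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    inside = s > rho
    # h^s(W) = -log det(sI - W∘W) + d*log(s); zero iff W is a DAG.
    sign, logdet = np.linalg.slogdet(s * np.eye(d) - A)
    h = -logdet + d * np.log(s) if sign > 0 else np.inf
    return inside, h
```

If `inside` comes back False for the matrix at the point where the run stopped, the early stop is the domain issue described above, and that matrix should not be taken as the final result.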
There could be several reasons. Have you tried decreasing the values of l1 and l2? As a quick check, set both to 0 and see whether you still get very small entries.
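Something like the following sketch, assuming the `DagmaMLP`/`DagmaNonlinear` interface from the README, where `lambda1`/`lambda2` correspond to the l1 and l2 penalties (the data path and hidden layer size are hypothetical):

```python
import numpy as np
from dagma.nonlinear import DagmaMLP, DagmaNonlinear

X = np.loadtxt("data.csv", delimiter=",")  # hypothetical path: ~0.4M x 23
d = X.shape[1]

# Quick check: turn off both penalties and see whether the recovered
# weights are still on the order of 1e-5 or smaller.
eq_model = DagmaMLP(dims=[d, 10, 1], bias=True)
model = DagmaNonlinear(eq_model)
W_est = model.fit(X, lambda1=0.0, lambda2=0.0)
print("max |W|:", np.abs(W_est).max())
```

If the magnitudes are still tiny with zero regularization, the shrinkage is not coming from the penalties and the cause lies elsewhere.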