Closed: jgtz closed this issue 3 years ago
You are probably right about the lines causing the error. You could try adding simplify=False to the call on line 256 of Reduction.py. That's a temporary fix at best; I'll think of ways around it. I don't think we really need to compute the string representation anyway.
I'm running into a different, OS-specific error on my end. PyBoolNet always checks whether your input is pointing to a file, but if the input string is too long, that check crashes on Windows. Taken together, these issues suggest that we should really try to avoid espresso passes.
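A minimal sketch of one possible workaround (the function name and the length cutoff are made up for illustration, not PyBoolNet's actual API): short-circuit on string length before ever asking the filesystem, so an over-long rules string is never passed to the file-existence check that crashes on Windows.

```python
import os

# Classic Windows MAX_PATH; anything longer cannot plausibly be a file path.
MAX_PLAUSIBLE_PATH = 260

def looks_like_file(rules_or_path):
    """Guess whether the input is a path to a file or the rules themselves."""
    if len(rules_or_path) > MAX_PLAUSIBLE_PATH:
        # Too long to be a path; treat it as the rules string directly,
        # avoiding the filesystem call that fails on very long inputs.
        return False
    return os.path.isfile(rules_or_path)
```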
Regarding the MaxOutput, maybe this should be an optional argument for the diagram builder.
I've added a new branch to play around with the rspace representations. It seems to work so far.
I'm still running into problems with your example because the deletion reduction is taking way too long to compute. It gets slower as we increase the in-degree of deleted nodes or if the deleted nodes appear in a complicated way. I'll see if I can find a way to speed it up.
I added the MaxOutput argument (it's called max_stable_motifs for us). Still having trouble with the K=3,N=1000 example though. I've sped up some of the other examples by quite a bit, but I think the performance of the VC method on relatively dense networks is inherently bad. Maybe we can think of a way to do some of the work directly on the expanded network to save time.
I had 2 out of 10 of the K=3, N=1000 examples give me trouble; they took too long (stuck in the reduction step, like you say). I agree with you that working in the expanded network would be ideal, but it would require some extra figuring out. An intermediate alternative that would not require that is to work on the wiring diagram as much as possible, and only send the network to the VC method once that is no longer possible.
So: this way we would only do a VC reduction every couple of times a new functional self-loop is removed. We would also do multiple node reductions at once instead of one node at a time (which is what is done now, right?).
What do you think?
I'm not sure I'm following you. The restrictions in step 1 are more than just self-loops. For example, A=B, B=A will not allow us to reduce by deleting both A and B. Maybe we can make it work by looking at the relationship between successors and predecessors of selected nodes, but I don't see how deleting several nodes at once gives us any speedup.
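The A=B, B=A obstruction fits in a two-line sympy sketch (rules made up to mirror the example): deleting one node of the two-cycle turns the other into a functional self-loop, which blocks its deletion.

```python
from sympy import symbols

A, B = symbols("A B")

# The obstruction in miniature: f_A = B and f_B = A form a two-cycle
# with no self-loops.
f_A = B
f_B = A

# Deleting B substitutes f_B into every rule that mentions B:
reduced_f_A = f_A.subs(B, f_B)
print(reduced_f_A)  # A -- the reduced rule A = A is a functional self-loop,
                    # so A cannot be deleted next (and symmetrically for B).
```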
I'm 99% sure that the "slow" step in the VC reduction is simplifying the rules you get after plugging in f_X for X when you delete X. I'm wondering if we can speed things up by doing this pseudo-algebraically (i.e., with the expanded network directly).
I am going to try using sympy to speed things up. Not sure if it will work, but it's worth a shot.
I agree with you on what the slow step is. The "deleting several nodes at once" idea would let us simplify the rules fewer times (once instead of once per node deleted). I do agree that there should be a faster way using the expanded network. What I was trying to propose is to reduce the number of times rule simplification happens by taking advantage of the graph structure. Let me see if I can quickly implement what I am thinking.
Sounds good. Let me know what you find.
By the way, I think a big part of the problem is that PyBoolNet recalculates truth tables when you import rules after plugging things in. But we already have partial information about the DNF and the DNF of the negations before plugging in the f_X after deletion. Maybe we can build our own importer that can take these hints and can avoid passes to espresso altogether.
The idea would be to use sympy to simplify the rules and their negations separately, then pass those directly to our custom importer, which would just read off the minterms. I'll work on this angle and we'll combine whatever progress we make later.
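A sketch of that idea with a toy rule (the expression is invented; `simplify_logic` is sympy's actual simplifier): simplify the rule and its negation separately, keeping both in DNF, so a custom importer could read the prime implicants straight off the terms without any espresso pass.

```python
from sympy import symbols
from sympy.logic.boolalg import simplify_logic, Not

A, B, C = symbols("A B C")

# A made-up rule standing in for the expression after plugging f_X in for X.
rule = (A & B) | (A & B & C)

# Simplify the rule and its negation separately, both requested in DNF,
# so each term is directly a minterm the importer can consume.
rule_dnf = simplify_logic(rule, form="dnf")           # A & B
negation_dnf = simplify_logic(Not(rule), form="dnf")  # ~A | ~B
```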
The sympy approach seems to be working well. I only ran it on the example above, but it finished the succession diagram in ~100 seconds. Much better than before! I was never patient enough to wait for it to finish before, so I don't know how much better.
I pushed this to the rspace-representation-tweaks branch (I probably should have started a separate branch, but it's too late).
Great! I was also never patient enough to let it finish. I probably waited 10 minutes at some point, so I think we can safely say it was a >10x improvement for that case. Currently, the "rbn_ensemble[7]" case is the only one still taking a while to finish (although it does keep outputting things to the console).
I also profiled the "rbn_ensemble[4]" case (now that it finishes). It does seem like rule simplification is the part taking the bulk of the run time:
ncalls tottime percall cumtime percall filename:lineno(function)
752 0.080 0.000 254.863 0.339 Reduction.py:48(delete_node)
2202 0.032 0.000 218.973 0.099 Reduction.py:93(simplify_using_expression_and_negation)
2 0.071 0.035 216.911 108.456 Reduction.py:117(deletion_reduction)
You can see the whole profiler output (entries with >30 seconds cumulative time) below:
Fri Apr 24 16:24:53 2020 restats
383758577 function calls (372192620 primitive calls) in 269.846 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
2203/1 0.084 0.000 270.500 270.500 {built-in method builtins.exec}
1 0.002 0.002 270.354 270.354 <ipython-input-21-4c174d0693c7>:44(testf3)
2/1 0.003 0.001 269.375 269.375 Succession.py:572(build_succession_diagram)
2 0.006 0.003 269.171 134.585 Reduction.py:220(__init__)
752 0.080 0.000 254.863 0.339 Reduction.py:48(delete_node)
2202 0.032 0.000 218.973 0.099 Reduction.py:93(simplify_using_expression_and_negation)
2 0.071 0.035 216.911 108.456 Reduction.py:117(deletion_reduction)
2124 0.011 0.000 193.247 0.091 boolalg.py:436(simplify)
35298/2202 0.191 0.000 193.232 0.088 simplify.py:393(simplify)
3463/1664 0.023 0.000 191.161 0.115 boolalg.py:847(_eval_simplify)
17419/2124 0.101 0.000 151.410 0.071 boolalg.py:431(_eval_simplify)
484746 2.365 0.000 134.240 0.000 operations.py:410(__new__)
2674 0.286 0.000 100.910 0.038 boolalg.py:2581(simplify_patterns_or)
17419 0.739 0.000 89.605 0.005 boolalg.py:2318(simplify_logic)
17419/2124 0.074 0.000 86.205 0.041 boolalg.py:433(<listcomp>)
205208 1.323 0.000 85.964 0.000 boolalg.py:803(_new_args_filter)
4889/4158 0.043 0.000 81.378 0.020 boolalg.py:702(_eval_simplify)
495442 2.466 0.000 59.524 0.000 boolalg.py:453(binary_check_and_simplify)
213920 1.848 0.000 55.926 0.000 relational.py:189(canonical)
166997 0.140 0.000 45.360 0.000 basic.py:1130(xreplace)
1533315/166997 3.929 0.000 45.220 0.000 basic.py:1195(_xreplace)
2 0.038 0.019 39.009 19.505 Reduction.py:149(mediator_reduction)
2322 34.399 0.015 36.257 0.016 InteractionGraphs.py:42(primes2igraph)
279538 1.264 0.000 35.115 0.000 boolalg.py:665(_new_args_filter)
32442 9.573 0.000 34.108 0.001 boolalg.py:2035(_simplified_pairs)
4 0.034 0.008 33.522 8.381 Reduction.py:102(remove_outdag)
427840 1.041 0.000 32.990 0.000 expr.py:2364(could_extract_minus_sign)
An update on this part. My effort at the "deleting several nodes at once" approach seems to have failed miserably. Either it (1) generates an insanely long Boolean function that takes too long to simplify, or (2) when done step-by-step while taking advantage of the induced topological order, it takes longer than what we already have. I might try again tomorrow, but this seems like a dead end for now.
The rspace-representation-tweaks branch has been merged with the main branch for some time now. More belated issue closing here.
When calling the ASP solver from PyBoolNet, there seems to be a threshold for the max number of solutions that it will output.
https://github.com/hklarner/PyBoolNet/blob/master/PyBoolNet/AspSolver.py
We probably want to remove that cap (or at least emit a warning telling us that this is where it might get stuck). It also seems to be behind an error.
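One way to do this, sketched with a made-up helper (the function name and file name are hypothetical; the `--models` flag is clingo's documented option, where 0 means "enumerate all answer sets"): make the model cap a parameter instead of hard-coding it.

```python
# Hypothetical sketch: build the solver invocation so the model cap is
# configurable. In clingo, --models=0 asks for all answer sets.
def build_solver_command(asp_file, max_models=0):
    return ["clingo", "--models={}".format(max_models), asp_file]
```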
I looked at the error: it first hits this ASP threshold, but then also raises an error.
I looked at the function just before the error (attached; it is very large, so I added a line break after each closing parenthesis: CNF_error.txt). I noticed that it is in conjunctive normal form, except for the first few terms:
Are these maybe the ones causing the error?