lsils / mockturtle

C++ logic network library

MIT License

210 stars 139 forks source link

Is it necessary to call the cleanup_dangling function after calling the mig_algebraic_depth_rewriting operation? #511

Closed KamilDre closed 2 years ago

KamilDre commented 2 years ago

Hi,

I am implementing an algorithm that chains together a few operations for MIG graphs, and have noticed that after calling the mig_algebraic_depth_rewriting operation on a depth view of a MIG graph, I am not able to call the cut_rewriting operation on the original MIG graph (cut_rewriting uses the dsd_resynthesis resynthesis engine that uses direct_resynthesis as a fallback). In order to attempt to remedy this, I have tried calling the cleanup_dangling function on the original MIG graph, before calling the cut_rewriting function. This has solved my issue.

Now, after calling mig_algebraic_depth_rewriting, when examining the size and depth of the original MIG graph, before and after calling the cleanup_dangling function, I have observed that the size decreases while the depth remains the same. My question is: is this expected? I am asking because the Mockturtle documentation only specifies that the cleanup_dangling function should be called after the refactoring operation, and the documentation for Algebraic Rewriting does not mention that cleanup_dangling should be called after the operation at all.

Also, if this is the case and the cleanup_dangling should be called after mig_algebraic_depth_rewriting, I would like to also ask if it should also be called after any other operation applicable to MIGs except for refactoring? The operations that I am interested in in particular are: Cut rewriting, MIG algebraic rewriting, Resubstitution and Balancing.

Thanks a lot!

lee30sonia commented 2 years ago

Hello,

I'm not sure why mig_algebraic_depth_rewriting creates dangling nodes (I'm also curious though), and I don't have a clear answer to your last question either, but I can suggest a few ways to verify.

Always calling cleanup_dangling between algorithms is a good idea, although it may harm the total runtime (if this is a concern for you). cleanup_dangling basically re-constructs the entire network bottom-up, so it not only cleans up dangling and dead nodes, but also ensures that the network is in topological order.
There are several handy debugging utilities that can be used to check the network in debug mode. For example, you can assert( count_dangling_roots( ntk ) == 0 ); before calling cut_rewriting.

KamilDre commented 2 years ago

Hi,

Thank you for your reply.

Once you have pointed me to the debugging utilities, I have spent some time trying to understand which functions create dangling nodes by testing them on a few graphs. I have tested the balancing, mig_resubstitution, mig_resubstitution2, mig_algebraic_depth_rewriting (strategy = dfs, selective and aggressive), cut_rewriting (resynthesis function = mig_npn_resynthesis, akers_resynthesis, direct_resynthesis, dsd_resynthesis, shannon_resynthesis, positive_davio_resynthesis and negative_davio_resynthesis) and refactoring (resynthesis function = mig_npn_resynthesis, akers_resynthesis, direct_resynthesis, dsd_resynthesis, shannon_resynthesis, positive_davio_resynthesis, negative_davio_resynthesis and bidecomposition_resynthesis). I have found that mig_algebraic_depth_rewriting (all strategies), mig_resubstitution, mig_resubstitution2 and refactoring (all resynthesis functions) create dangling and dead nodes, and hence, require a call of cleanup_dangling after the function call itself.

My investigation has also raised some questions which I am not clear about:

(1) I am trying to understand the difference between a dead node and a dangling root?

(2) Some graphs that I load have a significant number of dangling nodes before I apply any operation to them. Does this ever happen? If not, does it mean that there is something wrong with some of my input graphs?

(3) For mig_algebraic_depth_rewriting, mig_resubstitution and mig_resubstitution2, the number of dead nodes and dangling roots appears to be the same after the operation. However, for refactoring, there always appears to be more dangling roots than dead nodes. Why is this, and is this expected?

(4) Given that mig_algebraic_depth_rewriting creates dangling nodes, and this is not expected, is there a way to verify that the input MIG is equivalent to the output?

Now, the following questions only apply to the refactoring operation with the direct_resynthesis resynthesis function.

(5) After this operation, the cleanup_dangling function does not remove all of the dangling nodes for some graphs. Why is this the case?

(6) I have examined the size of a graph before and after calling the cleanup_dangling function. I have noticed that the

total size after + the number of dead nodes before + the number of dangling roots before < total size before

Is this expected? If so, what else is omitted when creating the new graph with cleanup_dangling?

(7) For some graphs, I cannot call the cleanup_dangling function after the refactoring operation with the direct_resynthesis resynthesis function. When I do, I get a "Signal: SIGSEGV (segmentation fault)" error message. When looking deeper into this, this error message arises when creating a topological view of the input graph. Is there anything I can do about this?

lee30sonia commented 2 years ago

Thank you for the detailed investigation and for pointing out potential bugs. For your questions:

(1) A node is marked as dead when its fanout count decreases to zero. Dead nodes are kept in the storage (their index will not be reused) but deleted from the hash table. They are included in ntk.size() (because it returns the max index), but excluded in ntk.num_gates() (because it returns the hash table size). They are skipped when iterating with foreach_node or foreach_gate. Note that if a node is just created but never referenced, it is not dead but has a fanout size of zero because its fanout count never "decreases" to zero. A node is dangling if it is not reachable from any primary output. A dangling root is a node with fanout size zero (i.e. there is not another node having it as fanin, nor a primary output referencing to it). Note that there can be nodes that are dangling but have non-zero fanout size -- for example, a node whose only fanout is a dangling root.

(2) If we have understood the above definitions correctly, then:

If you create a network with lots of gates but never create a PO, then, at this point, all nodes are not dead, but are dangling.
When reading-in a benchmark file using lorina, it simply parses the file and calls the corresponding network creation APIs in mockturtle. So, it is possible to write a, say Verilog file, that creates dangling nodes.
The benchmarks we use usually do not include dangling nodes.
Please provide a testcase if you think there is a problem in mockturtle.

(3) Based on the above definitions and observations, if an algorithm creates gates but does not use them, then there could be dangling-but-not-dead roots. I am not the author of refactoring, but if needed, I can try to look into it. If you could provide a testcase, it would be helpful.

(4) There is equivalence_checking in mockturtle which can be used for smaller benchmarks (it is likely slow if the benchmark is big). Alternatively, you could write out the resulting network in, for example, Verilog format (using write_verilog) and then use another tool, e.g. ABC to do equivalence checking. Note that dangling nodes should not contribute to the primary output functions, and should not be included when writing out. Also note that it could be normal that an optimization algorithm produces dangling nodes -- in fact, this should happen as soon as substitute_node is called -- but they should all be marked as dead. The procedure of "killing" nodes (take_out_node) is recursive, meaning that when a node is killed, the fanout size of its fanins is decreased, and if it becomes zero, its fanin is killed as well, and so on.

(5) Do you mean that there are still nodes having fanout size zero (dangling root) and/or not reachable from any PO (dangling node)? If so, it does not sound right. Please provide a testcase/reproduce steps for us to investigate.

(6) After the explanations above, I think now we understand that:

Dangling roots should usually be dead (but could also not be marked as dead if it never got referenced).
Dangling nodes != dangling roots, and dangling nodes could be normally alive, with fanouts.

Thus, your calculation does not make much sense to me.

(7) Please provide a testcase if you think there is a bug in mockturtle.

KamilDre commented 2 years ago

Thank you for the very detailed reply! I am preparing test case scripts for all the things that I have encountered. During this process, I wanted to check if two networks are equivalent using equivalence_checking from Mockturtle. I have literally copied and pasted the example code from https://mockturtle.readthedocs.io/en/latest/algorithms/equivalence_checking.html, with the exception that I load some graph in the second line. However, when I try to run the code, I get the following error:

""" /path/filename.cpp: In function â€˜int main()â€™: /path/filename.cpp:29:23: error: use of â€˜miterâ€™ before deduction of â€˜autoâ€™ auto const miter = miter( orig, aig ); ^~~~~ /path/filename.cpp:29:40: error: expected primary-expression before â€˜>â€™ token auto const miter = miter( orig, aig ); ^ /path/filename.cpp:29:49: warning: left operand of comma operator has no effect [-Wunused-value] auto const miter = *miter( orig, aig ); ^~~ """

Do you maybe know why this error arises and what I can do about it so that I can finish preparing the test case scripts?

Edit: I believe that I have now figured out what was causing the error - In the example, we have const auto miter = *miter<aig_network>( orig, aig );, i.e. miter appears on the left and right of the equation and this was causing the error. There is no error after changing miter on the left-hand side to any other name. In this case, I'll get back to preparing the test case scripts.

KamilDre commented 2 years ago

Hi,

Once again, thank you for your previous answer and apologies for the delay. I needed some extra time to find commonly used circuits that expose what I have mentioned to you. I am also attaching two cpp scripts that illustrate what I have meant in some of my questions. Both of these scripts use a circuit from the arithmetic benchmark dataset found at https://github.com/lsils/benchmarks/tree/master/arithmetic.

Let us start with the first script and all questions related to this script. The name of this script is "not_all_dangling_removed.cpp", and it uses the sqrt.aig graph that I have also attached. This script loads the graph, cleans up all dangling nodes, calls the refactoring function, and finally cleans up all dangling nodes. I have attached a screenshot of the output of my console when I run this script. The relevant questions to this script are (1), (3) and (5). Let me explain them in order:

(1) Thank you very much for clarifying this. So is my understanding correct that an algorithm such as refactoring can create dangling roots that are not dead nodes (i.e. nodes that are created with fanout equal to zero)? As this script illustrates, the number of dangling roots after the refactoring operation is 16368, while the number of dead nodes is 16365. I am just wondering if this is the expected behaviour that the refactoring algorithm sometimes creates nodes with no fanout?

(3) In line 33 of this script, we can change the graph representation to which sqrt.aig will be loaded to. I have left the default as aig_network as when this graph is loaded to this representation, it has a single dangling root. Moreover, if we call the cleanup_dangling function on the loaded AIG graph, it does not remove the dangling root. Now, if we change the graph representation to mig_network in line 33, the loaded graph has zero dangling roots. I just found this interesting and am wondering if this is also something I can expect.

(5) After loading the graph and calling the cleanup_dangling function, this script then calls refactoring with the direct_rysynthesis resynthesis function. After the refactoring operation, we see that the resultant graph has 16368 dangling roots. If we then call the cleanup_dangling function on this graph, we are left with 3 dangling roots, hence illustrating that not all dangling roots have been removed. Similar behaviour is observed if we change the representation type to mig_network in line 33.

The second script is called "cant_create_topo_view.cpp" and it uses the sin.aig graph that I have also attached. This script loads the graph, applies the balancing operation, applies the cut_rewriting operation, applies the refactoring operation, and finally attempts to create a topo_view of the final graph. Note that it was very hard for me to recreate this behavior using the sin.aig graph, and it only arises for very particular choices of operation parameters and sequences for this graph. I have attached to this reply a screenshot of a part of the output of my console when I have run this script in debugging mode (the entire output is to long to be displayed on a single line).

(7) This script illustrates that sometimes it is not possible to create a topo_view of a graph after calling the refactoring operation that uses the direct_rysynthesis function. For some graphs, this is also the case after directly loading the graph and applying refactoring with the direct_rysynthesis function (I am sorry, but I am unable to share these graphs). In the script I have sent you, I had to apply some other operations before the refactoring operation to recreate this error. Note that if we change the rysynthesis function for cut_rewriting, the error also disappears.

Thank you for your time and I look forward to your reply.

mockturtle_investigation.zip

lee30sonia commented 2 years ago

Hi, Thanks for the materials. I will look into them later. However, I am a bit busy this month and cannot guarantee a schedule on when to get back to you. I hope it is okay and thank you for understanding.

lee30sonia commented 2 years ago

Hi, sorry for the delay.

For the case not_all_dangling_removed, it is actually quite simple: Before refactoring, the (only) dangling root is the constant node (ID 0). The reason why it only appears in aig_network and disappears when using mig_network is that AIGs never take constant(s) as gate fanin, so unless the network has a constant PO, the constant node is dangling; whereas MIGs often use constant(s) as gate fanin to create AND/OR functions with the MAJ gate. After refactoring, the 3 alive (not-dead) dangling roots are the constant node (ID 0) and two PIs (ID 1 and 2). So, it appears that none of the network PO functions actually depend on these two PIs, so their output logic is optimized out and they become dangling. All of the other dangling roots are dead and are removed after cleanup_dangling.

Note that the constant node, as well as PI nodes, will never be removed from the network (nor marked as dead) even if they have no fanouts. FYI, I added the following print-out in line 103 of debugging_utils when investigating this case, which might be helpful for you to understand your other cases if any:

std::cout << "detected " << ( ntk.is_dead(n) ? "dead" : "alive" ) << " dangling root " << n << ( ntk.is_pi(n) ? " (PI)\n" : "\n" );

For cant_create_topo_view, it seems indeed a bug in refactoring. So far, I can only point out that refactoring makes the network cyclic and thus triggers the assertion failure when creating topo_view. (FYI, I used the function network_is_acylic in debugging_utils to confirm this.) Further investigation will be needed to fix the bug. Thanks for spotting it!

lee30sonia commented 2 years ago

521 fixes the case you provided on my platform. To be honest, I'm not sure if there are other bugs. However, I would actually suggest using resubstitution instead of refactoring, as it should subsume and should be more powerful than refactoring in theory.

Please let me know if there are further questions/bug reports.

KamilDre commented 2 years ago

Hi,

Thank you for your detailed reply and your time taken to investigating this.

Before refactoring, the (only) dangling root is the constant node (ID 0). The reason why it only appears in aig_network and disappears when using mig_network is that AIGs never take constant(s) as gate fanin, so unless the network has a constant PO, the constant node is dangling; whereas MIGs often use constant(s) as gate fanin to create AND/OR functions with the MAJ gate.

I am still struggling to understand why the constant node (ID 0) is present if it is not used as a fanin by any other gate, nor does the network have a constant PO?

521 fixes the case you provided on my platform. To be honest, I'm not sure if there are other bugs. However, I would actually suggest using resubstitution instead of refactoring, as it should subsume and should be more powerful than refactoring in theory.

Thank you very much for fixing this. I will test this function later and let you know if I spot anything. And to be honest, from my experiments, I have found that refactoring with the direct resynthesis engine was more powerful than resubstitution for the MIG representation.

lee30sonia commented 2 years ago

I am still struggling to understand why the constant node (ID 0) is present if it is not used as a fanin by any other gate, nor does the network have a constant PO?

This is by design, specified in the network interface API. For the convenience of many algorithms and network manipulation and for the consistency of input/output interfaces, it is common to use the first node (index 0) for the constant(s) and the following nodes for the PIs and never delete or kill them.

Thank you very much for fixing this. I will test this function later and let you know if I spot anything. And to be honest, from my experiments, I have found that refactoring with the direct resynthesis engine was more powerful than resubstitution for the MIG representation.

This doesn't sound right to me. Did you check the equivalence of the resulting network? Given the above bug, it is very likely that the algorithm was substituting nodes randomly when nothing could be resynthesized (See #519). Thus, it could give seemingly very good results if you only check for network sizes, but they are actually wrong.

lee30sonia commented 2 years ago

I assume the major questions were answered and the issue is resolved.

KamilDre commented 2 years ago

Hi,

This doesn't sound right to me. Did you check the equivalence of the resulting network? Given the above bug, it is very likely that the algorithm was substituting nodes randomly when nothing could be resynthesized (See #519). Thus, it could give seemingly very good results if you only check for network sizes, but they are actually wrong.

In fact, I have not checked this. So you are probably right in that I was observing good but meaningless results. I will have a look at the performance of the corrected function later. Once again, thanks for all the help