seahorn / crab

A library for building abstract interpretation-based analyses
Apache License 2.0

Reaching Definitions Analysis #33

Closed: MarcoGrochowski closed this issue 4 years ago

MarcoGrochowski commented 4 years ago

Hello, is it possible to add a reaching definitions analysis to this framework in order to obtain a data dependence graph? I am interested in obtaining a program dependence graph of a control flow graph; however, currently only the control dependence graph is obtainable.

caballa commented 4 years ago

Yes, you should be able to implement that dataflow analysis without much effort. As an example, you can look at the code of the liveness analysis. Just out of curiosity, why do you need reaching definitions? Coming from LLVM bitcode, we have never needed it.

MarcoGrochowski commented 4 years ago

Hello caballa, thank you for your response! I skimmed the code and implemented the reaching definitions analysis using the same varset_domain as in liveness.hpp. It does work for an intraprocedural example; however, I am interested in the respective basic blocks, not the variables. If I use the default basic_block_label_t, a compile error is thrown in write(crab::crab_os &o) of patricia_trees.hpp because it calls it->write(o), which is not defined on std::string. If I create a wrapping class inheriting from writeable and overload crab::cfg_impl::get_label_str(T e), I get even more compile errors, boiling down to my class not being derived from ikos::XXX. Do you have a small example at hand of how I can use crab::domains::flat_killgen_domain<V> with V being my basic_block_label_t?

As to why I need reaching definitions: I am currently tinkering with regression testing for a language similar to PASCAL. The test cases are generated using some kind of coverage criterion, either structural (such as branch coverage) or property-based, by converting the program under test into a set of Horn clauses and querying Z3's Spacer for a counterexample, which serves as the witness for testing. I thought the program dependence graph could help me identify which parts of the code are affected by a change, so that I can identify obsolete test cases and regenerate the test cases affected by the change. I am still reading the literature on this topic, so I cannot really tell yet whether that is the correct way to do it.

caballa commented 4 years ago

I looked at the code, and all examples use variable as the template parameter for flat_killgen_domain. But you were going in the right direction. You just need a wrapper that implements two operations: index_t index() const and void write(crab_os &) const.
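
For illustration, a minimal sketch of such a wrapper. This is only a sketch under the assumption that basic_block_label_t is std::string (as in the default cfg_impl); substitute the actual index_t and crab::crab_os types from the crab headers:

```cpp
#include <cstdint>
#include <functional>
#include <ostream>
#include <string>

using index_t = uint64_t; // assumption: stand-in for crab's index_t

class bb_label_wrapper {
  std::string m_label; // the wrapped basic_block_label_t (assumed std::string)

public:
  explicit bb_label_wrapper(std::string label) : m_label(std::move(label)) {}

  // Stable integer id for the label, required by the killgen domain.
  // Hashing is enough for a sketch; a factory handing out consecutive ids
  // avoids collisions and keeps indices small.
  index_t index() const {
    return static_cast<index_t>(std::hash<std::string>{}(m_label));
  }

  // Pretty-printing hook called from patricia_trees.hpp. Shown against
  // std::ostream to keep the sketch self-contained; the real signature
  // takes crab::crab_os&.
  void write(std::ostream &o) const { o << m_label; }

  bool operator==(const bb_label_wrapper &o) const { return m_label == o.m_label; }
  bool operator<(const bb_label_wrapper &o) const { return m_label < o.m_label; }
};
```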

Btw, I strongly recommend that you use the dev branch. Apart from the fact that it's the development branch and we plan to make it master really soon, there are new features that might help you. I would still perform the dataflow analysis using a crab variable. Then I would keep, separately, a map from definitions (variables) to crab statements. From there, a crab statement has a new method, get_parent(), that gives you the basic block.
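
A rough sketch of that bookkeeping, written as a template so it does not pin down the concrete crab types; the range-based iteration over blocks and statements and the exact return type of defs() are assumptions for illustration, not verbatim crab API:

```cpp
#include <map>

// Record, for each defined variable, the statement that defines it.
// Variable / Statement / CFG stand in for variable_t, statement_t and cfg_t.
// Assumes Variable is ordered (operator<); otherwise key on its index instead.
template <typename Variable, typename Statement, typename CFG>
std::map<Variable, const Statement *> compute_def_sites(CFG &cfg) {
  std::map<Variable, const Statement *> def_site;
  for (auto &bb : cfg) {           // each basic block of the CFG
    for (auto &stmt : bb) {        // each statement of the block
      for (auto v : stmt.defs()) { // variables written by the statement
        def_site[v] = &stmt;       // a later definition overwrites an earlier one
      }
    }
  }
  return def_site;
}

// Usage (with the types from your crab setup):
//   auto defs = compute_def_sites<variable_t, statement_t>(cfg);
// Given a variable v, defs[v]->get_parent() (dev branch) then yields the
// basic block containing the definition.
```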

MarcoGrochowski commented 4 years ago

Hello caballa, I appreciate your input. I decided against wrapping the basic_block_label_t in a class deriving from indexable. I also switched to the dev branch, thank you for the hint.

Using a map from variables to statements is not straightforward, since basic_block returns references to statement_t, which is an abstract class. I did not want to type-cast every statement_t into its respective type using either the m_stmt_code or a custom statement_visitor, so I decided to use a variable-to-basic-block map instead. This allows me to compute the kill and gen maps inside the analyze function and derive an out map relating each variable to the reaching definition of the corresponding basic block. The computed out map is then used as the in map for the next call to analyze. However, this obviously does not work in the general case.

What is tripping me up is the merge(varset_domain_t d1, varset_domain_t d2) function and handling the in and out maps with regard to varset_domain_t. I simply cannot find a way to control this information on the reaching_definitions_analysis_operations layer: even though it is possible to map from basic_block_label_t to varset_domain_t and retrieve the corresponding in and out maps, I never know whether I am merging information before analyze or afterwards. One could use a boolean flag to indicate whether analyze was called, to be certain that merge is applied to the in or the out map, but that makes it even more hacky :/

I think my best bet would be to create a basic block label factory and use the indexed strings as the domain for the killgen_api. That way, I would not need to keep auxiliary maps for conversion, and the merging should not be a problem either. What do you think?
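
Roughly, the factory I have in mind would look like this (just a sketch, assuming basic_block_label_t is std::string):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Sketch of the label factory idea: hand out one stable, consecutive integer
// per basic_block_label_t (assumed to be std::string) so the indexed labels
// can be used directly as elements of the killgen domain.
class bb_label_factory {
  std::unordered_map<std::string, uint64_t> m_ids;

public:
  // Returns the existing id for a label or assigns the next free one.
  uint64_t get_or_create(const std::string &label) {
    return m_ids.emplace(label, m_ids.size()).first->second;
  }
};
```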

caballa commented 4 years ago

Something quick, sorry I don't have much time this morning. The lattice for reaching definitions is an environment (i.e., a map) from variables to sets of definitions. A definition is typically identified by the instruction where that definition happens (probably a pointer to statement_t is a good choice in your case). This is the textbook way. I think what you are trying to do is far too complicated. You mention that it's a problem for you that statement_t is an abstract class; why is that? Note two things about statement_t:

  1. the statement_t class has an ID that tells you which statement it actually is, so you can always do static casts and performance is not an issue. You can also use the visitor class, as you mentioned.
  2. the statement_t class has two key methods for you: defs() and uses(), which tell you which variables are defined and used. So I don't think you really need to cast statement_t objects to their derived classes. You can use defs() to know which variables are defined by a statement. Typically there is at most one definition, but calls can return multiple outputs, so they can define multiple variables. A sketch of how this fits into the textbook scheme is shown below.
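
Concretely, a hypothetical sketch of that scheme. Variable and Statement are placeholders for variable_t and statement_t; the range-based iteration over a block and the exact return type of defs() are assumptions, not verbatim crab API:

```cpp
#include <map>
#include <set>

// Abstract state of the textbook reaching-definitions analysis: each variable
// maps to the set of statements that may define it.
template <typename Variable, typename Statement>
using rd_env = std::map<Variable, std::set<const Statement *>>;

// Transfer function of one basic block: each statement kills all previous
// definitions of the variables it writes (taken from stmt.defs()) and becomes
// their only reaching definition.
template <typename Variable, typename Statement, typename Block>
rd_env<Variable, Statement> transfer(rd_env<Variable, Statement> in, Block &bb) {
  for (auto &stmt : bb) {        // assumed: a block iterates its statements
    for (auto v : stmt.defs()) { // variables defined by the statement
      in[v] = {&stmt};           // strong update: kill old defs, gen this one
    }
  }
  return in;
}

// Join at control-flow merge points: pointwise set union, which is what makes
// this a "may reach" analysis.
template <typename Variable, typename Statement>
rd_env<Variable, Statement> join(rd_env<Variable, Statement> lhs,
                                 const rd_env<Variable, Statement> &rhs) {
  for (auto &kv : rhs) {
    lhs[kv.first].insert(kv.second.begin(), kv.second.end());
  }
  return lhs;
}
```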