Open johannesduesing opened 8 hours ago
Sadly I don't have much insight into the abstract interpretation code. Can you give me some hints where that ReferenceValue method is called from? If that method really leads to this, that seems strange. In that case, this would also produce an InitializedObjectValue
, right? And that would contradict the method's name.
Difficult to say where the underlying problem is. Could be a mixin issue with the two domains, but could also be wrong implementation in either. Especially org.opalj.ai.domain.l0.TypeCheckingDomain
looks strange: The ArrayValue
method returns a (potentially-null) DefaultArrayValue
, but the related ObjectValue
methods returns a (non-null) InitializedObjectValue
. The other ObjectValue
method returns a (potentially-null) DefaultMObjectValue
only if the upperTypeBound
has multiple types, but a (non-null) InitializedObjectValue
(via the ObjectValue
method), which seems strange since I don't think a singleton upperTypeBound
would exclude null either.
Sure i can try to give you some context on this:
AI.scala
handles field accesse by calling theDomain.getfield
to obtain a domain valueai.domain.l0.TypeLevelFieldAccessIntructions.getfield([...])
which in turn looks up a domain value for the field by calling TypedValue(pc, fieldType)
TypedValuesFactory.TypedValue
and selects the case for reference valuesReferenceValue
method we were talking about.I again confirmed via debugging that this in fact calls TypeCheckingDomain.ObjectValue
and leads to an initialized object value, thus no field can ever be nullable (in this mix-in configuration for TypeCheckingDomain
). So to me this seems like an implementation issue in TypeCheckingDomain
, right? Unfortunately i don't have any idea how fields are usually handled in AI, but this seems factually wrong. This is an interesting quote from the code-comments in TypedValueFactory.TypedValue
:
/**
* Factory method to create domain values with a specific type. I.e., values for
* which we have some type information but no precise value or source information.
* However, the value is guaranteed to be `null` or properly initialized.
*
* For example, if `valueType` is a reference type it may be possible
* that the actual value is `null`, but such knowledge is not available.
*
* The framework uses this method when a method is to be analyzed, but no parameter
* values are given and initial values need to be generated. This method is not
* used elsewhere by the framework.
*/
This comment seems to specifically acknowledge that value can be null here, but the actual implementation in TypecheckingDomain
contradicts this.
Note that the code we're discussing about is from this commit and unchanged since then. Interestingly, here, code in TypeLevelReferenceValues
that used to produce an InitializedObjectValue
was changed to ObjectValue
which then, via TypeCheckingDomain
ends up being an InitializedObjectValue
again. But, at the time, that did not have isNull = No
(at least not explicitly). That only got introduced in this commit, which, interestingly, is about fields.
Might still be wrong or point to wrong mixing in of some field related domain to handle this first/differently for fields?
During the current iteration of our project course on static program analysis, one of our students discovered what i believe is a bug in abstract interpretation. The students were working on a task where they had to change bytecode of certain methods, and then write the modified bytecode to files. I know there is multiple ways to do this, but one way that seems particularly suited is the
org.opalj.ba.LabeledCode
builder. However, students discovered that the simple round-trip of reading code and writing it again without modifications usingLabeledCode
does not work for all methods of real-world projects. For example, it fails on 4% of all methods from Apaches PDFBox 3.0.3 (link to jar). Here is an example that exercises a specific method for which the roundtrip fails:The exception that is thrown is very long and very detailed (pdfbox-error.txt), but boils down to the following:
StackMapTable
for the code (see here)forward
) can very well benull
.CodeAttributeBuilder
uses theorg.opalj.ai.domain.l0.TypeCheckingDomain
to detect dead code via Abstract Interpretation. For some reason, this domain believes that fields of a class are always initialized and can never be null (at least that's what i got from some detailed debugging through the above-mentioned example). Specifically, this method is called to generate a domain value for a field access, which results in a call to this method returning anInitializedObjectValue
.@errt do you have any insights into this? Is the
LabeledCode
object intended to be used this way, did we do something not supported or is this actually a bug? If so, i'd look into it in the coming days.Credit to Finn for reporting this issue 👍