Closed skyleaworlder closed 1 year ago
Hello!
Thanks for this tool and the PR about language-agnostic layer.
This issue is not going to report some bugs and features. I just wanna get answers to my a few questions, which are mainly about infersharp implementation.
- Why
MethodBody
can be found everywhere in cfg.json? And what's the logic of it? Perhaps I fail to figure out whatMethodBody
means? (e.g Why LdargParser generatesMethodBody
node, while LdcParser doesn't.)
MethodBody
type nodes refer to nodes in the Control-Flow-Graph (CFG) which contain instructions representing the body of the method. As for your second question: there is no guarantee that a particular CIL instruction has a 1:1 mapping to a particular SIL instruction. In this case, there is an SIL instruction corresponding to ldarg
because ldarg
is the pushing of an argument value onto the program stack; in other words, this is the loading of a value of a location on the heap, or an SIL Load
instruction applied to the method argument. On the other hand, ldc
, or the pushing of a constant onto the stack, doesn't have a corresponding SIL representation because it has no association with locations on the heap.
- Is
Metadata
instruction useful? I wanna know the reason infersharp doesn't implement it.
Well, we do use metadata information insofar as we do use complete method signatures (including method name, class name, method arguments) to identify translated methods. From a control flow representation perspective, CIL Metadata doesn't really have much relevance as it does not actually convey information about a program's control flow.
- The point I mostly concerned is whether infer can run some language-unrelated checkers with data (cfg.json & tenv.json) received from language-agnostic layer interface.
Depending on the checkers you have in mind, you might want to look at Infer's taint analysis or other checkers they have implemented on top of the SIL, to save yourself work.
Many thanks!
Thanks for your kindly answers, but I still feel loss understanding some concepts.
MethodBody
type nodes refer to nodes in the Control-Flow-Graph (CFG) which contain instructions representing the body of the method. As for your second question: there is no guarantee that a particular CIL instruction has a 1:1 mapping to a particular SIL instruction. In this case, there is an SIL instruction corresponding toldarg
becauseldarg
is the pushing of an argument value onto the program stack; in other words, this is the loading of a value of a location on the heap, or an SILLoad
instruction applied to the method argument. On the other hand,ldc
, or the pushing of a constant onto the stack, doesn't have a corresponding SIL representation because it has no association with locations on the heap.
Oh, I see. I also noticed the same description in wiki: "The nodes which contain the instructions associated with the body of the method body are mostly identified with the MethodBody type, though nodes containing invocations to other methods are identified as Call nodes."
So does MethodBody
mean "a statement in original source code"? Or can I say that MethodBody
represents one statement and the very next successors of MethodBody
node represent some components of this statement.
Well, we do use metadata information insofar as we do use complete method signatures (including method name, class name, method arguments) to identify translated methods. From a control flow representation perspective, CIL Metadata doesn't really have much relevance as it does not actually convey information about a program's control flow.
Sorry I didn't describe my question clearly. Metadata
mentioned here is not "C# Metadata", but a type of instruction of SIL.
Thanks for your kindly answers, but I still feel loss understanding some concepts.
MethodBody
type nodes refer to nodes in the Control-Flow-Graph (CFG) which contain instructions representing the body of the method. As for your second question: there is no guarantee that a particular CIL instruction has a 1:1 mapping to a particular SIL instruction. In this case, there is an SIL instruction corresponding toldarg
becauseldarg
is the pushing of an argument value onto the program stack; in other words, this is the loading of a value of a location on the heap, or an SILLoad
instruction applied to the method argument. On the other hand,ldc
, or the pushing of a constant onto the stack, doesn't have a corresponding SIL representation because it has no association with locations on the heap.Oh, I see. I also noticed the same description in wiki: "The nodes which contain the instructions associated with the body of the method body are mostly identified with the MethodBody type, though nodes containing invocations to other methods are identified as Call nodes."
So does
MethodBody
mean "a statement in original source code"? Or can I say thatMethodBody
represents one statement and the very next successors ofMethodBody
node represent some components of this statement.
A MethodBody
node stores SIL instructions which are translated from CIL instructions composing the original .NET method. The successor nodes of a MethodBody
node contain the succeeding instructions in the control flow -- they are not "components" of the prior instructions.
Well, we do use metadata information insofar as we do use complete method signatures (including method name, class name, method arguments) to identify translated methods. From a control flow representation perspective, CIL Metadata doesn't really have much relevance as it does not actually convey information about a program's control flow.
Sorry I didn't describe my question clearly.
Metadata
mentioned here is not "C# Metadata", but a type of instruction of SIL.
We haven't translated that because Infer# is focused only on representing the semantic control flow of the program; as the program documents, Metadata is not strictly needed for this purpose. I believe other frontends, like the Java one (see jTrans.ml), similarly does not generate Metadata
in its output.
We haven't translated that because Infer# is focused only on representing the semantic control flow of the program; as the program documents, Metadata is not strictly needed for this purpose. I believe other frontends, like the Java one (see jTrans.ml), similarly does not generate Metadata in its output.
Make sense, Thank you!
But I still have a quick question. If the succeeding nodes of MethodBody
represent succeeding instructions, I wonder if the number of MethodBody
should be "1" in each method, because there is only one "method body" of each method.
The above image is CFG of Examples/Examples/ResourceLeakInterproceduralOK
:
/// <summary>
/// Interprocedural resource usage example, no leaks expected.
/// </summary>
public void ResourceLeakInterproceduralOK(){
StreamWriter stream = AllocateStreamWriter();
if (stream == null)
return;
try
{
stream.WriteLine(12);
}
finally
{
stream.Close();
}
}
Actually, I want to figure out the usage of MethodBody
from the beginning, but I found MethodBody
seems not representing "the body of method".
We haven't translated that because Infer# is focused only on representing the semantic control flow of the program; as the program documents, Metadata is not strictly needed for this purpose. I believe other frontends, like the Java one (see jTrans.ml), similarly does not generate Metadata in its output.
Make sense, Thank you!
But I still have a quick question. If the succeeding nodes of
MethodBody
represent succeeding instructions, I wonder if the number ofMethodBody
should be "1" in each method, because there is only one "method body" of each method.The above image is CFG of
Examples/Examples/ResourceLeakInterproceduralOK
:/// <summary> /// Interprocedural resource usage example, no leaks expected. /// </summary> public void ResourceLeakInterproceduralOK(){ StreamWriter stream = AllocateStreamWriter(); if (stream == null) return; try { stream.WriteLine(12); } finally { stream.Close(); } }
Actually, I want to figure out the usage of
MethodBody
from the beginning, but I foundMethodBody
seems not representing "the body of method".
Ah, MethodBody
nodes store at least 1 instruction from the body of the method, but not necessarily the entire body of the method. Hence there can be greater than one MethodBody
node.
Ah, MethodBody nodes store at least 1 instruction from the body of the method, but not necessarily the entire body of the method. Hence there can be greater than one MethodBody node.
emm... if so, I want to know how you design for it. I can hardly find the relationship and summarize the rules between CIL instruction and the reason why InstructionParser
s of those emit MethodBody
.
Essentially, we simulate the CIL program stack (that's the program stack object in C#) as we iterate over each CIL instruction in succession. As we do so, we render the corresponding SIL instructions. Were there particular InstructionParsers/program translations in particular you were thinking about?
Essentially, we simulate the CIL program stack (that's the program stack object in C#) as we iterate over each CIL instruction in succession. As we do so, we render the corresponding SIL instructions. Were there particular InstructionParsers/program translations in particular you were thinking about?
Yes, I have read all C# files under Cilsil/Cil/Parsers
including InstructionParsers
. Though I haven't learnt .NET platform structure, I knonw the parsers extend InstructionParsers
and process expression simulate a stack (I notice there is stack.Pop
and Push
).
I guess Expression
is not the part I feel confused. After having a look at CFG output, I found there are several MethodBody
nodes and I just cannot figure out what MethodBody
means and "when and why" infersharp should register a new MethodBody
node into CFG.
I find several examples in infersharp
https://github.com/microsoft/infersharp/blob/main/Cilsil/Cil/Parsers/InitObjParser.cs#L89
var node = AddMethodBdyInstructionsToCfg(
state,
structInitializationInstructions.ToArray());
Obejct initialization seems a part of statement.
if (ParameterIsByReference(index, state.Method))
{
// Strips out the pointer because the type produced by CreateArg already
// carries it.
var argAddressType =
new Address(Tptr.PtrKind.Pk_pointer,
argType.StripPointer(),
argVar,
referenceKind: Address.ReferenceKind.Parameter);
state.PushExpr(argVar, argAddressType);
}
else
{
var loadArgument = state.PushAndLoad(argVar, argType);
node = AddMethodBodyInstructionsToCfg(state, loadArgument);
state.AppendToPreviousNode = true;
}
If parameters are not passed in as reference, LdargParser will create a MethodBody node. But what's the relationship between loading an argument and method body node.
https://github.com/microsoft/infersharp/blob/main/Cilsil/Cil/Parsers/NewarrParser.cs#L66
// Represents memory allocation.
var callInstr = new Call(returnId: arrayIdentifier,
returnType: arrayTypeWithPtr,
functionExpression: new ConstExpression(
ProcedureName.BuiltIn__new_array),
args: args,
flags: new Call.CallFlags(),
location: state.CurrentLocation);
var newNode = AddMethodBodyInstructionsToCfg(state, callInstr);
state.PushExpr(new VarExpression(arrayIdentifier), arrayTypeWithPtr);
state.PushInstruction(instruction.Next, newNode);
New an array seems a statement in C#, and once I create an array, I create a related MethodBody. This makes me really confused.
Some other examples like Stind
, Stloc
, Ret
, Ldelem
are the same for me.
In general, when the CIL corresponds to an SIL instruction, we create the corresponding SIL instruction which in turn gets added to a MethodBody node. Not all CIL have such a correspondence -- for example, ldc refers to the pushing of a constant onto the CIL program stack but has no direct corresponding operation in SIL because the constant doesn't come from a register. By contrast, ldarg refers to the pushing of an argument's value onto the CIL program stack; this corresponds to the load of the argument register. Similarly, Initobj/Newarr refer to the creation of objects, which correspond to Call instructions in the SIL, hence we create MethodBody nodes to store these instructions.
In general, when the CIL corresponds to an SIL instruction, we create the corresponding SIL instruction which in turn gets added to a MethodBody node. Not all CIL have such a correspondence -- for example, ldc refers to the pushing of a constant onto the CIL program stack but has no direct corresponding operation in SIL because the constant doesn't come from a register. By contrast, ldarg refers to the pushing of an argument's value onto the CIL program stack; this corresponds to the load of the argument register. Similarly, Initobj/Newarr refer to the creation of objects, which correspond to Call instructions in the SIL, hence we create MethodBody nodes to store these instructions.
Thanks for your patient answer! Make sense here.
Hello!
Thanks for this tool and the PR about language-agnostic layer.
This issue is not going to report some bugs and features. I just wanna get answers to my a few questions, which are mainly about infersharp implementation.
MethodBody
can be found everywhere in cfg.json? And what's the logic of it? Perhaps I fail to figure out whatMethodBody
means? (e.g Why LdargParser generatesMethodBody
node, while LdcParser doesn't.)Metadata
instruction useful? I wanna know the reason infersharp doesn't implement it.Many thanks!