sukyoung / safe

Scalable Analysis Framework for ECMAScript

Question about Cross-files invocation and analyze for functions in a certain NodeJS module #41

Open YichaoXu opened 2 years ago

YichaoXu commented 2 years ago

Dear developers of SAFE, 

I noticed that SAFE accepts multiple JS files as input, so I was wondering whether SAFE can handle the Node.js module system, for example module.exports and require("..."), so that functions defined in different files can be invoked.

Besides, I am confused about how the CallInstructions are handled in SAFE. I noticed there are methods such as semantics.CI and semantics.getCallInfo. Both require a TracePartition as a parameter, but the output of semantics.getState(callBlock) is always Nil. Are there any development documents or examples of their usage?

Many thanks, 

jhnaldo commented 2 years ago

Unfortunately, SAFE does not support module systems such as the CommonJS (CJS) require used in Node.js. It supports multiple JavaScript files only by merging them sequentially into a single file.
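To make the difference concrete, here is a minimal sketch of what sequential merging implies, run with Node's built-in vm module rather than with SAFE itself. The two source strings stand in for hypothetical input files; their names and contents are assumptions for illustration only.

```javascript
const vm = require("vm");

// Hypothetical contents of two input files (not taken from any real project).
const libSource = `
function add(a, b) { return a + b; }
`;
const mainSource = `
var result = add(1, 2);
`;

// Merging sequentially: the second file sees the top-level
// declarations of the first, as if both were one script.
const sandbox = {};
vm.runInNewContext(libSource + "\n" + mainSource, sandbox);
const mergedResult = sandbox.result;

// The CommonJS version of the same program relies on require,
// which plain concatenation does not provide.
const cjsMainSource = `
var lib = require("./lib");
var result = lib.add(1, 2);
`;
let cjsErrorName = null;
try {
  vm.runInNewContext(libSource + "\n" + cjsMainSource, {});
} catch (e) {
  cjsErrorName = e.name; // require is not defined in the merged script
}
```

In other words, code that communicates through shared top-level declarations survives merging, while code that communicates through require and module.exports would need to be rewritten first.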

We are sorry for the insufficient documentation of SAFE. TracePartition denotes an analysis sensitivity for more precise analysis with more fine-grained control points. For example, you can configure loop sensitivity, k-callsite sensitivity, or object sensitivity. Semantics.getCallInfo is just a getter of ccpToCallInfo, which is a mapping from a pair of a Call node and a TracePartition into a CallInfo, which consists of 1) the abstract state of the entry point of the callee function, 2) this value, and 3) arguments. The Semantics.CI function is a helper function to construct CallInfo.

If a call block does not have any abstract states in the current semantics, it means that this call block is unreachable. It should have at least one abstract state with different trace partitions if it is reachable.
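As a rough mental model only, the relationship between call blocks, trace partitions, and abstract states could be pictured as a nested map. The sketch below is a toy in JavaScript; every name in it (addState, getStates, isReachable, the block ids, the partition labels) is hypothetical and none of it is SAFE's actual Scala API.

```javascript
// Toy model, NOT SAFE's implementation: all names here are hypothetical.
// Outer key: call block id. Inner key: trace partition label.
const statesByBlock = new Map();

function addState(blockId, tracePartition, abstractState) {
  if (!statesByBlock.has(blockId)) statesByBlock.set(blockId, new Map());
  statesByBlock.get(blockId).set(tracePartition, abstractState);
}

// Returns every (trace partition -> state) entry recorded for a block.
function getStates(blockId) {
  return statesByBlock.get(blockId) ?? new Map();
}

// A block with no recorded state was never reached by the analysis.
function isReachable(blockId) {
  return getStates(blockId).size > 0;
}

// The same call block reached under two different call-site contexts.
addState("call#1", "callsite:[main]", { g: "function a" });
addState("call#1", "callsite:[helper]", { g: "function b" });

const reachableBlock = isReachable("call#1");
const unreachableBlock = isReachable("call#2"); // never recorded
const partitionCount = getStates("call#1").size;
```

Under this picture, an empty result for a call block corresponds to the unreachable case described above, and a reachable block has one entry per trace partition.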

Thanks!

YichaoXu commented 2 years ago

Thank you very much for the information; it is really helpful.

I still have one question about the analysis of functions. Does SAFE analyze only the functions that are actually invoked? For example, a package such as aws-lambda may export many functions that are never invoked anywhere in its own code. I suspect that SAFE does not analyze the bodies of those functions and therefore cannot find possible bugs in them.

I tested the following code, but SAFE did not report any bugs. I was wondering whether there is a configuration that forces SAFE to analyze those uninvoked functions, or whether this is not implemented in SAFE.


function test(a){
    if (true) return a;
    else return "something else";
}
module.exports = {test}

Many thanks

YichaoXu commented 2 years ago

I am also curious about how the call block works in ControlFlowGraph. I read the code of a forked version and found that the Semantics class provides functions for finding the exact callee of a call instruction.

It seems there is no direct "link" from the call block to the callee. As I understand it, the CallBlock contains only the CallInstruction with its call expression, and holds no reference to the callee's CFGFunction object.

I was wondering whether it is possible to find the callee directly from the ControlFlowGraph, without an instance of the Semantics class.

jhnaldo commented 2 years ago

I was wondering whether there is a configuration that forces SAFE to analyze those uninvoked functions, or whether this is not implemented in SAFE.

As you know, JavaScript is a highly dynamic programming language, so it is not easy to analyze a JavaScript function without any knowledge of its arguments. The number of arguments might vary. Different types of arguments might be accepted. Functions and even proxied objects with complex handlers might be passed to a simple function. How about the analysis of this simple function:

function f(a, b) { return a + b; }

What's the expected analysis result of this function? A numeric addition? A string concatenation? A function call to other user functions? Is it possible to throw an error? If so, when? So, I think it is almost impossible to precisely cover all the possible scenarios without any knowledge of the arguments. If the given program is written in TypeScript, it becomes a much easier problem by using given argument types. But, for JavaScript, it is difficult to analyze functions without the information of arguments.
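A short sketch of how differently that one-line function can behave, depending only on its arguments. The helper tryCall and the variable names are made up for this illustration.

```javascript
function f(a, b) { return a + b; }

// Numeric addition.
const numeric = f(1, 2);

// String concatenation.
const concat = f("1", 2);

// A call into user code: + converts the object with its valueOf method.
let userCodeRan = false;
const viaValueOf = f({ valueOf() { userCodeRan = true; return 10; } }, 1);

// Hypothetical helper: returns the name of the thrown error, or null.
function tryCall(a, b) {
  try { f(a, b); return null; } catch (e) { return e.name; }
}

// Both of these throw a TypeError inside f.
const symbolError = tryCall(Symbol("s"), 1); // a Symbol cannot be converted to a number
const bigintError = tryCall(1n, 1);          // BigInt and Number cannot be mixed
```

Each of these is a legitimate call of f, so an analysis that starts from f with completely unknown arguments would have to account for all of them at once.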

I am also curious about how the call block works in ControlFlowGraph.

In JavaScript, functions are values. Consider the following example:

...
function f(g) { g(); }
...

In the body of the function f, there is a function call g(). However, it is impossible to know which functions are actually invoked by this call without analyzing the entire code. Therefore, to get the "link" from CallBlock nodes to callee functions, we need the analysis result (Semantics).
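A small sketch of the same point: the single call site g() below ends up invoking different functions depending on what the callers pass in. The functions a and b, the table object, and the calls array are hypothetical names used only for this illustration.

```javascript
const calls = [];

function f(g) { g(); } // one call site, callee not known from the syntax

function a() { calls.push("a"); }
function b() { calls.push("b"); }

// The callee of g() is decided by the value each caller passes in.
f(a);
f(b);

// The value may also come from a computed property lookup,
// so even the name of the callee need not appear at the call.
const table = { a, b };
const key = "b";
f(table[key]);
```

After these three calls, the one call site inside f has reached both a and b, which is why the set of callees has to come from the analysis result rather than from the control-flow graph alone.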