joernio / joern

Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
https://joern.io/
Apache License 2.0
1.97k stars 267 forks source link

[Cpg] extra Cpg Node to represent the argument of function/method call #3907

Open d1tto opened 9 months ago

d1tto commented 9 months ago

Is your feature request related to a problem? Please describe.

The PHP language has properties related to the arguments of a function call, such as array unpacking: test(...$arg). This marks the argument $arg as unpack in the AST. There is no such property in the CPG. Other programming languages have similar properties, such as python, javascript

Describe the solution you'd like

Add an additional node in CPG to represent the function arguments. It can be just a wrapper of other nodes.

Describe alternatives you've considered

It also can be represented as a string in the code field of the function call node. For example: test(...$arg),in current CPG,the code filed of Call node is test($arg) instead of test(...$arg)

DavidBakerEffendi commented 9 months ago

I believe there is a VARIADIC modifier that can be added to this kind of node. Also, ARGUMENT edges should connect the AST children under a call that are meant to link up with the corresponding callee's parameter nodes.

d1tto commented 9 months ago

I believe there is a VARIADIC modifier that can be added to this kind of node.

Thanks. how to do this.. create argument with below code ?:

arg.withChildren(modifiers)
DavidBakerEffendi commented 9 months ago

Ah, I see this is on the corresponding parameter node. So call.argument.parameter.isVariadic. One would need a working call graph here, but I haven't investigated the call graph in the php frontend yet.

d1tto commented 9 months ago

Ah, I see this is on the corresponding parameter node. So call.argument.parameter.isVariadic. One would need a working call graph here, but I haven't investigated the call graph in the php frontend yet.

I think it is not correct to access the parameter corresponding to the argument position. take the following code for example

function test(...$arg) { ... }
$arg = [1, 2, 3];
test(...$arg);
test(1, 2, 3);

The unpack attribute is on the argument, not the parameter

DavidBakerEffendi commented 9 months ago

Yeah, I see the value in that. It may be valuable to have a similar modifier here to know when the unpacking happens at the call site.

d1tto commented 9 months ago

This probably requires the following changes

is right?

DavidBakerEffendi commented 9 months ago

It is unclear if we'd want this to be a property or a modifier, since it's a schema change it is an API change, so I would want to involve others in the conversation.

@maltek @max-leuthaeuser @ml86 @fabsx00

d1tto commented 9 months ago

It is unclear if we'd want this to be a property or a modifier, since it's a schema change it is an API change, so I would want to involve others in the conversation.

@maltek @max-leuthaeuser @ml86 @fabsx00

Thanks. I think it is necessary to support it for the more precise data flow analysis of joern itself and the secondary developers based on joern

d1tto commented 9 months ago

I think https://github.com/joernio/joern/pull/3908 is a tmp solution, which make the code field is more correct without change the Cpg schema and api. I can parse the code field to identify which argument is unpacked. Could you help me review it? Thanks.

maltek commented 9 months ago

Given that this is practically unsupported right now, I don't really have compatibility concerns here.

One thing to keep in mind for the schema changes: python doesn't just have the very similar foo(*list) but also the foo(**dict) construct. Not saying we must cover the ** variant in the spec now - just that if we can make things extendible to express that difference, then we should.


I don't think EXPRESSIONs can currently have modifiers - I guess we could change that, but just adding a new property to those seems simpler? http://cpg.joern.io/#node-ref-expression

d1tto commented 9 months ago

I don't think EXPRESSIONs can currently have modifiers - I guess we could change that, but just adding a new property to those seems simpler? http://cpg.joern.io/#node-ref-expression

I think it's reasonable to be a property of an EXPRESSION because in addition to the arguments having an unpack property, other EXPRESSIONs may also have one, such as array unpack in PHP: $x = [...$arg, 1, 2, ...$arg];. Other languages may also have this case. Similar in python:

list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = [*list1, *list2]