Closed izlatkin closed 2 years ago
x
and _x
must be different program variables, since the actual name is used as a prefix. There are always two sets of state variables (here x
), two sets of function input parameter variables (here _x
) and one set of function output parameter variables (here none).
comments from @leonardoalt about the naming convention: the first number is a unique counter, then it's the node id then the contract node id
example: https://github.com/ethereum/solidity/blob/develop/libsolidity/formal/CHC.cpp#L1429 https://github.com/ethereum/solidity/blob/develop/libsolidity/formal/CHC.cpp#L1342
The question is about how to figure out which var is a field, and which is a function argument. I understand the difference between x
and _x
in the code, but I don't understand how to get it from CHCs.
Note that this is valid for the names of predicate names.
It's not valid for tx_*
, crypto_*
, state_*
etc
Variables (state or function input/output parameters) have the pattern name_id_uniqueCounter
.
The question is about how to figure out which var is a field, and which is a function argument. I understand the difference between
x
and_x
in the code, but I don't understand how to get it from CHCs.
You can extract the AST id of a variable and find that AST node in the AST output of the compiler.
Still unclear. How can we figure out the arity of function?
You can extract the AST id of a variable and find that AST node in the AST output of the compiler.
We only deal with CHCs at this point.
You can try to guess, but to be sure one needs to match the AST.
No, that's not going to work
It's not possible to infer the arity of the function just from the encoding since input variables come after state variables, so you don't know which is which.
This pattern would fix the issue: name_id_F_uniqueCounter
where F
would be 0
if the variable is a field, 1
if it is an input, and 2
if it is an output.
so you don't know which is which.
I figured, and that's exaclty what I was asking.
As far as inferring things go, there are also things like function overriding. Even if you know the name of a function that was called, you don't know from which contract just by looking at the encoding.
One step at a time. Can we assume there is no overriding for now?
And that there is only one contract...
If we are able to only discover all the necessary info from CHC encoding, we won't have blockers and will be able to deliver the first prototype by the end of the week. If we start integrating the compiler now, I've no idea about ETA.
The encoding already handles virtual calls and these things, so you don't have to worry about that on the encoding side. The only thing that may be ambiguous is when you report that a certain
The thing is about changing the encoding slightly (as I suggested), which will allow us to generate tests. Without that, we can only "guess" (as you suggested) but it will result in a pile of uncompilable tests.
So to be absolutely clear, the question is "Could you please implement the new naming convention for us"?
Ah sorry, the comment with your suggestion only showed now in this thread for me. Sure, that shouldn't be a problem.
As a workaround, I can extract # of input parameters and types (signature) It is already implemented
For example:
[[['Crypto', 'contract'], ['hash', 'bytes32']], [['C', 'contract'], ['f1', 'bytes32'], ['inv']]]
from
contract Crypto {
function hash(bytes32) external pure returns (bytes32) {
return bytes32(0);
}
}
contract C {
address owner;
bytes32 sig_1;
bytes32 sig_2;
Crypto d;
constructor() {
owner = msg.sender;
}
function f1(bytes32 _msg) public {
address prevOwner = owner;
sig_1 = d.hash(_msg);
sig_2 = d.hash(_msg);
assert(prevOwner == owner);
}
function inv() public view {
assert(sig_1 == sig_2);
}
}
Nice! How are you doing that?
I think this is just lexical/syntax analysis which might be imprecise.
Is there any fundamental difference in signatures:
function hash(bytes32) external pure returns
and
function f1(bytes32 _msg) public
The first function has input type, but does not name it.
I know you don't want to implement compiler integration just yet, which makes total sense, but let me share here what the AST thing looks like so you'll see it's quite simple:
contract C {
uint x;
function f(uint _x) public {
x = _x;
}
}
compiling the above with solc a.sol --ast-compact-json
gives
{
"absolutePath": "a.sol",
"exportedSymbols": {
"C": [
13
]
},
"id": 14,
"nodeType": "SourceUnit",
"nodes": [
{
"abstract": false,
"baseContracts": [],
"canonicalName": "C",
"contractDependencies": [],
"contractKind": "contract",
"fullyImplemented": true,
"id": 13,
"linearizedBaseContracts": [
13
],
"name": "C",
"nameLocation": "9:1:0",
"nodeType": "ContractDefinition",
"nodes": [
{
"constant": false,
"id": 2,
"mutability": "mutable",
"name": "x",
"nameLocation": "19:1:0",
"nodeType": "VariableDeclaration",
"scope": 13,
"src": "14:6:0",
"stateVariable": true,
"storageLocation": "default",
"typeDescriptions": {
"typeIdentifier": "t_uint256",
"typeString": "uint256"
},
"typeName": {
"id": 1,
"name": "uint",
"nodeType": "ElementaryTypeName",
"src": "14:4:0",
"typeDescriptions": {
"typeIdentifier": "t_uint256",
"typeString": "uint256"
}
},
"visibility": "internal"
},
{
"body": {
"id": 11,
"nodeType": "Block",
"src": "51:14:0",
"statements": [
{
"expression": {
"id": 9,
"isConstant": false,
"isLValue": false,
"isPure": false,
"lValueRequested": false,
"leftHandSide": {
"id": 7,
"name": "x",
"nodeType": "Identifier",
"overloadedDeclarations": [],
"referencedDeclaration": 2,
"src": "55:1:0",
"typeDescriptions": {
"typeIdentifier": "t_uint256",
"typeString": "uint256"
}
},
"nodeType": "Assignment",
"operator": "=",
"rightHandSide": {
"id": 8,
"name": "_x",
"nodeType": "Identifier",
"overloadedDeclarations": [],
"referencedDeclaration": 4,
"src": "59:2:0",
"typeDescriptions": {
"typeIdentifier": "t_uint256",
"typeString": "uint256"
}
},
"src": "55:6:0",
"typeDescriptions": {
"typeIdentifier": "t_uint256",
"typeString": "uint256"
}
},
"id": 10,
"nodeType": "ExpressionStatement",
"src": "55:6:0"
}
]
},
"functionSelector": "b3de648b",
"id": 12,
"implemented": true,
"kind": "function",
"modifiers": [],
"name": "f",
"nameLocation": "33:1:0",
"nodeType": "FunctionDefinition",
"parameters": {
"id": 5,
"nodeType": "ParameterList",
"parameters": [
{
"constant": false,
"id": 4,
"mutability": "mutable",
"name": "_x",
"nameLocation": "40:2:0",
"nodeType": "VariableDeclaration",
"scope": 12,
"src": "35:7:0",
"stateVariable": false,
"storageLocation": "default",
"typeDescriptions": {
"typeIdentifier": "t_uint256",
"typeString": "uint256"
},
"typeName": {
"id": 3,
"name": "uint",
"nodeType": "ElementaryTypeName",
"src": "35:4:0",
"typeDescriptions": {
"typeIdentifier": "t_uint256",
"typeString": "uint256"
}
},
"visibility": "internal"
}
],
"src": "34:9:0"
},
"returnParameters": {
"id": 6,
"nodeType": "ParameterList",
"parameters": [],
"src": "51:0:0"
},
"scope": 13,
"src": "24:41:0",
"stateMutability": "nonpayable",
"virtual": false,
"visibility": "public"
}
],
"scope": 14,
"src": "0:67:0",
"usedErrors": []
}
],
"src": "0:68:0"
}
In this JSON you have direct knowledge about a contract's state variables and a function's input/output parameters.
A very straightforward approach for now and, as mentioned, might be imprecise for some cases just couple methods https://github.com/izlatkin/solidity_testgen/blob/main/scripts/SolidityTestGen.py#L188 https://github.com/izlatkin/solidity_testgen/blob/main/scripts/SolidityTestGen.py#L221
Is there any fundamental difference in signatures
For the encoding the differences will be in the name and return types, but not naming a parameter shouldn't influence the predicate's name.
A very straightforward approach for now and, as mentioned, might be imprecise for some cases just couple methods https://github.com/izlatkin/solidity_testgen/blob/main/scripts/SolidityTestGen.py#L188 https://github.com/izlatkin/solidity_testgen/blob/main/scripts/SolidityTestGen.py#L221
Right. If you already have the source code and solc
, I think you could detect the things you need without much more code using the python JSON lib.
A very straightforward approach for now and, as mentioned, might be imprecise for some cases just couple methods https://github.com/izlatkin/solidity_testgen/blob/main/scripts/SolidityTestGen.py#L188 https://github.com/izlatkin/solidity_testgen/blob/main/scripts/SolidityTestGen.py#L221
Right. If you already have the source code and
solc
, I think you could detect the things you need without much more code using the python JSON lib.
right, I think we can parse json now
Is there any fundamental difference in signatures
For the encoding the differences will be in the name and return types, but not naming a parameter shouldn't influence the predicate's name.
Don't get it, still. What is the SMT name of the argument of hash
function?
should be "_uniqueCounter_ssaCounter" as far as I remember
(argh GH formatting is destroying my msg)
right, I think we can parse json now
I don't see how it helps us now. All our CHC-related code is in C++ now. We get an SMT model directly via Z3 API. We'd need to write code to serialize it and parse again from python.
I don't know what the exact workflow is, I just suggested it as the precise replacement for the python code that @izlatkin sent.
How is that code used currently in the workflow?
I totally agree that @izlatkin should not wast time writing his own parser since it's already available
How is that code used currently in the workflow?
@izlatkin
I think our CHC-based CEX generator and the python-frontend are disconnected, still
In order to connect, we need a correct mapping from an SMT model to a sequence of function calls in Solidity
And for that, we need to know arities, etc..
Ideally, we need this info together in the same module where we deal with SMT, that is in the C++ module.
Alternatives are possible, but will take longer to develop
I will make a branch with the naming changes and send it to you tomorrow.
In order to connect, we need a correct mapping from an SMT model to a sequence of function calls in Solidity
This will require a few more naming changes, but should also be possible.
P.S. I think it's kind of onanism to chat in the comments of github. Should move to Discord;)
Can you please re-invite me to your server?
Can you please re-invite me to your server?
just sent you an email with the invite
The brief workflow is currently following (executes by Python scripts):
json
from solc a.sol --ast-compact-json
)comments: here's a new soljson.js that has the types of the variables in their SMT names: https://output.circle-artifacts.com/output/job/72b84605-390a-4d0b-9a9b-202170bb7033/artifacts/0/soljson.js varName_nodeId_varType_SSAIndex is the new format where varType is: 0: state variable 1: function input var 2: function output var 3: function local var
For instance, this encodes a call to function set:
(summary_6_function_set__110_111_0 error_0 this_0 crypto_0_ecrecover_21 crypto_0_keccak256_22 crypto_0_ripemd160_23 crypto_0_sha256_24 tx_0_block.basefee_3 tx_0_block.chainid_4 tx_0_block.coinbase_5 tx_0_block.difficulty_6 tx_0_block.gaslimit_7 tx_0_block.number_8 tx_0_block.timestamp_9 tx_0_blockhash_10 tx_0_msg.data_11_bytes_tuple_accessor_array_1 tx_0_msg.data_11_bytes_tuple_accessor_length_2 tx_0_msg.sender_12 tx_0_msg.sig_13 tx_0_msg.value_14 tx_0_tx.gasprice_15 tx_0_tx.origin_16 state_0_balances_0 x_2_0 _x_102_0 state_1_balances_0 x_2_1 _x_102_1)
fun:
The contract has one field
x
, and set has one argument_x
. I see them encoded as two last arguments ofsummary_6_function_set__110_111_0
:x_2_1
and_x_102_1
But we need a rule to determine, based on the encoding only, how many arguments represent inputs. Do all of them by convention have names that begin with an underscore (silly question)?
the ADT encoding also has this:
(summary_6_function_set__110_111_0 error_0 this_0 abi_0 crypto_0 tx_0 state_0 x_2_0 _x_102_0 state_1 x_2_1 _x_102_1)
The last two (
x_2_1
,_x_102_1
) are the the field and argument, resp.