Open AnnaShaleva opened 3 months ago
I looked at this closer and it seems like SeqPoints can overlap sometimes (usually one can cover entire function), is this intended?
This problem is more complicated than it seems to be.
is this intended?
Yes, it is intended, because our compiler generates DebugInfo according to NEP-19 standard. And here what the standard tells about debug sequence points:
sequence-points: a collection of strings that encode a map of NeoVM bytecode addresses back to source code locations. ... start-line: This is the line in the source code file that is associated with the sequence point ... end-line: for languages that support multi-line code expressions, this is the last line of the source code associated with the sequence point. For single-line expressions, this will be the same as start-line
Hence, given some [non-builtin and non-inlined] function, a function will always have a separate sequence point corresponding to the whole function body (like PutNamed function of Container contract). Moreover, functions bodies are not the only place that may contain overlapping sequence points. For example, if a single line contain multiple wrapped calls for those sequence points are emitted, then several wrapped sequence points will be generated, e.g. lines like this (taken from https://github.com/nspcc-dev/neofs-contract/blob/6a5813a25d546e495d204b3c964b956a221733f6/contracts/container/contract.go#L784):
string(id[:len(estimateKeyPrefix)]) != estimateKeyPrefix {
will have two nested sequence point (which is correct and expected):
"4556[38]617:3-617:38","4558[38]617:14-617:36"
So basically, for the DebugInfo itself it's quite native that a sequence point boundaries may include several other sequence points. But the situation is changed when we come to neotest
coverage. For coverage blocks overlapping is not acceptable. Here what Go blog tells about it:
What the coverage annotation does is instrument blocks, which are typically bounded by brace brackets. Getting this right in general is very hard. A consequence of the algorithm used is that the closing brace looks like it belongs to the block it closes, while the opening brace looks like it belongs outside the block. A more interesting consequence is that in an expression like
f() && g()
there is no attempt to separately instrument the calls to f and g, Regardless of the facts it will always look like they both ran the same number of times, the number of times f ran.
To be fair, even gcov has trouble here. That tool gets the instrumentation right but the presentation is line-based and can therefore miss some nuances.
So we definitely need some additional modifications over sequence points in order to emit proper coverage profile. #3575 attempts to fix this issue, but I consider we can't just remove all outer sequence points from the set of nested sequence points, because it will work for the case of function definitions but won't work properly for the case of nested calls like string(id[:len(estimateKeyPrefix)]) != estimateKeyPrefix {
. If we remove an outer point from this statement, then it will be treated as "uncovered" which is not the case since the number of calls to outer instruction equals to the number of calls to inner instruction.
Also, another problem is that currently neotest coverage doesn't calculate the number of statements included into coverage block properly. It uses the number of lines instead of the number of instructions which is not the worse solution, but it's still inaccurate: https://github.com/nspcc-dev/neo-go/blob/a9242535db757faba860e21c194033f30c679f48/pkg/neotest/coverage.go#L163
These two problems are closely connected, so we definitely need some "smart" merging algorithm that will convert the set of sequence points to the proper coverage blocks. It may remove function declaration points, but it should also merge one-line instructions into a single coverage block with proper calculation of the number of statements (probably by squashing all instructions that are located in one line). That's why I can't fully accept the solution presented in #3575.
But at the same time it's not that easy to filter out unnecessary instructions because we don't have any code structure information other than sequence point boundaries displayed in debug info.
@roman-khimov, what do you think wrt this problem?
I guess we would have to actually use AST and maybe keep extra info from codegen
I thought about it, but ideally we need to generate coverage using only debug info specification, and debug info itself is strictly fixed by the standard.
I think we can improve without changing NEP-19. The format itself is rather rich. We can build a map of instruction indexes to a list of sequence points. We can build a map of code lines to a list of sequence points and/or to a list of instruction indexes. We know function boundaries. But we need more of smaller examples to see what behavior is more appropriate for what case. At least I'd try that first.
We know function boundaries.
The problem is that right now in general case we don't. For package-level function declarations we can exclude some instructions from coverage calculations based on the range
property of methods specified in DebugInfo (we should try it, BTW). However, for anonymous function declarations we don't have such information as a function boundaries. Hence, for function like this there will be generated a sequence point that covers the whole function:
var blaF = func() { // <- sequence point starts here
runtime.Log("bla")
// any set of instructions goes here
} // <- sequence point ends here
for function like this
Shouldn't it be a separate https://github.com/neo-project/proposals/blob/master/nep-19.mediawiki#user-content-Method? Dynamic name, of course, but from https://github.com/neo-project/proposals/blob/master/nep-19.mediawiki#methods:
Both private and public methods from the contract should be represented in the methods array.
And these are more rare anyway, while regular functions can be improved already.
Shouldn't it be a separate https://github.com/neo-project/proposals/blob/master/nep-19.mediawiki#user-content-Method?
It's a contravercial question, because it's not a fully-qualified private or public method of the contract; it's an anonymous function. And our debug info does not include anonymous functions to the methods list. Is it a bug? I'd say no, but you may argue.
I will. It's some code block for sure and it can be moved into a section of its own. Also IIRC we're not capturing state properly right now for these. You've probably seen some func_name.1.1
(or something like that) in panics and Go profiler, that's exactly that --- these code blocks are identified as separate functions.
You've probably seen some func_name.1.1
A good example, ref. https://github.com/nspcc-dev/neo-go/issues/3619.
RET
-related problem will be fixed by https://github.com/nspcc-dev/neo-go/issues/3622. The remaining problems with smaller overlapping blocks should be fixed on the coverage side, but this problem is not that critical and may be solved by coverage blocks merge.
All in all, the remaining tasks are:
string(id[:len(estimateKeyPrefix)]) != estimateKeyPrefix {
), may be we can port their solution.
Current Behavior
The original issue is posted in https://github.com/nspcc-dev/neo-go/pull/3462#issuecomment-2291269916
Expected Behavior
Block coverage matches the expectations