Closed ghost closed 5 years ago
Good issues.
Based on the above, I'd say the last time detected look-up semantics are to consult the following properties in order (as available), result.lastTimeDetectedUtc, invocation.startTimeUtc, run.startTimeUtc.
btw - just a side comment, there's not a lot of urgency around the time a detection is generated. this information isn't generally used to sort results in a particular order. neither is this data used to drive critical decisions by comparing two distinct log files. the typical use is to get a rough approximation of how long a problem has existed in a code base without being acted upon. a second case is mentioned above, just wondering when the most recent scan took place.
having said this, the fact that we allow for per-result time-stamps would allow for ordering issues by time of detection, etc.
allowing per-result time-stamps would also potentially help with incremental runs
@michaelcfanning I had assumed that we wanted to know when a check failed. Apparently you (also?) want to know when the check was evaluated.
"When did this go wrong, and when was it fixed?" are useful questions. We shouldn't kick them out of the provenance object in favor of the questions "When did you start checking for this, and when was the last time you checked?" Instead, I'd add resultProvenance
properties {first,last}EvaluationTimeUtc
and keep the existing {first,last}DetectionTimeUtc
.
@michaelcfanning There is no property run.startTimeUtc
. Maybe there should be.
This is a case where through a sequence of incremental changes, we lost something valuable. At first, we did have run.startTime
. Then we invented the invocation
object, and if you wanted to know when the run started, you'd ask for run.invocation.startTime
. Ahh, but then we made run.invocations
an array, and now all of a sudden we no longer had an obvious way to get the run start time.
The spec doesn't say that the invocation
objects are in temporal order of their start times. Maybe it should. It doesn't say that every tool invocation has to appear in run.invocations
. Maybe it should. It doesn't explain what to do if the run as a whole is kicked off by some sort of orchestration tool, which in turn spawns a sequence of other tools (perhaps in parallel!).
Maybe the simplest thing to do is to bring back run.startTimeUtc
. If we do, then because of the complexities I mentioned above:
run.startTimeUtc
to match the startTimeUtc
property of any invocation
object. The log producer, whoever that is, is responsible for providing its notion of the run's start time.invocation.startTimeUtc
should only participate in the default calculation for resultProvenance.lastDetectionTimeUtc
if resultProvenance.invocationIndex
is present to tell you which invocation produced the result.Subject to that, I agree with your proposed ordering:
resultProvenance.lastDetectionTimeUtc
invocation.startTimeUtc
(for the invocation
object, if any, specified by resultProvenance.invocationIndex
)run.startTimeUtc
@katrinaoneil Could you remind me what you mean by an "incremental run"? Are you talking about a scenario where I analyze half of the code base on Monday, and the other half on Tuesday, and I want to produce a SARIF file containing a single run
object that includes all those results? Or are you talking about producing a SARIF file containing only those results that are new since the last time you ran the tool? Or something else entirely?
For compliance and other reasons, we want to track most recent evaluation of a check. I see the subtlety here, though, what happens when you fix an issue in code? Does it convert to 'pass', in which case you lost information on when it was fixed? It should not, in my opinion. Instead, it should be left as failed (i.e., error, warning or not) and marked as absent. The 'pass' designation is relegated to the following scenarios, either it is applied at some general target scope (indicating, for example, 'this function appears to have no memory corruption issues at this point in time), or in cases when an issue could have been introduced but wasn't (e.g., 'i detected a new use of memcpy and good news everyone, it doesn't appear to corrupt memory). all the above are subtle distinctions but important. if an issue ever existed in code but was fixed, you want to know that (so, mark it failed but absent and leave it that way). if a pattern was inserted in code but was always correct, you'd like to know that too (it was introduced as 'pass' so maintain it). From this perspective 'firstTimeDetectedUtc' and 'lastTimeDetectedUtc' inherently relate to the pass vs. fail distinction. I think you'd reset only in cases where a 'pass' value regresses to 'fail'.
This appears to be a case where accommodating the complex, atypical case has compromised the common scenario. Ok. Bringing back run.startTimeUtc could be ok. We could also simply note that in the case where there is a single invocation only, it is fine to assume that every result is associated with that invocation. If you have multiple invocations, you need to specify the index.
@lgolding I think it's more of the former rather than the latter. I am talking about a scan that analyzes only part of the code base (e.g. those pieces of the code base that were changed since the last scan).
Ok, I see my mistake. You're right, the way a result management system tells you that an issue no longer appears is not by setting "level": "pass"
; it's by setting "baselineState": "absent"
. And the result management system should set the resultProvenance
"last
" properties to the most recent detection that was not marked "baselineState": "absent"
.
So we agree: when I write the change draft for this, I will not mention "pass
" as an exclusion in the defaulting logic.
We can make the same argument for "level": "notApplicable"
. Last week I ran my 64-bit checker against a 32-bit binary and got "notApplicable"
; this week I ran it again and got the same answer. The build engineer will want to know that they've been running this useless check for the last six months, and also to know exactly what tool invocation allowed it to creep in.
So we don't mention an exclusion for "notApplicable"
in the exclusion logic, either.
I'm willing to accept this, but I'll point out that, just as I did in #285 (where we discussed the default for resultProvenance.invocationIndex
) that I'd rather avoid complex defaulting logic to reduce the burden on implementers (and for the matter, to reduce the conceptual load on human users of the format).
Putting this all together, and accepting your suggestion for the single-invocation case, we have this logic:
If resultProvenance.lastDetectionTimeUtc
is absent, then:
run.invocations
is present and has a single element, and if run.invocations[0].startTimeUtc
is present, then that is the default.run.invocations
is present, and resultProvenance.invocationIndex
is present, and run.invocations[resultProvenance.invocationIndex].startTimeUtc
is present, then that is the default.run.startTimeUtc
is present, then that that is the default.Oops. Accidentally closed.
So, we have removed run.startTimeUtc and this change would require that we introduce it yet again. I wonder if we can close on a proposal that doesn't require it? All that's really required here is for the consumer to pick up a startTimeUtc value from an appropriate invocations object. It doesn't seem too complicated to go examine that data and look for the earliest start time.
As we've discussed, we could also require the invocations array to be sorted by earliest invocation first, but I'm not sure this is required.
In general, I think we should keep our eyes on a new principle: SARIF should accommodate complex scenarios while also keeping, as far as possible, the simple scenario simple.
If result.lastDetectionTimeUtc
is absent, then:
run.invocations
is present, then the earliest invocation.startTimeUtc
that is present, if any, is the default.?
@michaelcfanning and I discussed this offline. I reminded him that the earliest invocation.startTimeUtc
does not necessarily equal the run's start time (for reasons I explained in an earlier comment). He replied by reminding me of the driving scenario, which is:
... to identify the time of detection for a result. It is helpful for this value to be consistent, i.e., there should be a single timestamp for an analysis run. From this angle: find the earliest timestamp from the array of invocations seems fine as a simple, reliable way to get what’s required. I understand conceptually that doing this might not precisely describe the actual start of a run, but do we care? The real point here is: ‘this results has been raised since build XXX on date YYY’. We’re providing a slot for a date as a convenience, so that users aren’t required to go look up information associated with the instance id.
(Emphasis added)
I find that convincing, and I'm going to revise the change draft accordingly. We don't need to restore run.startTimeUtc
(or run.endTimeUtc
, which I had restored just for symmetry).
In TC #27, we agreed to amend the change draft for #272 (“Introduce resultProvenance object”) by adding
lastDetectionTimeUtc
, and defaulting it to the start time of the current run. When I started writing, a couple of complications emerged:First, we shouldn’t consider a result in the current run to constitute a “detection” unless the problem really was detected in this run. We didn’t consider that during TC #27.
Second, the
run
object doesn’t have astartTimeUtc
property! That property is on theinvocation
object, and arun
can have an array ofinvocation
s. The SARIF consumer won’t know whichinvocation
’s start time to use for the default.There are a few possibilities:
@kupsch has actually suggested (offline) that we add a mechanism to relate a result to an invocation, although I hadn’t yet filed an issue for it. I’ve now filed #285, “Provide a mechanism to associate a result with an invocation”.
When I first saw that suggestion, I imagined defining
result.invocationIndex
. But now that we haveresultProvenance
, IMO that’s the most natural place for it. So we could addresultProvenance.invocationIndex
, and use thestartTimeUtc
of the specified invocation object as the default forlastDetectionTimeUtc
.We could specify that the default for
lastDetectionTimeUtc
is the minimum of thestartTimeUtc
values over all theinvocation
objects inrun.invocations
. That would effectively be the run’s start time.We could decide that the default logic for
lastDetectionTimeUtc
is too complicated, and not specify a default. If you want it, you have to populate it.Other ideas?
Note that even if we chose Option 2 or Option 3,
resultProvenance.invocationIndex
would still be useful. We just wouldn’t use it to calculate the default forlastDetectionTimeUtc
.