Open ShiningMassXAcc opened 1 year ago
As part of the discussion I wanted to make sure my current understanding of guids vs. correlation guids vs. fingerprints vs. partial fingerprints vs. work items is correct. Does the SARIF spec assume the following is how SARIF data is used/transformed?
This is my current understanding of the "flow" of data SARIF assumes takes place, but is this correct? Does this mean that the fingerprint logically identifies the issue at hand? My understanding of the different fields is that:
What happens when the result management system does not work with fingerprints but instead works with correlation guids? The spec sheet says:
Other result management systems group results into equivalence classes without associating a computed fingerprint with each result, and they denote each equivalence class with an arbitrary unique identifier. This identifier is opaque: it is not calculated from information stored in the result, and hence contains no readable information about the result.
So does this mean that correlation guids are used by systems that bucket items based on other factors outside of the SARIF result? I was wondering if there is a concrete example of this, since I'm not sure what an "opaque" identifier would be based on.
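
To make the distinction in the quoted passage concrete, here is a minimal Python sketch contrasting the two approaches. Everything in it is an assumption for illustration: `compute_fingerprint`, `OpaqueCorrelator`, the choice of fields that get hashed, and the idea that the grouping key is an internal bucket name are all hypothetical and are not taken from the spec or from any real result management system.

```python
import hashlib
import uuid


def compute_fingerprint(result: dict) -> str:
    """Hypothetical fingerprint: derived only from data stored in the result."""
    location = result["locations"][0]["physicalLocation"]
    key = "|".join([
        result["ruleId"],
        location["artifactLocation"]["uri"],
        result["message"]["text"],
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class OpaqueCorrelator:
    """Hypothetical system that labels each equivalence class with a random GUID.

    The grouping key is whatever the system uses internally (assumed here to be
    a bucket name, e.g. from manual triage). The GUID is not computed from the
    result, so it carries no readable information about it.
    """

    def __init__(self) -> None:
        self._guid_by_bucket: dict[str, str] = {}

    def correlation_guid(self, bucket: str) -> str:
        # Mint a new random GUID the first time a bucket is seen, then reuse it.
        if bucket not in self._guid_by_bucket:
            self._guid_by_bucket[bucket] = str(uuid.uuid4())
        return self._guid_by_bucket[bucket]


sample = {
    "ruleId": "EX0001",  # made-up rule id
    "message": {"text": "Example finding."},
    "locations": [
        {"physicalLocation": {"artifactLocation": {"uri": "src/app.c"}}}
    ],
}

fingerprint = compute_fingerprint(sample)      # same input, same value, every run
correlator = OpaqueCorrelator()
guid_a = correlator.correlation_guid("bucket-17")
guid_b = correlator.correlation_guid("bucket-17")  # same bucket, same GUID
guid_c = correlator.correlation_guid("bucket-18")  # different bucket, new GUID
```

The fingerprint can be recomputed by anyone who has the result, while the GUID only means something to the system that minted it. Whether real systems bucket this way is exactly the open question above.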
All of these items are noted to be potentially used by results management systems, but they are organized flatly in the `result` object. For customer consumption in the end, we expect `workItemUris` to be most used, but there is no clear indication of how a `workItemUri` maps to a result management system. The fact that `workItemUris` is a list but `guid` and `correlationGuid` are not indicates that perhaps we didn't have clarity on how we thought these would be used?

In particular, what do we imagine multiple work items to represent? Are these from different results systems, different sub-results within a single result, different hashing systems? My team has complex results that have multiple facets to this, but it's unclear how best to use this system to date given how clients will use these.
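
For reference while discussing the flat layout, here is a sketch of a single result carrying all five properties, written as a Python dict that serializes to SARIF-shaped JSON. The property names are the ones defined on `result`; every value (the GUIDs, the hash strings, the fingerprint key names, and the example.com URLs) is made up for illustration.

```python
import json

# One result with every identifier-like property populated.
# All values below are invented placeholders.
result = {
    "ruleId": "EX0001",
    "message": {"text": "Example finding."},
    # Single-valued identifiers:
    "guid": "11111111-1111-4111-8111-111111111111",
    "correlationGuid": "22222222-2222-4222-8222-222222222222",
    # Objects keyed by a (hypothetical) fingerprint name:
    "fingerprints": {"stableResultHash/v1": "a1b2c3"},
    "partialFingerprints": {"contextRegionHash/v1": "d4e5f6"},
    # The only list-valued property of the five:
    "workItemUris": [
        "https://example.com/tracker-a/items/101",
        "https://example.com/tracker-b/items/7",
    ],
}

print(json.dumps(result, indent=2))
```

Nothing in this shape says which of the identifiers, if any, each entry in `workItemUris` was filed against.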
Some possible thoughts:
`tool` unique identifiers from `results system` identifiers. The below snippet from the description on `guid` clearly states this could be used by SARIF producers (the tool itself) or results management systems but only when the tool doesn't use it. By stating that a result management system SHOULD set this property ... what do they do when the tool produces this instead -> then they use fingerprints? This then muddies consistency of what results management systems will use when ingesting different types of results.

While I'm opening this for general discussion, at minimum, I'd like to see `workItemUris` have a more direct mapping to the unique identifier that the `workItemUri` is being tracked against. In particular, the appendix on fingerprints perhaps muddies this further.

I don't have a great implementation I like here, but I'd perhaps break all these items into a subsection that is more clearly delineated. This is perhaps way too much change - but I wanted to get a sense of how other folks consume these properties.
Note - I'm not beholden to this being included in 2.2, but using that for consistent titling for now.