microsoft / sarif-sdk

.NET code and supporting files for working with the 'Static Analysis Results Interchange Format' (SARIF, see https://github.com/oasis-tcs/sarif-spec)

Implement a Heimdall HDF <-> SARIF converter #2286

Open · michaelcfanning opened this issue 3 years ago

michaelcfanning commented 3 years ago

We should investigate a converter both to send SARIF to HDF and to drop HDF back down to SARIF.

https://heimdall-lite.mitre.org/
https://saf.cms.gov/#/
https://www.youtube.com/watch?v=Vgr5wR1SFuA

michaelcfanning commented 3 years ago

@eddynaka @yongyan-gh

josepalafox commented 3 years ago

These may also be useful links that were shared:

Their schema: https://github.com/mitre/inspecjs/tree/master/schemas

Their suggestion is to write the mapping into this action: https://github.com/mitre/inspec_tools_action

aaronlippold commented 3 years ago

I would say your first targets for HDF -> SARIF should be limited to the current heimdall-tools static analysis tools.

michaelcfanning commented 3 years ago

My first question for everyone: which direction serves everyone's interests best? HDF -> SARIF? or SARIF -> HDF? Both?

On the HDF repo, I note they reference a small ecosystem of converters from existing tool formats, including at least one (Fortify) for which we have some SARIF support. A SARIF -> HDF converter contributed here could help build out their ecosystem considerably, so that looks very useful. That is, any tool which produces SARIF directly, or for which we have a native-format-to-SARIF converter, could in theory be transformed and sent to the Heimdall viewer/comparison tools and other tech.

This repo doesn't cover the direct HDF producers, which include InSpec and who else? I think your comment above is to note that we should focus on converting these first? Is the idea that by implementing an HDF -> SARIF conversion we will accelerate getting these into GHAS?

aaronlippold commented 3 years ago

SARIF->HDF

aaronlippold commented 3 years ago

We get all the value from the tools that support SARIF, and we add the value of aligning to 800-53 controls, which makes all those tools even more valuable to all government customers - and we have a nice pretty viewer already made for it :)

aaronlippold commented 3 years ago

Actually, we suggest adding a SARIF converter into heimdall-tools.mitre.org, which that action consumes.

aaronlippold commented 3 years ago

https://github.com/mitre/heimdall_tools/blob/master/lib/heimdall_tools/ provides many examples of our source -> HDF pattern.

josepalafox commented 3 years ago

> My first question for everyone: which direction serves everyone's interests best? HDF -> SARIF? or SARIF -> HDF? Both?

My understanding is our first target is SARIF -> HDF. This unblocks the opportunity we're focused on, as it sends GHAS security data to the HDF data visualization tool.

> On the HDF repo, I note they reference a small ecosystem of converters from existing tool formats, including at least one (Fortify) for which we have some SARIF support. A SARIF -> HDF converter contributed here could help build out their ecosystem considerably, so that looks very useful. That is, any tool which produces SARIF directly, or for which we have a native-format-to-SARIF converter, could in theory be transformed and sent to the Heimdall viewer/comparison tools and other tech.

> This repo doesn't cover the direct HDF producers, which include InSpec and who else? I think your comment above is to note that we should focus on converting these first? Is the idea that by implementing an HDF -> SARIF conversion we will accelerate getting these into GHAS?

These two links were provided because we asked for a reference to the HDF schema, to understand what it looks like and what we may be missing in SARIF. If you have another source for this, all good.

They mentioned they have a Gem for https://github.com/mitre/inspec_tools that already has a GitHub Action. The explanation of the tool is that it just munges data, so that could be an appropriate place to add SARIF -> HDF.

aaronlippold commented 3 years ago

Once we can go that way, we will have a much better picture of what makes sense going HDF -> SARIF.

michaelcfanning commented 3 years ago

I definitely like your Heimdall viewer and profile differ! Great stuff. The SARIF value, of course, is directed more towards the developer experience; e.g., Visual Studio, VS Code, and the GHAS UX are pretty nice viewers, too. :) The SARIF format emphasis here (not sure the degree to which HDF allows this) is to transport additional context, like code snippets, complete source files, or references to enlistment/branch details, to seamlessly allow developers to definitively diagnose and then 'jump into' a remediation experience (i.e., actually start coding a fix). The HDF experience seems more strongly aligned around reporting out, i.e., conformance to compliance standards, progress since the last profile, etc. And so, it does look like SARIF -> HDF is the right direction, as we can drop the supporting diagnostics related to individual issues and get to the bucketing/filtering/etc. in the HDF visualizer.

Please correct any of my worldview above that requires fixing. :)

In case it isn't clear from my replies so far, I've got the link to the JSON schema, thanks for that. I also took a look at your converter repo and it answers an open question of mine: how to map a native tool's rule ids to CWE/HDF-compliant controls. It looks like you maintain these mappings in the repo as CSV consumed at conversion time.

One interesting thing about SARIF is that it supports expressing those mappings as SARIF files (which only hold this kind of classification/mapping data, which SARIF calls 'taxonomies'). These mappings can also be referenced within a SARIF log indirectly via a URL (reflecting the fact that this data might be maintained by someone other than the tool provider). So, a SARIF file could reference a remote description of CWEs, your NIST codes, or any other organizational schema, and then decorate its own rules with statements like 'this rule id maps directly to XXX'. GrammaTech contributed this feature based on its prior work, and so it has some sophistication to it. E.g., you can say 'this rule of my tool is a superset of this other taxonomy's rule XXX', or simply note that two things relate. All of this is to say that what you are capturing in CSV here might be useful to formalize in a subset of SARIF JSON that is published on the web. The spec examples are mostly oriented around MITRE's CWE schema.
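To make that concrete, here is a minimal sketch of the mechanism: a run that declares a CWE taxonomy and relates one of its own rules to a taxon. The tool name ExampleAnalyzer, rule id EX1001, and the single abbreviated taxon are all illustrative, not taken from any real tool.

    {
      "version": "2.1.0",
      "runs": [
        {
          "tool": {
            "driver": {
              "name": "ExampleAnalyzer",
              "rules": [
                {
                  "id": "EX1001",
                  "relationships": [
                    {
                      "target": {
                        "id": "CWE-79",
                        "toolComponent": { "name": "CWE" }
                      },
                      "kinds": [ "superset" ]
                    }
                  ]
                }
              ]
            }
          },
          "taxonomies": [
            {
              "name": "CWE",
              "organization": "MITRE",
              "informationUri": "https://cwe.mitre.org/",
              "taxa": [
                {
                  "id": "CWE-79",
                  "shortDescription": { "text": "Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')" }
                }
              ]
            }
          ],
          "results": []
        }
      ]
    }

The taxonomy here is inlined, but as noted above it could instead live in its own web-hosted SARIF file that the log references by URL.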

This feature allows direct SARIF producers to emit relevant mappings such that we could perform the SARIF->HDF conversion strictly from the data in the SARIF log. Without it, someone would need to maintain this mapping externally, as you appear to do today, in whatever format is appropriate (CSV or web-hosted SARIF).

This is likely getting into more detail on a single topic than serves this thread, though. :) Maybe I can take a day or two to continue exploring, and if I can get on a call with an appropriate audience, we can plan a path forward?

aaronlippold commented 3 years ago

> I definitely like your Heimdall viewer and profile differ! Great stuff. The SARIF value, of course, is directed more towards the developer experience; e.g., Visual Studio, VS Code, and the GHAS UX are pretty nice viewers, too. :) The SARIF format emphasis here (not sure the degree to which HDF allows this) is to transport additional context, like code snippets, complete source files, or references to enlistment/branch details, to seamlessly allow developers to definitively diagnose and then 'jump into' a remediation experience (i.e., actually start coding a fix). The HDF experience seems more strongly aligned around reporting out, i.e., conformance to compliance standards, progress since the last profile, etc. And so, it does look like SARIF -> HDF is the right direction, as we can drop the supporting diagnostics related to individual issues and get to the bucketing/filtering/etc. in the HDF visualizer.

At least for the first iteration.

> Please correct any of my worldview above that requires fixing. :)

> In case it isn't clear from my replies so far, I've got the link to the JSON schema, thanks for that. I also took a look at your converter repo and it answers an open question of mine: how to map a native tool's rule ids to CWE/HDF-compliant controls. It looks like you maintain these mappings in the repo as CSV consumed at conversion time.

Right, so for the first iteration we go SARIF -> HDF, which allows us to get a handle on an agreed mapping of 800-53 controls to known issues (CWE, etc.), which @ejaronne can help with.
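For illustration, a converted HDF control could carry that mapping as tags, roughly like this minimal sketch. The field names follow the common HDF control shape; the rule id, CWE value, and 800-53 controls are invented for the example, not an agreed mapping.

    {
      "id": "EX1001",
      "title": "Example rule converted from SARIF",
      "desc": "Description of what the rule checks.",
      "impact": 0.5,
      "tags": {
        "cwe": "CWE-79",
        "nist": ["SI-10", "RA-5"]
      },
      "results": []
    }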

> One interesting thing about SARIF is that it supports expressing those mappings as SARIF files (which only hold this kind of classification/mapping data, which SARIF calls 'taxonomies'). These mappings can also be referenced within a SARIF log indirectly via a URL (reflecting the fact that this data might be maintained by someone other than the tool provider). So, a SARIF file could reference a remote description of CWEs, your NIST codes, or any other organizational schema, and then decorate its own rules with statements like 'this rule id maps directly to XXX'. GrammaTech contributed this feature based on its prior work, and so it has some sophistication to it. E.g., you can say 'this rule of my tool is a superset of this other taxonomy's rule XXX', or simply note that two things relate. All of this is to say that what you are capturing in CSV here might be useful to formalize in a subset of SARIF JSON that is published on the web. The spec examples are mostly oriented around MITRE's CWE schema.

Yes, MITRE maintains the CWE and CVE databases, and we could engage that team to help us with the alignment. I have also talked to them before about putting the 800-53 mapping directly in those data sources, so perhaps we can circle back around to that.

> This feature allows direct SARIF producers to emit relevant mappings such that we could perform the SARIF -> HDF conversion strictly from the data in the SARIF log. Without it, someone would need to maintain this mapping externally, as you appear to do today, in whatever format is appropriate (CSV or web-hosted SARIF).

> This is likely getting into more detail on a single topic than serves this thread, though. :) Maybe I can take a day or two to continue exploring, and if I can get on a call with an appropriate audience, we can plan a path forward?

So the breakdown in my mind would be:

phase 0: control mappings, data element mapping between the formats, and a first-cut HDF output
phase 1: inverse mappings and data element alignment, then mapping from HDF to SARIF - informed by phase 0
phase 2: iteration and adjustment on phases 0 and 1

In addition, we have to continue the conversation about user communication of the 'relationships and buckets' of what this data shows and informs, and what it can't.

What do we think?

shaopeng-gh commented 3 years ago

Update: I have created a PR to the Heimdall repo with the code for the "SARIF -> HDF converter": https://github.com/mitre/heimdall_tools/pull/93

The basic test I have done uses the sample Flawfinder CSV: convert it to SARIF, then use this new tool to convert the SARIF to HDF; the resulting HDF file can be loaded in https://heimdall-lite.mitre.org/

eddynaka commented 3 years ago

Hello,

just a quick update:

  1. we implemented the SARIF->HDF converter in the heimdall repository:

  2. we implemented the HDF->SARIF converter in the sarif-sdk repository:

With those two points in place, we have implemented the complete HDF -> SARIF and SARIF -> HDF round trip.

aaronlippold commented 3 years ago

In the documentation on Heimdall Tools, I think it would be good to reference your HDF -> SARIF converter as well, to make sure end users know where to go for each direction. I would say the same may be useful in your sarif-sdk docs as well.

In your SARIF -> HDF conversion, I see we have check text - do we have a way to also add fix text? Aka: I have some failed tests here ... now what do I do as an end user?

Looking great all. When is our next conversation?

Thanks,



eddynaka commented 3 years ago

Hi @aaronlippold,

(1) We added it to the README just like the other tools:

sarif_mapper - static analysis results interchange format

(2) For the sarif-sdk, I will check.
(3) I didn't understand your point about check text vs. fix text. Can you explain?

Thanks for the reply :)

Bialogs commented 3 years ago

@eddynaka He means linking to the HDF -> SARIF converter in this repo from the HDF README. I will take care of it.

michaelcfanning commented 3 years ago

Great progress, everyone, and we've roughly knocked out our early proposed work. I wonder, should we get on a call and discuss how to build on it? I'd be glad to set that up.

aaronlippold commented 3 years ago

Hi

For (3): we have some 'check text', aka what we looked for in this test. We have that in the converter; what we tested for is clear.

For 'fix text' - to fix this, aka to make this test pass, we suggest "you edit this file or turn off this setting" or something like that. That would be 'fix text'.

Not sure if it is possible, but it sure is useful to the end-user.

What do you think?



aaronlippold commented 3 years ago

Agreed. Let's see if we can set up another call. I will be out of the office next week, but I am open to any time starting the week of the 21st.


eddynaka commented 3 years ago

Hello,

Thanks for everyone's time! Below is the workflow converting HDF -> SARIF -> upload to GitHub: https://github.com/eddynaka/hdf-sarif-github/blob/main/.github/workflows/hdf-to-github.yml

candrews commented 1 year ago

> Below is the workflow converting HDF -> SARIF -> upload to GitHub: https://github.com/eddynaka/hdf-sarif-github/blob/main/.github/workflows/hdf-to-github.yml

This link is broken :(

How does this workflow work? I'd really like to do HDF->SARIF->Upload to GitHub, but I've been unable to find out how to do so.

aaronlippold commented 1 year ago

I'm not sure where that code ran off to, but the SAF CLI tool should still have the conversion of an HDF to SARIF.

We also have a SAF CLI GitHub Action.

The actual upload into GitHub Advanced Security would be something we'd likely have to work on together, to basically find the right API push.

Hopefully the author will respond back and save us the effort :-)

yongyan-gh commented 1 year ago

You can add the steps below to a GitHub workflow to convert HDF to SARIF and upload to GHAS.

  1. Install the SARIF Multitool (CLI):
      - name: Install Sarif Multitool package
        run: dotnet tool install --global Sarif.Multitool
  2. Run the SARIF Multitool to convert the HDF log to a SARIF log:
      - name: Convert HDF to SARIF
        run: sarif convert <HDF_LOG_FILE> -tool Hdf -output converted.sarif
  3. Upload the SARIF log to GHAS:
      # upload-sarif feeds the log to code scanning; actions/upload-artifact
      # would only store it as a downloadable build artifact
      - name: Upload SARIF log
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: converted.sarif

Please let us know if you have any questions.

aaronlippold commented 1 year ago

We should add this to our docs as well, to help close the loop for both directions.


candrews commented 1 year ago

I implemented @yongyan-gh's approach in https://github.com/microsoft/sarif-sdk/issues/2286#issuecomment-1454035835 and found that it comes close, but unfortunately doesn't work.

The SARIF is generated and sent to GitHub, but GitHub fails to parse it due to missing location data (which it requires):

Error: Code Scanning could not process the submitted SARIF file:
locationFromSarifResult: expected at least one location [the same message repeated once per result in the log]

-- https://github.com/candrews/jumpstart/actions/runs/5603707977/jobs/10250839746?pr=884#step:10:22

GitHub indicates this requirement in their documentation at https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#physicallocation-object
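For reference, the requirement amounts to each SARIF result carrying at least one location, roughly like this minimal sketch (the rule id, file path, and line number are illustrative):

    {
      "ruleId": "EX1001",
      "level": "warning",
      "message": { "text": "Example finding." },
      "locations": [
        {
          "physicalLocation": {
            "artifactLocation": { "uri": "src/example.c" },
            "region": { "startLine": 42 }
          }
        }
      ]
    }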

Trivy had the same problem a while back (see https://github.com/aquasecurity/trivy/issues/1038); they solved it by adding location/region information to the SARIF: https://github.com/AndreyLevchenko/trivy/commit/a8ec7ec6d7584a8388c1e18db03969b3bb5fb13a

Perhaps this tool could similarly add this information when it converts HDF->SARIF?