guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.
https://guac.sh
Apache License 2.0

[question] I want to add a Criticality Score input format; can I submit a merge request? #1140

Status: Open. leomon-999 opened this issue 1 year ago

leomon-999 commented 1 year ago

Hi Professor, I would like to add an input format such as Criticality Score to the Setup+Demo, to help inform tool selection for future projects. May I submit a merge request?

URL for Criticality Score: https://github.com/ossf/criticality_score

The content of the JSON file:

{
  "default_score": "0.26270",
  "legacy": {
    "closed_issues_count": 0,
    "commit_frequency": 0,
    "contributor_count": 24,
    "created_since": 32,
    "github_mention_count": 0,
    "issue_comment_frequency": 0,
    "org_count": 0,
    "recent_release_count": 6,
    "updated_issues_count": 0,
    "updated_since": 12
  },
  "repo": {
    "created_at": "2020-08-25T10:59:23Z",
    "language": "C++",
    "license": "Apache License 2.0",
    "star_count": 0,
    "updated_at": "2022-04-27T14:09:35Z",
    "url": "https://github.com/laiyoufafa/aafwk_aafwk_lite"
  }
}
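A minimal Go sketch of decoding a document shaped like the sample above, assuming only the fields visible in that sample. The type criticalityResult and function parseCriticality are hypothetical names, not part of GUAC or the criticality_score project. Note that default_score arrives as a JSON string, so it has to be converted explicitly; the legacy signals are decoded as float64 because some of them (e.g. the frequencies) may be fractional.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// criticalityResult is a hypothetical Go mirror of the sample JSON document;
// field names come from that sample only, not from a published schema.
type criticalityResult struct {
	DefaultScore string             `json:"default_score"` // serialized as a string in the sample
	Legacy       map[string]float64 `json:"legacy"`        // raw signals, e.g. contributor_count
	Repo         struct {
		CreatedAt string `json:"created_at"`
		Language  string `json:"language"`
		License   string `json:"license"`
		StarCount int64  `json:"star_count"`
		UpdatedAt string `json:"updated_at"`
		URL       string `json:"url"`
	} `json:"repo"`
}

// parseCriticality decodes one document and converts default_score to a float.
func parseCriticality(data []byte) (criticalityResult, float64, error) {
	var r criticalityResult
	if err := json.Unmarshal(data, &r); err != nil {
		return r, 0, fmt.Errorf("decoding criticality score JSON: %w", err)
	}
	score, err := strconv.ParseFloat(r.DefaultScore, 64)
	if err != nil {
		return r, 0, fmt.Errorf("parsing default_score %q: %w", r.DefaultScore, err)
	}
	return r, score, nil
}

func main() {
	sample := []byte(`{"default_score":"0.26270","legacy":{"contributor_count":24,"recent_release_count":6},"repo":{"language":"C++","url":"https://github.com/laiyoufafa/aafwk_aafwk_lite"}}`)
	r, score, err := parseCriticality(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Repo.URL, score, r.Legacy["contributor_count"])
}
```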

pxp928 commented 1 year ago

Thanks for opening the question, @leomon-999. Yes, we are open to adding Criticality Score to GUAC. Based on the output above, one option is to map it into the existing scorecard nodes that we generate, making the scorecard schema more generic so that it supports both scores. The other option is to create a new schema for Criticality Score. I am leaning towards the latter. cc @jeffmendoza @mihaimaruseac @lumjjb

mihaimaruseac commented 1 year ago

I'm +1 on adding a new set of GraphQL definitions for this

pxp928 commented 1 year ago

@leomon-999 could you elaborate on the use case for Criticality Score? I am trying to understand how it can be used within the context of GUAC.

leomon-999 commented 1 year ago

@pxp928 There is no concrete Criticality Score use case at this time. It mainly looks at signals such as commit_frequency, contributor_count, org_count, star_count, issue_comment_frequency, and recent_release_count. It may not be related to Scorecard, SLSA, etc., so it could be a separate node. Our goal in adding a Criticality Score node is to provide an additional reference for technology selection when building open source communities.

leomon-999 commented 1 year ago

> I'm +1 on adding a new set of GraphQL definitions for this

Does that mean you are already working on GUAC support for Criticality Score as an input format?

lumjjb commented 1 year ago

In general I am supportive of having criticality information! We should be able to do this today with the HasMetadata Predicate (https://github.com/guacsec/guac/blob/main/pkg/assembler/graphql/schema/metadata.graphql#L36-L45). This should allow you to encode all the fields you mention on any source, package or artifact!

I think if there's a reason we need to separate it into its own GraphQL definition (performance, ease of analysis, etc.), then we'd add one.

@leomon-999 does the HasMetadata field work for you to encode that information?

lumjjb commented 1 year ago

In that case, the ask here would be to add a parser for OSSF Criticality Score JSON documents. Is that something you'd like to contribute?
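A parser also has to decide which GUAC subject a score attaches to. A minimal Go sketch, assuming the repository in repo.url is represented as a source split into type "git", a host-plus-owner namespace, and a repository name; that split is an assumption to check against how the existing scorecard ingestion models sources. sourceID and sourceFromRepoURL are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sourceID is a hypothetical triple describing the repository a score
// applies to; the type/namespace/name split is an assumption.
type sourceID struct {
	Type, Namespace, Name string
}

// sourceFromRepoURL turns repo.url from a criticality score document into
// a sourceID, e.g. https://github.com/org/repo -> git, github.com/org, repo.
func sourceFromRepoURL(raw string) (sourceID, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return sourceID{}, fmt.Errorf("parsing repo url: %w", err)
	}
	path := strings.Trim(u.Path, "/")
	i := strings.LastIndex(path, "/")
	if u.Host == "" || i <= 0 {
		return sourceID{}, fmt.Errorf("unexpected repo url %q", raw)
	}
	return sourceID{
		Type:      "git",
		Namespace: u.Host + "/" + path[:i],
		Name:      strings.TrimSuffix(path[i+1:], ".git"),
	}, nil
}

func main() {
	s, err := sourceFromRepoURL("https://github.com/laiyoufafa/aafwk_aafwk_lite")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```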

leomon-999 commented 1 year ago

Currently I have modified GUAC v0.0.1 so that it supports the Criticality Score input format. After testing, I can obtain the results, and a query in the neo4j client shows Criticality Score as a separate node that is not associated with Scorecard, SLSA, etc. For the latest version of GUAC, I may need to look into it a bit more.

leomon-999 commented 1 year ago

Does the latest version of guac still support displaying knowledge graphs using the neo4j client?

pxp928 commented 1 year ago

> Does the latest version of guac still support displaying knowledge graphs using the neo4j client?

Hey @leomon-999, the neo4j backend has been discontinued for the time being, but you can still visualize the graph with our guac-visualizer. GUAC is now database agnostic and no longer relies solely on neo4j.

> Currently I have modified GUAC v0.0.1 so that it supports the Criticality Score input format. After testing, I can obtain the results, and a query in the neo4j client shows Criticality Score as a separate node that is not associated with Scorecard, SLSA, etc. For the latest version of GUAC, I may need to look into it a bit more.

Yes, since the v0.0.1 release there have been significant changes to the code base. Take a look at what @lumjjb said above and let us know if we can help.

nathannaveen commented 1 year ago

I would like to work on this issue.

I also have some questions regarding this:

pxp928 commented 1 year ago

> Criticality Score outputs its results as a CSV file. How do we want to ingest Criticality Score data? Should we ask the Criticality Score project to expose it as a public API so we can consume it rather than using the CSV?

Hey @nathannaveen, based on @leomon-999's message above, it does output JSON, so that would be the preferred format for ingestion. If the data is already stored and available via an API, we can expand to that in the future.

> A question out of ignorance: Does GUAC intend to run the criticality scorer, or just consume the data?

Only consume the data; we would not be running the criticality scorer. If the user has the data, we would parse and ingest it.

> Should we also capture the weights that were used to calculate the criticality score?

Hmm, would that be useful for the user? That is, is there a use case where they (or a policy engine) would need to know the weights in order to make a decision?

pxp928 commented 1 year ago

@leomon-999 Are you actively working on this? Just wanted to make sure that we don't step on work you already started.

leomon-999 commented 1 year ago

> @leomon-999 Are you actively working on this? Just wanted to make sure that we don't step on work you already started.

Yes, but I'm just starting to learn the latest version, so of course it would be better if you, as the experts, worked on this.

nathannaveen commented 1 year ago

> Yes, but I'm just starting to learn the latest version, so of course it would be better if you, as the experts, worked on this.

@leomon-999 I am not exactly sure what you mean. Would you like me to take over this task?

nathannaveen commented 1 year ago

> Hey @nathannaveen, based on @leomon-999's message above, it does output JSON, so that would be the preferred format for ingestion. If the data is already stored and available via an API, we can expand to that in the future.

Oh, thanks! I had only ever run it with CSV output and forgot that it can output JSON as well.

> Only consume the data; we would not be running the criticality scorer. If the user has the data, we would parse and ingest it.

Ok, good to know!

> Hmm, would that be useful for the user? That is, is there a use case where they (or a policy engine) would need to know the weights in order to make a decision?

I think capturing the weights would be useful, because then people can verify the validity of the score and see what it is based on.

leomon-999 commented 1 year ago

> @leomon-999 I am not exactly sure what you mean. Would you like me to take over this task?

I'm sorry I didn't make myself clear. I was working on GUAC v0.0.1, but GUAC has changed a lot since then, so I now need to work on the latest version to get it to support Criticality Score. I wanted to know whether you are also doing this: if you are already working on it, your implementation will surely be more professional and useful than mine; if not, I will continue with it.

nathannaveen commented 1 year ago

@leomon-999 I haven't started to work on this issue.

leomon-999 commented 11 months ago

Hi Professor, I would like to confirm: guac-visualizer is compatible with multiple databases, right? And what are its advantages?

pxp928 commented 11 months ago

> Hi Professor, I would like to confirm: guac-visualizer is compatible with multiple databases, right? And what are its advantages?

Yes, the guac-visualizer uses the GraphQL APIs to communicate with GUAC, so it is database agnostic.

mihaimaruseac commented 11 months ago

Please also see https://docs.google.com/document/d/1yZ3-ZcfnRDWgw9uZlPuLmIHS9pNMr3DO_AEbHsDXmN8/edit for the design of the GraphQL interface to support multiple backends.