opencybersecurityalliance / PACE

Posture Attribute Collection and Evaluation

What "query language(s)" should be used for PACE? #42

Closed sparrell closed 2 years ago

sparrell commented 2 years ago

At the 14-Mar-2022 PACE project meeting, one participant assumed that the query language for PACE would be GraphQL. This issue opens the discussion on the different possibilities, to decide which one(s) PACE should use, specifically for the PACE OpenC2 interface. There are separate issues for whether there can be more than one type or instance of database, and whether the database(s) can be directly accessed using their native query capabilities.

Possibilities include:

STIX patterning language - https://newcontext.com/stix-patterning-quick-reference-card/
SQL - https://en.wikipedia.org/wiki/SQL
Cypher - https://opencypher.org/
GraphQL - https://graphql.org/
Gremlin - https://en.wikipedia.org/wiki/Gremlin_(query_language)
SPARQL - https://en.wikipedia.org/wiki/SPARQL
GQL - https://en.wikipedia.org/wiki/Graph_Query_Language

Some of these choices assume particular answers to the other issues (e.g. SQL presumes a relational database, Cypher presumes a labeled property graph, SPARQL presumes an RDF graph, GraphQL is for APIs, etc.).

This issue is related to issues #43 and #44

sparrell commented 2 years ago

My opinion is that we need more study/examples/trials before we pick one. My gut feeling is to try to make the STIX patterning language work, and then create lycans to convert to each of the DB query languages.
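To make the "one pattern, many backends" idea concrete, here is a minimal sketch that translates the simplest STIX pattern shape, `[type:property = 'value']`, into a parameterized SQL query and a parameterized Cypher query. It is only an illustration: the one-table-per-object-type and one-label-per-object-type mappings are assumptions, the function names are hypothetical, and real STIX patterning (AND/OR, other operators, qualifiers) would need a proper parser.

```python
from __future__ import annotations

import re

# Matches only the simplest STIX pattern shape: [type:property = 'value'].
# Everything else in the patterning language is deliberately out of scope.
_SIMPLE_PATTERN = re.compile(
    r"^\[\s*([a-z0-9-]+):([a-z0-9_]+)\s*=\s*'([^']*)'\s*\]$"
)


def parse_simple_pattern(pattern: str) -> tuple[str, str, str]:
    """Return (object_type, property_name, value) or raise ValueError."""
    match = _SIMPLE_PATTERN.match(pattern.strip())
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    return match.group(1), match.group(2), match.group(3)


def to_sql(pattern: str) -> tuple[str, tuple[str, ...]]:
    """Translate to SQL, assuming one table per object type (hypothetical)."""
    obj_type, prop, value = parse_simple_pattern(pattern)
    table = obj_type.replace("-", "_")
    # The value travels as a bound parameter, never inside the query text.
    return f'SELECT * FROM "{table}" WHERE "{prop}" = ?', (value,)


def to_cypher(pattern: str) -> tuple[str, dict[str, str]]:
    """Translate to Cypher, assuming one node label per object type (hypothetical)."""
    obj_type, prop, value = parse_simple_pattern(pattern)
    label = obj_type.replace("-", "_")
    return f"MATCH (n:{label}) WHERE n.{prop} = $value RETURN n", {"value": value}


if __name__ == "__main__":
    example = "[software:name = 'openssl']"
    print(to_sql(example))
    print(to_cypher(example))
```

Each additional backend would be one more `to_*` function over the same parsed triple, which is roughly the shape a per-language converter would take.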

davaya commented 2 years ago

1) As the "participant", I was more focused on the use cases of: a) Decisionmaker queries the Component for Component's SBOMs using OpenC2, vs b) Decisionmaker queries the PAR to discover what SBOMs have already been queried, then the PCS if the desired SBOMs aren't yet available in the PAR.
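Use case (b) can be sketched as a lookup with a fallback. This is a rough illustration only: `InMemoryPar`, `find_sbom`, and the `collect_via_pcs` callback are hypothetical names, and storing the freshly collected SBOM back into the PAR is an assumption, not something decided in this thread.

```python
from __future__ import annotations

from typing import Callable, Optional


class InMemoryPar:
    """Hypothetical stand-in for a PAR: maps component id to an SBOM document."""

    def __init__(self) -> None:
        self._sboms: dict[str, dict] = {}

    def get_sbom(self, component_id: str) -> Optional[dict]:
        return self._sboms.get(component_id)

    def put_sbom(self, component_id: str, sbom: dict) -> None:
        self._sboms[component_id] = sbom


def find_sbom(
    component_id: str,
    par: InMemoryPar,
    collect_via_pcs: Callable[[str], dict],
) -> dict:
    """Ask the PAR first; only ask the PCS when the PAR has no SBOM yet."""
    sbom = par.get_sbom(component_id)
    if sbom is None:
        sbom = collect_via_pcs(component_id)
        # Assumption: collected results are stored so later queries hit the PAR.
        par.put_sbom(component_id, sbom)
    return sbom
```

Use case (a) would skip the PAR entirely and always call the collection path, so the two sequences differ only in whether the repository is consulted first.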

I don't think we should "pick a winner" on either the DM query sequence diagram or the interface to the PAR, except that we should NOT write an OpenC2 PAR actuator profile to attempt to replicate the many database query languages already in use.

I'm not an expert on STIX patterning, but I'm not sure that is an appropriate database query language either. Other than that, I don't have a preference among database or query APIs, another of which could be a simple HTTP GET. For the CAW we need to crawl with something, but different participants could choose different approaches as long as we have decided that we will define a PAR interface (as opposed to exclusively using OpenC2).

I think GraphQL has some attractive advantages in 1) allowing Producers with no prior knowledge to discover what the PAR contains (introspection), and 2) avoiding the over-fetching and under-fetching that are common with REST APIs. But picking a winner can wait until after several CAWs have explored several options.
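A short sketch of those two points, building request bodies only (nothing is sent). The `__schema` introspection field is part of GraphQL itself; the `component`, `sbom`, `format`, and `version` fields are a hypothetical PAR schema invented for illustration.

```python
from __future__ import annotations

import json
from typing import Any, Optional

# Introspection: a client with no prior knowledge asks the server what types exist.
INTROSPECTION_QUERY = "{ __schema { queryType { name } types { name kind } } }"

# Field selection: the client names exactly the fields it wants, no more, no less.
# The schema below (component, sbom, format, version) is hypothetical.
SBOM_QUERY = """
query ComponentSbom($id: ID!) {
  component(id: $id) {
    name
    sbom { format version }
  }
}
"""


def build_graphql_body(
    query: str, variables: Optional[dict[str, Any]] = None
) -> bytes:
    """Serialize a GraphQL request as the JSON body commonly sent over HTTP POST."""
    payload: dict[str, Any] = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    return json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    print(build_graphql_body(INTROSPECTION_QUERY).decode())
    print(build_graphql_body(SBOM_QUERY, {"id": "router-1"}).decode())
```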

Note that GraphQL is an API, not a database (see Unified Data Access tab under Use Cases), and that GraphQL is inherently based on a common schema plus "resolvers" to various backend databases and data sources. GraphQL is analogous to REST - it is an interface design pattern/approach, not a product.

adammontville commented 2 years ago

If we’re using a message fabric, why do we have to worry about this? If we’re using a message fabric, then we need to specify the pub/sub interface and then an implementation for a given fabric (e.g. DXL, Kafka, or something else). The implementer of a PACE-conformant PAR would need to interface with the message fabric, and they can make whatever choice they want for 1) the actual data store, and 2) the way they interface with that.
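As a rough sketch of that separation: one abstract pub/sub interface, one fabric implementation, and a PAR whose data store stays private. Everything here is hypothetical (the `MessageFabric` interface, the `pace/posture` topic, the JSON record shape); an in-memory fabric stands in for DXL or Kafka so the example runs without a broker.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from typing import Callable, Optional

Handler = Callable[[str, bytes], None]
POSTURE_TOPIC = "pace/posture"  # hypothetical topic name


class MessageFabric(ABC):
    """The only thing a PAR implementer would need to code against."""

    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> None: ...

    @abstractmethod
    def subscribe(self, topic: str, handler: Handler) -> None: ...


class InMemoryFabric(MessageFabric):
    """Stand-in for a real fabric binding; delivers messages synchronously."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def publish(self, topic: str, payload: bytes) -> None:
        for handler in list(self._handlers.get(topic, [])):
            handler(topic, payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)


class DictBackedPar:
    """A PAR whose store (here a dict) is invisible to other participants."""

    def __init__(self, fabric: MessageFabric) -> None:
        self._store: dict[str, dict] = {}
        fabric.subscribe(POSTURE_TOPIC, self._on_posture)

    def _on_posture(self, topic: str, payload: bytes) -> None:
        record = json.loads(payload)
        self._store[record["component"]] = record

    def lookup(self, component_id: str) -> Optional[dict]:
        return self._store.get(component_id)
```

Swapping the dict for a relational or graph database would change only `DictBackedPar`; publishers on the fabric would not notice.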

Thoughts?

Kind regards,

Adam

On Mar 16, 2022, at 9:58 AM, Duncan Sparrell @.***> wrote:

At the 14-Mar-2022 PACE project meeting, one participant assumed that the query language for PACE would be GraphQL. This issue opens the discussion on the different possibilities, to decide which one(s) PACE should use, specifically for the PACE OpenC2 interface. There are separate issues for whether there can be more than one type or instance of database, and whether the database(s) can be directly accessed using their native query capabilities.

Possibilities include:

STIX patterning language - https://newcontext.com/stix-patterning-quick-reference-card/
SQL - https://en.wikipedia.org/wiki/SQL
Cypher - https://opencypher.org/
GraphQL - https://graphql.org/
Gremlin - https://en.wikipedia.org/wiki/Gremlin_(query_language)
SPARQL - https://en.wikipedia.org/wiki/SPARQL
GQL - https://en.wikipedia.org/wiki/Graph_Query_Language

Some of these choices assume particular answers to the other issues (e.g. SQL presumes a relational database, Cypher presumes a labeled property graph, SPARQL presumes an RDF graph, GraphQL is for APIs, etc.).

This issue is related to issues #43 https://github.com/opencybersecurityalliance/PACE/issues/43 and #44 https://github.com/opencybersecurityalliance/PACE/issues/44

davaya commented 2 years ago

Also discussed in #36.

The two message fabrics currently addressed by OpenC2 transfer specs are 1) point-to-point using HTTP and 2) pub/sub using MQTT.

HTTP has interface "styles" including static web page HEAD/GET, REST style using GET/POST/PUT/PATCH/DELETE to a bunch of different URLs, and GraphQL style using GET/POST to a single URL.
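The difference between the REST and GraphQL styles can be seen in the shape of the requests alone. The sketch below only constructs `urllib.request.Request` objects and sends nothing; the base URL, paths, and bodies are hypothetical.

```python
from __future__ import annotations

import json
from urllib.request import Request

BASE = "https://par.example/api"  # hypothetical endpoint
JSON_HEADERS = {"Content-Type": "application/json"}


def rest_requests(component_id: str) -> list[Request]:
    """REST style: the HTTP method and the URL together identify the action."""
    return [
        Request(f"{BASE}/components/{component_id}", method="GET"),
        Request(f"{BASE}/components/{component_id}/sbom", method="GET"),
        Request(
            f"{BASE}/components/{component_id}",
            data=json.dumps({"owner": "ops"}).encode("utf-8"),
            headers=JSON_HEADERS,
            method="PATCH",
        ),
    ]


def graphql_request(document: str) -> Request:
    """GraphQL style: one URL; the action is described inside the request body."""
    return Request(
        f"{BASE}/graphql",
        data=json.dumps({"query": document}).encode("utf-8"),
        headers=JSON_HEADERS,
        method="POST",
    )


if __name__ == "__main__":
    for req in rest_requests("router-1"):
        print(req.get_method(), req.full_url)
    gql = graphql_request("{ __typename }")
    print(gql.get_method(), gql.full_url)
```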

Within static pages, REST-style multi-URLs, and GraphQL-style single-URLs there are all sorts of media types, including HTML and images, JSON and XML data (including OpenC2 commands), and JavaScript code.

Many layers need to be specified in a prototype, and in an interface specification, to allow interoperability. https://aws.amazon.com/blogs/mobile/iot-with-aws-appsync/ describes an end-to-end use case for sensor IoT devices connecting via IoT Core (MQTT) and being monitored by a user client (in this case a phone, but could be a browser on a PC) over AppSync (GraphQL). No control functions are involved, but if the IoT thing were a light bulb instead of a thermometer, the phone/browser could use OpenC2 over HTTP using REST style or GraphQL style.

In other words, picking a message fabric is only one of several necessary steps for either prototyping or interface standardization. Duncan's various PAR interface options (may) have various hardwired stack choices. OpenC2's philosophy is to be fabric agnostic; the PAR philosophy should be as well.

davaya commented 2 years ago

As noted at the 3/21 PACE meeting, the items on the list are not peers. In particular, GraphQL is not a database or a query language; it is an interface style analogous to REST. REST uses several different HTTP methods (POST, GET, PUT, PATCH, and DELETE) against many different URLs. GraphQL uses three operation types (query, mutation, and subscription) against a single URL. And as described in the IoT use case, GraphQL can also be used over pub/sub.

I ran across an article that gives a flavor of the GraphQL approach: https://medium.com/thundra/better-iot-with-graphql-and-appsync-c3617d5c02d0

slarchacki22 commented 2 years ago

Added to FAQ at the 4/25/2022 PACE meeting