opencybersecurityalliance / stix-shifter

This project consists of an open source library allowing software to connect to data repositories using STIX Patterning, and return results as STIX Observations.
https://stix-shifter.readthedocs.io

Mongo Aggregations or MapReduce queries #90

Open StephenOTT opened 5 years ago

StephenOTT commented 5 years ago

It would be great (unless I missed it in the docs) to have Mongo aggregation and/or MapReduce query generation.

JasonKeirstead commented 5 years ago

@StephenOTT Can you go into more detail as to what you're looking for here?

StephenOTT commented 5 years ago

Shifter can convert a STIX pattern into another query format, for example from a STIX pattern to an Elasticsearch query. It would be great if the same could be done for Mongo aggregation queries, so you could go from a STIX pattern to a Mongo aggregation query. (A Mongo aggregation query is just another JSON object.)
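For illustration, here is a minimal Python sketch of what such a translation could look like. It rests on heavy assumptions: it handles only a single STIX comparison, and `FIELD_MAP` (which STIX object paths map to which document fields, e.g. `src_ip` and `dst_ip`) is entirely hypothetical, because Mongo has no standard security schema. `to_mongo_pipeline` is an illustrative name, not part of stix-shifter. The result is an aggregation pipeline, i.e. a list of stage documents, here a single `$match` stage.

```python
import json
import re

# HYPOTHETICAL mapping from STIX object paths to fields in a Mongo collection.
# Mongo has no standard security schema, so this layout is an assumption.
FIELD_MAP = {
    "ipv4-addr:value": ["src_ip", "dst_ip"],
    "network-traffic:dst_port": ["dst_port"],
    "user-account:user_id": ["username"],
}

# STIX comparison operators mapped to MongoDB query operators.
OPERATORS = {"=": "$eq", "!=": "$ne", ">": "$gt", "<": "$lt", ">=": "$gte", "<=": "$lte"}

# Matches one comparison such as [ipv4-addr:value = '10.0.0.1'] or [x:y >= 443].
COMPARISON = re.compile(
    r"^\[\s*([\w-]+:[\w.]+)\s*(!=|>=|<=|=|>|<)\s*('(?:[^'\\]|\\.)*'|-?\d+)\s*\]$"
)


def to_mongo_pipeline(pattern: str) -> list:
    """Translate a single-comparison STIX pattern into an aggregation pipeline."""
    match = COMPARISON.match(pattern.strip())
    if not match:
        raise ValueError(f"unsupported pattern: {pattern}")
    path, op, raw = match.groups()
    # Quoted literals become strings; bare integers become ints.
    value = raw[1:-1] if raw.startswith("'") else int(raw)
    fields = FIELD_MAP.get(path)
    if fields is None:
        raise ValueError(f"no mapping for {path}")
    clauses = [{field: {OPERATORS[op]: value}} for field in fields]
    # One mapped field: plain filter. Several: any of them may satisfy the
    # comparison (an assumed any-of semantics, applied to every operator).
    condition = clauses[0] if len(clauses) == 1 else {"$or": clauses}
    return [{"$match": condition}]


if __name__ == "__main__":
    print(json.dumps(to_mongo_pipeline("[ipv4-addr:value = '10.0.0.1']")))
    print(json.dumps(to_mongo_pipeline("[network-traffic:dst_port >= 443]")))
```

The resulting list could be passed to a driver's aggregate call (for example PyMongo's `Collection.aggregate`); the schema mapping, not the pipeline syntax, is the hard part.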

JasonKeirstead commented 5 years ago

@StephenOTT First, the format and layout of the security data living in Mongo that we would be querying need to be defined. Shifter doesn't work if it doesn't understand the data, and since a Mongo database can contain "anything", this is a problem.

StephenOTT commented 5 years ago

I think that would be fine. Isn't that basically what you're doing for Elastic indexes?

JasonKeirstead commented 5 years ago

@StephenOTT Kind of, except with Elastic we have some standard schemas to target. MITRE has defined a translation to their CAR schema, and we will also be developing a translation to the ECS standard schema. The goal of Shifter is to work "out of the box" for most security products.

If you have a standard schema for Mongo in mind that is used by a product, we would definitely look at this.

pcoccoli commented 4 years ago

I don't know anything about MongoDB, but some data sources support aggregations. For example, in QRadar AQL you can GROUP BY and then apply an aggregation function (IIUC). STIX Observations have first_observed, last_observed, and number_observed, so it seems like we should be able to handle simple "count" aggregations, at least for data sources that support them.

Supporting such aggregations gives us a way to "push" some of the computational burden down the stack, and reduce the amount of data transmitted.
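A count aggregation of this kind corresponds to MongoDB's `$group` stage. The sketch below is an illustration under stated assumptions, not an existing stix-shifter feature: `add_count_aggregation` appends a `$group` stage to an existing pipeline and names its accumulators after the STIX observed-data properties, so each result lines up with `number_observed`, `first_observed`, and `last_observed`. The `timestamp` field and the grouping fields are hypothetical. `simulate_group` is an in-memory stand-in that mirrors what the stage computes, so the semantics can be checked without a database.

```python
import json


def add_count_aggregation(pipeline, group_fields, time_field="timestamp"):
    """Return a copy of `pipeline` with a count-style $group stage appended."""
    group_stage = {
        "$group": {
            # One output document per distinct combination of grouping fields.
            "_id": {field: f"${field}" for field in group_fields},
            "number_observed": {"$sum": 1},
            "first_observed": {"$min": f"${time_field}"},
            "last_observed": {"$max": f"${time_field}"},
        }
    }
    return pipeline + [group_stage]


def simulate_group(events, group_fields, time_field="timestamp"):
    """Compute the same aggregates in memory (for checking, not production)."""
    groups = {}
    for event in events:
        key = tuple(event[field] for field in group_fields)
        ts = event[time_field]
        if key not in groups:
            groups[key] = {"number_observed": 0, "first_observed": ts, "last_observed": ts}
        group = groups[key]
        group["number_observed"] += 1
        # ISO 8601 UTC strings in one fixed format sort chronologically as text.
        group["first_observed"] = min(group["first_observed"], ts)
        group["last_observed"] = max(group["last_observed"], ts)
    return groups


if __name__ == "__main__":
    base = [{"$match": {"dst_port": {"$eq": 443}}}]
    print(json.dumps(add_count_aggregation(base, ["src_ip"]), indent=2))
    events = [
        {"src_ip": "10.0.0.1", "timestamp": "2019-05-01T10:00:00Z"},
        {"src_ip": "10.0.0.1", "timestamp": "2019-05-01T09:00:00Z"},
        {"src_ip": "10.0.0.2", "timestamp": "2019-05-01T11:00:00Z"},
    ]
    print(simulate_group(events, ["src_ip"]))
```

Because the grouping runs inside the database, only one document per group is returned instead of every matching event, which is the push-down and data reduction described above.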

DerekRushton commented 3 weeks ago

Given the lack of interest in this enhancement I'm considering closing this issue. If there is any interest I'm open to leaving this open, but if not I will close this soon.