obi1kenobi / trustfall

A query engine for any combination of data sources. Query your files and APIs as if they were databases!
Apache License 2.0

If I don't have a schema, how do I execute the query? #561

Closed ZimboPro closed 4 months ago

ZimboPro commented 4 months ago

Firstly, thank you so much for the library. I am really enjoying it and want to introduce it as a custom linting tool at work (as well as Rust).

I have worked through the HN adapter and managed to create a (somewhat limited) OpenAPI adapter. I am now creating one for Terraform files. To make it easier, I am converting the files into their JSON equivalent. I used the BasicAdapter, but when I query I get the following error: The referenced path does not exist in the schema: ["resource"]

I did get inspiration from https://github.com/felipesere/trustfall-serde-yaml for the implementation

The code for the adapter is:

use serde_json::Value;
use std::sync::Arc;
use trustfall::{
    provider::{AsVertex, BasicAdapter},
    FieldValue,
};

#[derive(Debug, Clone)]
pub struct HclVertex(Arc<Value>);

impl From<Value> for HclVertex {
    fn from(value: Value) -> Self {
        Self(Arc::new(value))
    }
}

impl From<&Value> for HclVertex {
    fn from(value: &Value) -> Self {
        Self(Arc::new(value.clone()))
    }
}

impl trustfall::provider::Typename for HclVertex {
    fn typename(&self) -> &'static str {
        match *self.0 {
            Value::Null => "null",
            Value::Bool(_) => "bool",
            Value::Number(_) => "number",
            Value::String(_) => "string",
            Value::Array(_) => "array",
            Value::Object(_) => "object",
        }
    }
}

pub struct JsonAdapter {
    root: Arc<Vec<Value>>,
}

impl JsonAdapter {
    pub fn new(json: Vec<Value>) -> Self {
        Self {
            root: Arc::new(json),
        }
    }

    /// Collects the value stored under `key` from every per-file JSON document.
    fn get_value(&self, key: &str) -> Vec<Value> {
        self.root
            .iter()
            .filter_map(|file_value| file_value.get(key).cloned())
            .collect()
    }
}

impl<'vertex> BasicAdapter<'vertex> for JsonAdapter {
    type Vertex = HclVertex;

    fn resolve_starting_vertices(
        &self,
        edge_name: &str,
        parameters: &trustfall::provider::EdgeParameters,
    ) -> trustfall::provider::VertexIterator<'vertex, Self::Vertex> {
        // Named entry points map onto the lowercase top-level keys of the converted JSON;
        // any other edge name is looked up as a key unchanged.
        let key = match edge_name {
            "Module" => "module",
            "Resource" => {
                println!("Search");
                "resource"
            }
            "Data" => "data",
            "Variable" => "variable",
            "Locals" => "locals",
            "Output" => "output",
            "Provider" => "provider",
            "Terraform" => "terraform",
            other => other,
        };
        let items = self
            .get_value(key)
            .into_iter()
            .map(|x| HclVertex(Arc::new(x)));
        Box::new(items)
    }

    fn resolve_property<V: trustfall::provider::AsVertex<Self::Vertex> + 'vertex>(
        &self,
        contexts: trustfall::provider::ContextIterator<'vertex, V>,
        type_name: &str,
        property_name: &str,
    ) -> trustfall::provider::ContextOutcomeIterator<'vertex, V, trustfall::FieldValue> {
        let property_name = property_name.to_string();
        let type_name = type_name.to_string();
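        Box::new(contexts.map(move |ctx| {
            // Trustfall expects exactly one output per input context, so a missing
            // vertex or property resolves to null instead of dropping the context.
            let value = match ctx.active_vertex::<HclVertex>() {
                Some(node) => {
                    println!("Looking for {property_name} of type {type_name} on {node:?}:");
                    node.0
                        .get(&property_name)
                        .and_then(|v| v.as_str())
                        .map(|v| FieldValue::from(v))
                        .unwrap_or(FieldValue::Null)
                }
                None => FieldValue::Null,
            };
            (ctx, value)
        }))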
    }

    fn resolve_neighbors<V: trustfall::provider::AsVertex<Self::Vertex> + 'vertex>(
        &self,
        contexts: trustfall::provider::ContextIterator<'vertex, V>,
        type_name: &str,
        edge_name: &str,
        parameters: &trustfall::provider::EdgeParameters,
    ) -> trustfall::provider::ContextOutcomeIterator<
        'vertex,
        V,
        trustfall::provider::VertexIterator<'vertex, Self::Vertex>,
    > {
        let edge_name = edge_name.to_string();
        Box::new(contexts.map(move |context| {
            // Trustfall expects every context back, so a context with no matching
            // neighbors (or no active vertex) yields an empty iterator instead of being dropped.
            let neighbors: Vec<HclVertex> = match context.active_vertex::<HclVertex>() {
                // "*" on an array vertex expands to all of the array's elements.
                Some(active) if edge_name == "*" && active.0.is_array() => active
                    .0
                    .as_array()
                    .into_iter()
                    .flatten()
                    .map(HclVertex::from)
                    .collect(),
                // Any other edge name is treated as an object key lookup.
                Some(active) => active
                    .0
                    .get(&edge_name)
                    .map(HclVertex::from)
                    .into_iter()
                    .collect(),
                None => Vec::new(),
            };

            (
                context,
                Box::new(neighbors.into_iter())
                    as trustfall::provider::VertexIterator<'vertex, HclVertex>,
            )
        }))
    }

    fn resolve_coercion<V: trustfall::provider::AsVertex<Self::Vertex> + 'vertex>(
        &self,
        contexts: trustfall::provider::ContextIterator<'vertex, V>,
        type_name: &str,
        coerce_to_type: &str,
    ) -> trustfall::provider::ContextOutcomeIterator<'vertex, V, bool> {
        todo!()
    }
}

It accepts a Vec because each Terraform file gets converted into JSON separately, and the results are not merged.

The query used was

InputQuery (
    query: r#"
{
    resource {
        aws_api_gateway_stage {
            this {
                stage_name @output
            }
        }
    }
}"#,
    args: {},
)

The file used was

resource "aws_api_gateway_stage" "this" {
  stage_name    = var.stage_name
  description   = "${var.comment_prefix}${var.api_domain}"
  rest_api_id   = aws_api_gateway_rest_api.this.id
  deployment_id = aws_api_gateway_deployment.this.id
  tags          = var.tags
}
...

and the JSON equivalent is

{
  "resource": {
    "aws_api_gateway_stage": {
      "this": {
        "deployment_id": "${aws_api_gateway_deployment.this.id}",
        "description": "${var.comment_prefix}${var.api_domain}",
        "rest_api_id": "${aws_api_gateway_rest_api.this.id}",
        "stage_name": "${var.stage_name}",
        "tags": "${var.tags}"
      }
    },
...
}
obi1kenobi commented 4 months ago

Our error messages could use some work 😅 Thanks for powering through it.

All names in Trustfall schemas and queries are case-sensitive. This error intends to say that your query starts with { resource (meaning that it looks for an entry point called resource) but no such entry point exists. Based on your resolve_starting_vertices() implementation, it seems like it might be called Resource instead. Try capitalizing it in your query and the error should hopefully go away?

I'd love to hear about your work on the custom linter you're building; it sounds very cool!

ZimboPro commented 4 months ago

I just realised the issue: because I was copying and pasting, I used the OpenAPI schema 🤣

So the issue should be updated to: "If I don't have a schema, how do I execute the query?" Is it even possible to represent JSON in a GraphQL schema?

The linter is to check Terraform config files and OpenAPI docs to ensure that each is correctly set up and that the config shared across file types is correct. I might extend it to Python as well if using FastAPI.

[EDIT] typos

obi1kenobi commented 4 months ago

The linter is to check Terraform config files and OpenAPI docs to ensure that each is correctly set up and that the config shared across file types is correct. I might extend it to Python as well if using FastAPI.

Oooh this sounds very cool! I'd love to learn more. Might you be up for a virtual coffee chat sometime next week?

Is it even possible to represent JSON in a GraphQL schema?

In principle, yes. In fact, in principle it's always possible to write a schema for any data source, even if it's just raw binary data. But depending on your exact use case, that schema might not be useful, ergonomic, sufficiently expressive, etc.

For example, here's one valid way to represent arbitrary JSON in a Trustfall schema:

type Json {
    """Contents of this JSON document, as a JSON serialized string."""
    contents: String!
}

Here's another one:

type Json {
    """Contents of this JSON document, as a JSON serialized string."""
    contents: String!

    """Any child elements contained inside this JSON document, in unspecified order."""
    child_elements: [Json!]
}

Here's yet another one:

interface Json {
    contents: String!
}

type JsonNumber implements Json {
    contents: String!
    value: Float!
}

type JsonString implements Json {
    contents: String!
    value: String!
}

type JsonArray implements Json {
    contents: String!

    values: [Json!]!
}

type JsonDict implements Json {
    contents: String!

    key_values: [JsonKeyValue!]
}

type JsonKeyValue {
    key: Json
    value: Json
}
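
To make the interface-based schema above a bit more concrete, here is a minimal std-only Rust sketch of how an adapter might pick type names and answer type coercions for it. The `JsonVertex` enum and the `can_coerce_to` helper are hypothetical names used only for illustration; they are not part of Trustfall's API, and only the type names are taken from the schema above. JSON null and booleans are left out because that schema has no types for them.

```rust
/// Hypothetical vertex type mirroring the schema above (not a Trustfall type).
#[derive(Debug, Clone, PartialEq)]
enum JsonVertex {
    Number(f64),
    String(String),
    Array(Vec<JsonVertex>),
    Dict(Vec<(String, JsonVertex)>),
}

impl JsonVertex {
    /// Name of the concrete schema type this vertex belongs to.
    fn typename(&self) -> &'static str {
        match self {
            JsonVertex::Number(_) => "JsonNumber",
            JsonVertex::String(_) => "JsonString",
            JsonVertex::Array(_) => "JsonArray",
            JsonVertex::Dict(_) => "JsonDict",
        }
    }

    /// Whether this vertex may be treated as `coerce_to_type`:
    /// every vertex is a `Json`, and otherwise the concrete type must match.
    fn can_coerce_to(&self, coerce_to_type: &str) -> bool {
        coerce_to_type == "Json" || coerce_to_type == self.typename()
    }
}
```

A check along these lines is roughly what a `resolve_coercion()` implementation would run for each context, assuming the vertex type carries enough information to tell the variants apart.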

There are infinitely more schemas that are all valid ways to represent JSON. But not all of them are good or useful. Data modelling is hard, and writing high-quality Trustfall schemas is no different: it requires practice!

The best way to model your data is a function of how you intend to use it: what queries you need to run, which pieces of detail are crucial to expose vs safe to ignore, etc. So I usually recommend starting with the queries you want to run, setting up a minimal schema that would make those queries tidy and ergonomic, and then slowly working to expand that schema.
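
As a rough illustration of that advice for the Terraform case in this thread: if the query you care about is "list every resource with its type, name, and attributes", the adapter could flatten the nested resource -> type -> name JSON into one record per resource before exposing it. The sketch below is std-only and uses nested maps in place of `serde_json::Value`; `ResourceRecord` and `flatten_resources` are hypothetical names, and attribute values are assumed to be plain strings for simplicity.

```rust
use std::collections::BTreeMap;

/// Attributes of one resource, e.g. "stage_name" -> "${var.stage_name}".
type Attributes = BTreeMap<String, String>;
/// Nested layout of the converted JSON: resource type -> resource name -> attributes.
type ResourceTree = BTreeMap<String, BTreeMap<String, Attributes>>;

/// Hypothetical flat record that a `Resource` vertex could wrap.
#[derive(Debug, Clone, PartialEq)]
struct ResourceRecord {
    resource_type: String,
    name: String,
    attributes: Attributes,
}

/// Turns the nested type -> name -> attributes layout into one record per resource.
fn flatten_resources(tree: &ResourceTree) -> Vec<ResourceRecord> {
    tree.iter()
        .flat_map(|(resource_type, by_name)| {
            by_name.iter().map(move |(name, attributes)| ResourceRecord {
                resource_type: resource_type.clone(),
                name: name.clone(),
                attributes: attributes.clone(),
            })
        })
        .collect()
}
```

With records shaped like this, a schema could expose the resource type and name as ordinary properties to filter on, rather than requiring each resource type to appear as its own edge name. Whether that trade-off is right depends on the queries you actually need.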

I wrote up a few more tips in a comment here.

ZimboPro commented 4 months ago

Oooh this sounds very cool! I'd love to learn more. Might you be up for a virtual coffee chat sometime next week?

Definitely keen for a virtual coffee.

Thanks for the advice, will talk to my team about what we want to query in the Terraform files

ZimboPro commented 4 months ago

Closing ticket since it is not a bug but an implementation issue from my end

obi1kenobi commented 4 months ago

Thanks for the advice, will talk to my team about what we want to query in the Terraform files

Would love to hear what they say! Thanks!

ZimboPro commented 3 months ago

They loved the tool; they just want it to be a bit more extensible, since I am currently the only one with knowledge of Rust. So I want to see if I can have the adapters load as WASM plugins to make it a lot more flexible.