oxidecomputer / typify

compiler from JSON Schema into idiomatic Rust types
Apache License 2.0

Add support for external $ref #201

Open samueleresca opened 1 year ago

samueleresca commented 1 year ago

Hey,

Thanks for this crate.

I was using it to generate some code based on the following NVD schema: https://csrc.nist.gov/schema/nvd/api/2.0/cve_api_json_2.0.schema

It would be useful to add support for resolving external $ref values, whether they point to the file system or to an HTTP URL:

E.g.:

"cvss-v30": {
...
    "properties": {
        "cvssData": {"$ref": "//csrc.nist.gov/schema/nvd/api/2.0/external/cvss-v3.0.json" },
    },
...
},
ahl commented 1 year ago

Yeah. That would be neat. I don't love the idea of making HTTP requests from the context of a macro invocation... at least by default. I do think it would be reasonable as a (non-default) crate feature or as a capability of a CLI (https://github.com/oxidecomputer/typify/pull/204), perhaps with some non-default flag (e.g. --allow-external-downloads).

In addition, it might be useful to allow for some sort of remapping, e.g. the ability to say "instead of downloading some external $ref from csrc.nist.gov/schema/nvd/api/2.0/external/cvss-v3.0.json, go ahead and use this local file called schemas/cvss-v3.0.json".
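
To make the remapping idea concrete, here is a minimal sketch of a lookup table from external $ref targets to local files. Everything in it (RefRemap, with_mapping, resolve, normalize) is a hypothetical name invented for illustration; none of it is part of typify's API, and the normalization rule (ignore the scheme, a leading "//", and any "#fragment") is an assumption about what users would want.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical remapping table: external `$ref` targets -> local files.
/// Not part of typify; a sketch of the idea only.
struct RefRemap {
    entries: HashMap<String, PathBuf>,
}

impl RefRemap {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Register a local file to use in place of an external reference.
    fn with_mapping(mut self, external: &str, local: &str) -> Self {
        self.entries.insert(normalize(external), PathBuf::from(local));
        self
    }

    /// Look up a `$ref` value. Returns `None` when no mapping is registered,
    /// in which case a caller could fail instead of touching the network.
    fn resolve(&self, reference: &str) -> Option<&PathBuf> {
        self.entries.get(&normalize(reference))
    }
}

/// Assumed normalization: drop the `#fragment`, the scheme, and a leading
/// `//`, so `https://host/x`, `//host/x`, and `host/x` compare equal.
fn normalize(reference: &str) -> String {
    let without_fragment = reference.split('#').next().unwrap_or("");
    let without_scheme = match without_fragment.split_once("://") {
        Some((_, rest)) => rest,
        None => without_fragment,
    };
    without_scheme.trim_start_matches("//").to_string()
}
```

With a mapping registered for csrc.nist.gov/schema/nvd/api/2.0/external/cvss-v3.0.json, the "//csrc.nist.gov/..." form used in the NVD schema would resolve to schemas/cvss-v3.0.json, and any unmapped reference would come back as None.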

I would appreciate any opinions you might want to offer about what interfaces might work well for your use case.

See also: https://github.com/oxidecomputer/progenitor/issues/228

jayvdb commented 1 year ago

Note that you can use https://crates.io/crates/schematools as a library to do this. Sample code is below. However, it is a bit picky about the order of its input; cf. https://github.com/kstasik/schema-tools/issues/67

fn load_spec(filename: String) {
    let client = reqwest::blocking::Client::new();

    let urls = vec![path_to_url(filename).unwrap()];

    let mut schema = Schema::load_urls_with_client(urls, &client).unwrap();
    let storage = SchemaStorage::new(&schema, &client);

    Dereferencer::options()
        .with_create_internal_references(true)
        .process(&mut schema, &storage);

    let resolved_yaml = serde_yaml::to_string(schema.get_body()).unwrap();

    ...
}
jrcarl624 commented 1 year ago

This would be nice, maybe starting with file-based refs. I have a project with duplicate types that use in-project refs.

grv07 commented 1 year ago

@samueleresca Is there any workaround or solution you use for this?

Qix- commented 1 year ago

In addition, it might be useful to allow for some sort of remapping, e.g. the ability to say "instead of downloading some external $ref from csrc.nist.gov/schema/nvd/api/2.0/external/cvss-v3.0.json, go ahead and use this local file called schemas/cvss-v3.0.json".

This is exactly what I would hope for and expect. I'm also not a fan of HTTP requests from a macro invocation: there are tons of privacy, security, and reproducibility issues with that, and it isn't something we should be encouraging in the ecosystem (in my very subjective opinion).

At the very least, giving us the ability to do this manually by specifying a map in the import_types!{} macro would be a good start. If the decision is later made to add some sort of auto-fetching, it can be supplementary to the manual map.

freexploit commented 8 months ago

I think this also affects self-referential types on draft-7. I have some types that reference other types in the same document, with refs that start with '#', and they fail as if they were external references.
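
For illustration, the distinction between same-document and external references can be made from the text of the $ref alone. The sketch below is an assumption about how such a check could look, not a description of what typify does internally; RefTarget and classify_ref are hypothetical names.

```rust
/// Where a `$ref` value points, judged only by its text.
#[derive(Debug, PartialEq)]
enum RefTarget<'a> {
    /// `#/definitions/Foo`: a pointer into the same document.
    Internal { pointer: &'a str },
    /// Anything with a non-empty part before `#` (or no `#` at all):
    /// another document, optionally with a pointer into it.
    External { document: &'a str, pointer: Option<&'a str> },
}

/// Split a reference at the first `#` and classify it.
fn classify_ref(reference: &str) -> RefTarget<'_> {
    match reference.split_once('#') {
        // Nothing before the `#`: the reference stays in this document.
        Some(("", pointer)) => RefTarget::Internal { pointer },
        Some((document, pointer)) => RefTarget::External {
            document,
            pointer: Some(pointer),
        },
        None => RefTarget::External {
            document: reference,
            pointer: None,
        },
    }
}
```

Under this rule a ref such as "#/definitions/Foo" is internal and never needs a file or network lookup, while "other.json#/definitions/Bar" names a separate document plus a pointer into it.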

baxterjo commented 7 months ago

How are we doing on this? I saw that a user has forked this repo to try to address this issue, has anyone made any progress? Anything I can do to help? A local file reference would be very useful for my current use case.

kika commented 7 months ago

How are we doing on this?

I have someone on my team working on this along the lines @ahl outlined above (no http, simple URL->filesystem remapping). Hopefully we'll have something working (meaning not failing our own test cases) this week.

ahl commented 6 months ago

How are we doing on this? I saw that a user has forked this repo to try to address this issue, has anyone made any progress? Anything I can do to help? A local file reference would be very useful for my current use case.

This hasn't been a priority for us (i.e. not a use case we need), but I'm generally supportive of having a solution here.

baxterjo commented 6 months ago

I have someone on my team working on this along the lines @ahl outlined above (no http, simple URL->filesystem remapping). Hopefully we'll have something working (meaning not failing our own test cases) this week.

Awesome, let me know if I can support

kika commented 6 months ago

The effort is not abandoned, but it proved to be more challenging than it seemed at first. We need this, though, and typify is the best candidate for the job, so we'll keep pushing. Stay tuned.

baxterjo commented 6 months ago

I have found (stumbled upon actually) a very hacky solution to this. I'm not gonna lie, this is the first time in a long time I actually have no clue how my code worked because I was just poking around with the library to see what I could get to work.

This solution requires you to know ahead of time exactly which files are going to be used as external references. You then use the schemars visitor API to visit these refs and replace each one with the schema it points to. I think what this does is make it so typify does not replace the type in the type space because it already exists 🤷

Keen observers will notice that this is the AWS IoT Jobs MQTT API.

Here are the files with the same paths for reference.

jobs_api.zip

// build.rs
use schemars::schema::SchemaObject;
use schemars::visit::{visit_root_schema, visit_schema_object, Visitor};
use std::collections::HashMap;
use std::io::Result;
use std::{env, fs, path::Path};
use typify::{TypeSpace, TypeSpaceSettings};

const JSON_SCHEMA_FILES: &[&str] = &[
    "jobs_api/api/schemas/job_execution.yaml",
];

const JSON_SCHEMA_INCLUDES: &[&str] = &["jobs_api/"];

const EXTERNAL_REFERENCE_TYPES: &[&str] = &["jobs_api/api/schemas/job_doc.yaml"];

fn main() -> Result<()> {

    let mut ref_resolver = RefResolver {
        include_paths: JSON_SCHEMA_INCLUDES.iter().map(|&s| s.into()).collect(),
    };

    let mut type_space =
        TypeSpace::new(TypeSpaceSettings::default().with_derive("PartialEq".to_string()));

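    // Remember which TypeId typify assigned to each externally referenced file.
    let mut type_id_map = HashMap::new();

    for ref_schema in EXTERNAL_REFERENCE_TYPES {
        let yaml_string = fs::read_to_string(ref_schema)
            .unwrap_or_else(|_| panic!("Got error while building schema at: {ref_schema}"));
        let schema =
            serde_yaml::from_str::<schemars::schema::RootSchema>(yaml_string.as_str()).unwrap();

        // add_root_schema returns Result<Option<TypeId>>; unwrap both layers.
        let id = type_space.add_root_schema(schema).unwrap().unwrap();

        type_id_map.insert(*ref_schema, id);
    }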

    // JSON Schema codegen
    for schema_file in JSON_SCHEMA_FILES {
        let yaml_string = fs::read_to_string(schema_file)
            .unwrap_or_else(|_| panic!("Got error while building schema at: {schema_file}"));
        let mut schema =
            serde_yaml::from_str::<schemars::schema::RootSchema>(yaml_string.as_str()).unwrap();

        ref_resolver.visit_root_schema(&mut schema);

        type_space.add_root_schema(schema).unwrap();
    }

    let contents =
        prettyplease::unparse(&syn::parse2::<syn::File>(type_space.to_stream()).unwrap());

    fs::write(
        format!("{}/jobs_api.rs", &env::var("OUT_DIR").unwrap()),
        contents,
    )
    .unwrap();

    println!("cargo:rerun-if-changed=jobs_api/api");
    println!("cargo:rerun-if-changed=build.rs");

    Ok(())
}

struct RefResolver {
    include_paths: Vec<String>,
}

impl RefResolver {
    fn get_schema_from_reference(&self, reference_path: &str) -> Option<SchemaObject> {
        for include_path in self.include_paths.iter() {
            let path_string = format!("{include_path}{reference_path}");
            let full_path = match Path::new(&path_string).canonicalize() {
                Ok(x) => x,
                Err(_err) => continue,
            };
            let yaml_string = fs::read_to_string(full_path).unwrap();
            if let Ok(x) =
                serde_yaml::from_str::<schemars::schema::SchemaObject>(yaml_string.as_str())
            {
                return Some(x);
            }
        }
        None
    }
}

impl Visitor for RefResolver {
    fn visit_schema_object(&mut self, schema: &mut schemars::schema::SchemaObject) {
        if let Some(reference_path) = schema.reference.clone() {
            if !reference_path.starts_with('#') {
                let new_schema = self
                    .get_schema_from_reference(&reference_path)
                    .unwrap_or_else(|| panic!("Got none when trying to unwrap {reference_path}"));
                *schema = new_schema;
            }
        }
        visit_schema_object(self, schema);
    }
    fn visit_root_schema(&mut self, root: &mut schemars::schema::RootSchema) {
        println!("Visiting: {root:#?}");
        self.visit_schema_object(&mut root.schema);
        visit_root_schema(self, root);
    }
}
kika commented 5 months ago

We've had some success. Progress was delayed by a perceived bug, which turned out to be a problem with our (messily autogenerated) schemas, but we have some preliminary code at https://github.com/agily/typify/tree/support-external-references . It will take some time to clean this up, write some tests, and make it PR-worthy, but early adopters can try it.

baxterjo commented 4 months ago

Hey @kika, how is this looking? Would love to see this merged in soon! 😃