near / borsh-rs

Rust implementation of Binary Object Representation Serializer for Hashing
https://borsh.io/
Apache License 2.0
306 stars 67 forks source link

Schema generates conflicting names for enum variants/across crates #92

Open jshaw-jump opened 2 years ago

jshaw-jump commented 2 years ago

Original issue (conflicting names of enum variants with outer structs)

The following code fails to compile because the schema name generation for enum variants can produce name clashes.

#[derive(BorshSchema)]
enum Foo {
    Bar(FooBar),
}

#[derive(BorshSchema)]
struct FooBar {}
error[E0275]: overflow evaluating the requirement `<Foo as borsh::BorshSchema>::add_definitions_recursively::FooBar: borsh::BorshSchema`
 --> code.rs:1:10
  |
2 | #[derive(BorshSchema)]
  |          ^^^^^^^^^^^
  |
  = help: see issue #48214
  = note: this error originates in the derive macro `borsh::BorshSchema` (in Nightly builds, run with -Z macro-backtrace for more info)

I believe the error is a result of using a simple concatenation to generate the name. https://github.com/near/borsh-rs/blob/7325a0aab74049237b9f978e1e2c0fcf7bb799c2/borsh-schema-derive-internal/src/enum_schema.rs#L28


Widened issue (conflicting names for types globally across multiple crates)

As per research by @dzmitry-lahoda, BorshSchema is problematic to use when multiple third-party crates are involved, which can introduce conflicting names and only be panicing with conflicts when tried to be used elsewhere.

use crate_a::A as AA;
use crate_b::A as AB;

#[derive(BorshSchema)]
struct B {
    x: AA,
    y: AB,
}
frol commented 2 years ago

Good catch! Thanks for reporting it, but I believe borsh schema issues won't be addressed at the moment as it is not used in production and was just an experiment without a clear specification

austinabell commented 1 year ago

cc @itegulov @miraclx we might want to consider this issue now that we are using schema for ABI

dzmitry-lahoda commented 5 months ago

what is possible outline of fix for this? I have clash for same named struct in different parts of my big account struct.

dzmitry-lahoda commented 5 months ago

I see next:

So there are some users, may be a lot if search on GH.

cc @frol

dzmitry-lahoda commented 5 months ago

Did quick tests on schemars

TLDR

For same named items schemars generates second declaration/definition under +1 suffixed name.

Long:

See #/definitions/A2

thread 'engine::tests::borsh_diff::serde_name_conflict' panicked at engine/src/engine/tests/borsh_diff.rs:68:5: RootSchema { meta_schema: Some("http://json-schema.org/draft-07/schema#"), schema: SchemaObject { metadata: Some(Metadata { id: None, title: Some("D"), description: None, default: None, deprecated: false, read_only: false, write_only: false, examples: [] }), instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"b", "c"}, properties: {"b": Object(SchemaObject { metadata: None, instance_type: None, format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: None, reference: Some("#/definitions/B"), extensions: {} }), "c": Object(SchemaObject { metadata: None, instance_type: None, format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: None, reference: Some("#/definitions/C"), extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} }, definitions: {"A": Object(SchemaObject { metadata: None, instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"a"}, properties: {"a": Object(SchemaObject { metadata: None, instance_type: Some(Single(Integer)), format: Some("uint8"), enum_values: None, const_value: None, subschemas: None, number: Some(NumberValidation { multiple_of: None, maximum: None, exclusive_maximum: None, minimum: Some(0.0), exclusive_minimum: None }), string: None, array: None, object: None, reference: None, extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} }), "A2": Object(SchemaObject { metadata: None, instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"a"}, properties: {"a": Object(SchemaObject { metadata: None, instance_type: Some(Single(Integer)), format: Some("uint8"), enum_values: None, const_value: None, subschemas: None, number: Some(NumberValidation { multiple_of: None, maximum: None, exclusive_maximum: None, minimum: Some(0.0), exclusive_minimum: None }), string: None, array: None, object: None, reference: None, extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} }), "B": Object(SchemaObject { metadata: None, instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"a"}, properties: {"a": Object(SchemaObject { metadata: None, instance_type: None, format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: None, reference: Some("#/definitions/A"), extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} }), "C": Object(SchemaObject { metadata: None, instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"a"}, properties: {"a": Object(SchemaObject { metadata: None, instance_type: None, format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: None, reference: Some("#/definitions/A2"), extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} })} }

mod a {
    use borsh::{de, BorshDeserialize, BorshSchema, BorshSerialize};

    #[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
    pub  struct A {
        a: u8,
    }

    #[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
    pub struct  B {
        a: A,
    }
}

mod b {
    use borsh::{de, BorshDeserialize, BorshSchema, BorshSerialize};

    #[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
    pub  struct A {
        a: u8,
    }

    #[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
    pub struct  C {
        a: A,
    }
}

use a::B;
use b::C;
#[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
struct D {
    b: B,
    c: C,
}

#[test]
fn serde_name_conflict() {
    let schema = schemars::gen::SchemaGenerator::default()
    .into_root_schema_for::<D>();
    panic!("{:?}", schema);
}
dzmitry-lahoda commented 5 months ago

@frol @itegulov @miraclx @austinabell

would PR which does exact same bump of number suffix in declaration accepted as fix as in schemars?

solution

So for type A and A in definitions, second one will become A2

possible improvement

alternative

Prefix definition with parent struct, so if A and A in B and C, name of second one will be B_A, which will require need to propagate parent info (hopefully it is already).

In case A and A in same struct one will have alias or mod prefix, so use alias or mod prefix.

Alternative seems complicated.

Not sure if one can get full Rust compile time info about modules in proc macro.

frol commented 4 months ago

I don’t see a universal solution to avoid naming collisions when names are autogenerated. The most flexible solution is to introduce macro attribute to control/override the naming.

@dzmitry-lahoda Would you like to contribute the solution to allow naming customization?

dj8yfo commented 4 months ago

@dzmitry-lahoda there's some solution for simple cases without deep nesting of conflicting types

diff --git a/borsh-derive/src/internals/attributes/field/schema.rs b/borsh-derive/src/internals/attributes/field/schema.rs
index 01353bf1..af632f93 100644
--- a/borsh-derive/src/internals/attributes/field/schema.rs
+++ b/borsh-derive/src/internals/attributes/field/schema.rs
@@ -24,31 +24,31 @@ pub static SCHEMA_FIELD_PARSE_MAP: Lazy<BTreeMap<Symbol, Box<ParseFn>>> = Lazy::
     // assigning closure `let f = |args| {...};` and boxing closure `let f: Box<ParseFn> = Box::new(f);`
     // on 2 separate lines doesn't work
     let f_params: Box<ParseFn> = Box::new(|attr_name, meta_item_name, meta| {
         parse_lit_into_vec::<ParameterOverride>(attr_name, meta_item_name, meta)
             .map(Variants::Params)
     });

     let f_with_funcs: Box<ParseFn> = Box::new(|_attr_name, _meta_item_name, meta| {
         let map_result = meta_get_by_symbol_keys(WITH_FUNCS, meta, &WITH_FUNCS_FIELD_PARSE_MAP)?;
         let with_funcs: WithFuncs = map_result.into();
-        if (with_funcs.declaration.is_some() && with_funcs.definitions.is_none())
-            || (with_funcs.declaration.is_none() && with_funcs.definitions.is_some())
-        {
-            return Err(syn::Error::new_spanned(
-                &meta.path,
-                format!(
-                    "both `{}` and `{}` have to be specified at the same time",
-                    DECLARATION.1, DEFINITIONS.1,
-                ),
-            ));
-        }
+        // if (with_funcs.declaration.is_some() && with_funcs.definitions.is_none())
+        //     || (with_funcs.declaration.is_none() && with_funcs.definitions.is_some())
+        // {
+        //     return Err(syn::Error::new_spanned(
+        //         &meta.path,
+        //         format!(
+        //             "both `{}` and `{}` have to be specified at the same time",
+        //             DECLARATION.1, DEFINITIONS.1,
+        //         ),
+        //     ));
+        // }
         Ok(Variants::WithFuncs(with_funcs))
     });
     m.insert(PARAMS, f_params);
     m.insert(WITH_FUNCS, f_with_funcs);
     m
 });

 /**
 Struct describes an entry like `order_param => override_type`,  e.g. `K => <K as TraitName>::Associated`
 */
diff --git a/borsh/Cargo.toml b/borsh/Cargo.toml
index e0a9465f..b208a7a6 100644
--- a/borsh/Cargo.toml
+++ b/borsh/Cargo.toml
@@ -38,19 +38,19 @@ bytes = { version = "1", optional = true }
 bson = { version = "2", optional = true }

 [dev-dependencies]
 insta = "1.29.0"

 [package.metadata.docs.rs]
 features = ["derive", "unstable__schema", "rc"]
 targets = ["x86_64-unknown-linux-gnu"]

 [features]
-default = ["std"]
+default = ["std", "unstable__schema"]
 derive = ["borsh-derive"]
 unstable__schema = ["derive", "borsh-derive/schema"]
 std = []
 # Opt into impls for Rc<T> and Arc<T>. Serializing and deserializing these types
 # does not preserve identity and may result in multiple copies of the same data.
 # Be sure that this is what you want before enabling this feature.
 rc = []
 de_strict_order = []
diff --git a/borsh/tests/schema/test_schema_with_third_party.rs b/borsh/tests/schema/test_schema_with_third_party.rs
index 30562f89..ea068e85 100644
--- a/borsh/tests/schema/test_schema_with_third_party.rs
+++ b/borsh/tests/schema/test_schema_with_third_party.rs
@@ -1,24 +1,25 @@
 use crate::common_macro::schema_imports::*;

 // use alloc::collections::BTreeMap;

 #[allow(unused)]
+#[derive(BorshSchema)]
 struct ThirdParty<K, V>(BTreeMap<K, V>);
 #[allow(unused)]
 mod third_party_impl {
     use crate::common_macro::schema_imports::*;

     pub(super) fn declaration<K: borsh::BorshSchema, V: borsh::BorshSchema>(
     ) -> borsh::schema::Declaration {
         let params = vec![<K>::declaration(), <V>::declaration()];
-        format!(r#"{}<{}>"#, "ThirdParty", params.join(", "))
+        format!(r#"{}<{}>"#, "ThirdPartyOneHasNoControlOfRenamed", params.join(", "))
     }

     pub(super) fn add_definitions_recursively<K: borsh::BorshSchema, V: borsh::BorshSchema>(
         definitions: &mut BTreeMap<borsh::schema::Declaration, borsh::schema::Definition>,
     ) {
         let fields = borsh::schema::Fields::UnnamedFields(vec![
             <BTreeMap<K, V> as borsh::BorshSchema>::declaration(),
         ]);
         let definition = borsh::schema::Definition::Struct { fields };
         let no_recursion_flag = definitions.get(&declaration::<K, V>()).is_none();
@@ -58,24 +59,24 @@ enum C<K, V> {
 pub fn struct_overriden() {
     assert_eq!(
         "A<u64, String>".to_string(),
         <A<u64, String>>::declaration()
     );
     let mut defs = Default::default();
     <A<u64, String>>::add_definitions_recursively(&mut defs);
     assert_eq!(
         schema_map! {
             "A<u64, String>" => Definition::Struct { fields: Fields::NamedFields(vec![
-                ("x".to_string(), "ThirdParty<u64, String>".to_string()),
+                ("x".to_string(), "ThirdPartyOneHasNoControlOfRenamed<u64, String>".to_string()),
                 ("y".to_string(), "u64".to_string())]
             )},
-            "ThirdParty<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
+            "ThirdPartyOneHasNoControlOfRenamed<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
                 "BTreeMap<u64, String>".to_string(),
             ]) },
             "BTreeMap<u64, String>"=> Definition::Sequence {
                 length_width: Definition::DEFAULT_LENGTH_WIDTH,
                 length_range: Definition::DEFAULT_LENGTH_RANGE,
                 elements: "(u64, String)".to_string(),
             },
             "(u64, String)" => Definition::Tuple { elements: vec!["u64".to_string(), "String".to_string()]},
             "u64" => Definition::Primitive(8),
             "String" => Definition::Sequence {
@@ -101,23 +102,23 @@ pub fn enum_overriden() {
         schema_map! {
             "C<u64, String>" => Definition::Enum {
                 tag_width: 1,
                 variants: vec![
                     (0, "C3".to_string(), "CC3".to_string()),
                     (1, "C4".to_string(), "CC4<u64, String>".to_string())
                 ]
             },
             "CC3" => Definition::Struct { fields: Fields::UnnamedFields(vec!["u64".to_string(), "u64".to_string()]) },
             "CC4<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
-                "u64".to_string(), "ThirdParty<u64, String>".to_string()
+                "u64".to_string(), "ThirdPartyOneHasNoControlOfRenamed<u64, String>".to_string()
             ]) },
-            "ThirdParty<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
+            "ThirdPartyOneHasNoControlOfRenamed<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
                 "BTreeMap<u64, String>".to_string(),
             ]) },
             "BTreeMap<u64, String>"=> Definition::Sequence {
                 length_width: Definition::DEFAULT_LENGTH_WIDTH,
                 length_range: Definition::DEFAULT_LENGTH_RANGE,
                 elements: "(u64, String)".to_string(),
             },
             "(u64, String)" => Definition::Tuple { elements: vec!["u64".to_string(), "String".to_string()]},
             "u64" => Definition::Primitive(8),
             "String" => Definition::Sequence {

this diff is vs https://github.com/near/borsh-rs/tree/f16cd07e3c982539352aa43f65abf3607461a7bc

cd borsh && cargo test  --features unstable__schema 'schema::test_schema_with_third_party'

it appears that doing a robust solution for arbitrary levels of nesting will require considerable breakage

dzmitry-lahoda commented 4 months ago

after some discussion and above comments, it seems that if i am mistaken,

  1. fixing conflicts in crates are I control is easy - rename or alias, or custom attributed. So i am not interested in fixing this as attribute (UPDATE: because there are workarounds, so error can be made more descriptive, like showing crate+mod path of conflicts? UPDATE2: and because it is not blocker for schema adoption, while inability to fix without forking 3rd parties is).
  2. fixing 3rd party crates could be solved by prefix, I am ok with that, but seems leaking details is not good. why I am good? because in the end everything is just some unique string which hints me how to debug thing, if crate+mod migrated/moved, i am ok with default behavior schema changes and TS consumer update as needed (even if change is doing nothing in the end). which leads to solution allow root object to hook and rename any 3rd party crates declaration during registration, so when assembly from root to children happens, my hook on root can receive child register and rename as needed. UPDATE: hook to receive declaration+definition+crate+mod of each registration.

For me good is hook, but ideal is crate+module prefix and hook working together (controlled by attributes). So without hook does not seem there is final solution?

dj8yfo commented 3 months ago

generates second declaration/definition under +1 suffixed name

@dzmitry-lahoda yeah, well, more-or-less robust automatic renames instead of conflict detection should be possible, if one introduces schema_id,

        impl schemars::JsonSchema for A {
            fn schema_name() -> std::string::String {
                "A".to_owned()
            }
            fn schema_id() -> std::borrow::Cow<'static, str> {
                std::borrow::Cow::Borrowed("example_crate::a::A")
            }

Someone willing to replace somebody's else's type schema intentionally will still be able to do so, but a reasonable default of std::module_path! + type_name in derives will be a great way to do a unique id for types, which wish to conform to being nice to others. Schemars doesn't attempt to compare the definitions of types semantically, which might prove to be anywhere from very hard to impossible with recursive types being present. Self-assigned schema_id would be the full source of truth for comparing types, breaking borsh's schema trait in a similar way by replacing a btreemap<declaration, definition> with a Generator, having an additional btreemap<schema_id, RenamedDeclaration> field.


a more robust solution would require compiler support for type_id of arbitrary types