serde-rs / serde

Serialization framework for Rust
https://serde.rs/
Apache License 2.0
9.17k stars 774 forks source link

Review the bounds generation heuristic? #2463

Open fjarri opened 1 year ago

fjarri commented 1 year ago

So, like other people, I have encountered a problem with automatic derivation of Serialize/Deserialize in a nested generic hierarchy of types, which requires me to write ugly explicit bounds. The heuristic is described in serde_derive::bound::with_bound(), and says

// For example, the following struct needs the bound `A: Serialize, B:
// Serialize`.
//
//     struct S<'b, A, B: 'b, C> {
//         a: A,
//         b: Option<&'b B>
//         #[serde(skip_serializing)]
//         c: C,
//     }

where obviously the struct needs to be bound instead A: Serialize, Option<&'b B>: Serialize.

If I'm not mistaken, the reasons behind this heuristic are listed in this comment. I tested this on Rust 1.69.

Private in public. This compiles:

use core::marker::PhantomData;

pub trait A {}

struct B<T> { x: PhantomData<T> }

impl<T> A for B<T> {}

pub struct C<T> {
    x: B<T>
}

// would be generated
impl<T> A for C<T> where B<T>: A {}

Overflow evaluating requirements. This compiles:

use core::marker::PhantomData;

pub trait A {}

impl<T> A for Box<T> where T: A {}

impl<T> A for Option<T> where T: A {}

pub struct B<T> {
    t: PhantomData<T>,
    b: Option<Box<C<T>>>,
}

// would be generated
impl<T> A for B<T> where Option<Box<C<T>>>: A {}

pub struct C<T> {
    t: PhantomData<T>,
    a: Option<Box<B<T>>>,
}

// would be generated
impl<T> A for C<T> where Option<Box<B<T>>>: A {}

Lifetime deduplication inference is broken. Indeed, this does not compile:

impl<'a, T> A for &'a T {}

struct S<'a, 'b, T: 'a + 'b> {
    a: &'a T,
    b: &'b T,
}

// would be generated
impl<'a, 'b, T: 'a + 'b> A for S<'a, 'b, T> where &'a T: A, &'b T: A {}

But it does compile for this bound:

impl<'a, 'b, T: 'a + 'b> A for S<'a, 'b, T> where &'a T: A {}

So, unless I am missing something, adding bounds for every field type instead of every generic parameter would work fine in the first two cases, and in the third one a deduplication may be required (which I think can be performed automatically for most cases - if we can implement a comparison between types that considers two types that only differ in the lifetime labels equal). So switching to the heuristic where bounds are added for every field type would reduce the amount of times one has to override those bounds. Thoughts?

dtolnay commented 1 year ago

Thank you for calling this out! I am excited to learn that "Private in public" and "Overflow evaluating requirements" are fixed in rustc.

While "Lifetime deduplication inference is broken" remains broken, I doubt that we could reasonably change the heuristic used by serde_derive. As you noted, sometimes it can be possible to deduplicate some bounds in a purely syntactic way, but I think the same rustc bug can arise for bounds that are behind supertrait bounds not visible in the macro input, or alternatively just look different in syntax despite being identical semantically.

Could you figure out whether there is an issue about that remaining broken case upstream?

fjarri commented 1 year ago

As you noted, sometimes it can be possible to deduplicate some bounds in a purely syntactic way, but I think the same rustc bug can arise for bounds that are behind supertrait bounds not visible in the macro input, or alternatively just look different in syntax despite being identical semantically.

That's true, but I feel like in most cases the simple syntactical deduplication will work well enough.

Could you figure out whether there is an issue about that remaining broken case upstream?

Honestly, I wouldn't even know where to start (tried searching for "lifetime deduplication" just in case, but it doesn't seem to be a widely used term). I guess I'll just make one?