serde-rs / serde

Serialization framework for Rust
https://serde.rs/
Apache License 2.0
9.05k stars 767 forks source link

serde(with) for containers #1118

Open tailhook opened 6 years ago

tailhook commented 6 years ago

The issue originated from https://github.com/serde-rs/serde/pull/1111#issuecomment-349386342 and https://github.com/serde-rs/serde/issues/979#issuecomment-350405617.

The idea is that something like this should work:

#[derive(Deserialize)]
#[serde(with="some_module")]
enum SomeEnum {
}

Implementation might work as follows: the code which is usually implemented for Deserialize is generated for a different but similar trait, let's call it DeserializeImpl. Then Deserialize is implemented using some_module::deserialize.

It might even be possible to always generate DeserializeImpl and have a blanket impl<T:DeserializeImpl> Deserialize for T. Thus allowing override Deserialize for a specific struct. But I'm not sure it's entirely backwards compatible.

Marwes commented 6 years ago

Arguably the with attribute does two separate things here. First it generates the normal impl Deserialize into DeserializeImpl and second it implements Deserialize using some_module::deserialize. Could it be worthwhile to split these two actions into two?

It might even be possible to always generate DeserializeImpl and have a blanket impl Deserialize for T. Thus allowing override Deserialize for a specific struct. But I'm not sure it's entirely backwards compatible.

This won't work since that impl prevents impl Deserialize from compiling if there exists a impl DeserializeImpl (and the whole idea behind this is to implement Deserialize by using DeserializeImpl + some extra functionality).

tailhook commented 6 years ago

This won't work since that impl prevents impl Deserialize from compiling if there exists a impl DeserializeImpl

I believe it works, but there might be some other issues I don't foresee.

Could it be worthwhile to split these two actions into two?

Any use cases for that? It's possible to make an alternative, similar to how we have serialize_with and deserialize_with, but I believe short form is more common.

dtolnay commented 6 years ago

@tailhook I like the approach of emitting a trait impl for some trait that is not Serialize/Deserialize but has the same signature. That seems more promising than emitting freestanding functions or inherent methods because it means other code like some_module can be generic over that logic.

I agree with @Marwes that the with attribute as you wrote it is doing two separable things. That can be valuable! It seems like a common use case to use it that way. Let's try to identify some use cases that only want one of those two separable pieces, and make sure they can be addressed in this approach or in a future extension of this approach.

Could someone put together a few playground links showing the complete code of how this would address:

I think this would be really valuable in making sure we are thinking about the whole design space and not just one use case.

Marwes commented 6 years ago

I believe it works, but there might be some other issues I don't foresee.

That works but you will want to call DeserializeImpl/X in Deserialize/Y so this does not work since you can only implement one of the traits https://play.rust-lang.org/?gist=2d78b6365b6dbb5df93953958491e8e9&version=stable.

mqudsi commented 6 years ago

Would this be the only way to instruct serde that type Foo should be deserialized with FromStr<Foo>::from_str on a type level?

#[serde(deserialize_with)] can be used on each instance of the type Foo, but I don't think there's a way to globally say that this type should use this deserialization method?

dtolnay commented 6 years ago

@mqudsi yes I would expect this to support deserializing using FromStr. There would need to be some external piece of logic to handle the details like converting the error types. Something like:

#[derive(Deserialize)]
#[serde(with = "serde_aux::string")]
struct Mqudsi {
    /* ... */
}

This is an interesting use case because we would not want to give this type a DeserializeImpl impl as specified in https://github.com/serde-rs/serde/issues/1118#issue-280662132. There may need to be a way to select whether a DeserializeImpl impl gets generated.

Mingun commented 3 years ago

I'm wrong or that is already implemented with #[serde(from/try_from/into = "...")]?

RReverser commented 3 years ago

@Mingun No, it's not the same. The attribute you mentioned is for implementing Deserialize via another type, while this issue is rather about augmenting Deserialize on the same type.

I still hope it will be implemented someday, as it's one of most common issues I'm running into with Serde.

Mingun commented 3 years ago

But when you augmenting Deserialize you actually doing that:

impl Deserialize for SomeType {
  fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where D: Deserializer<'de>,
  {
    // some non-standard logic, for example calling
    // some_module::deserialize(...)
  }
}

That non-standard logic can be implemented as:

// Derive or manually implemented
#[derive(Deserialize)]
struct MyNonStandardLogicHelper { ... }
impl Into<SomeType> for MyNonStandardLogicHelper {
  fn into(self) -> SomeType { ... }
}

#[derive(Deserialize)]
#[serde(from = "MyNonStandardLogicHelper")]
struct SomeType { ... }

I see no difference with both approaches, because you always can implement your some_module::deserialize as

mod some_module {
  pub fn deserialize<'de, D>(deserializer: D) -> Result<SomeType, D::Error>
    where D: Deserializer<'de>,
  {
    MyNonStandardLogicHelper::deserialize(deserializer).map(Into::into)
  }
}

So I think that author's case has been implemented

RReverser commented 3 years ago

The important difference is doing such augmentation transparently to the end type structure. Those important details are hidden under ... in your snippet.

For example, let's say you have

#[derive(Deserialize)]
struct S {
  field0: i32,
  field1: i32,
  ...
  field100: i32,
}

Now if you want to add some finalization (e.g. to check values of fields, or to modify some fields after they've been parsed), you have three options:

  1. Implement all of Deserialize manually for this structure - very verbose.
  2. Add a newtype struct FinalizedS(S); + impl From<S> for FinalizedS + + #[serde(from)] - works, but is a breaking change for the type structure, making it unsuitable for many library use-cases.
  3. Add

    #[derive(Deserialize)]
    #[serde(from = "S")]
    struct FinalizedS {
      field0: i32,
      field1: i32,
      ...
      field100: i32,
    }

    and also use from approach - works, but even more verbose than (1) because now you also need to spell out all the fields in the From implementation.

While any of those options kinda works, none is a great option to just augment deserialization without affecting the end type structure. Hence the proposal to introduce a mechanism similar to existing serde(with = ...) and serde(remote = ...) but for finalizing result instead of overriding it altogether.

Mingun commented 3 years ago

If you want to modify existing structure, for example, replacing some invalid values with valid ones, I think, that is anti-pattern and you should be happy that serde prevents you from doing that. Incorrect states should be inrepresentable in your program structure, so you couldn't create it accidently. Of course, when you read external data you haven't guaranties about their state, so you should use some other structure and convert it to your final structure during validation stage, which can be done in From/TryFrom implementation.

RReverser commented 3 years ago

Well, I don't agree, on practice it is often important. It's basically same use-cases / reasons as #[serde(default)] or #[serde(deserialize_with = ...)] on individual fields, except sometimes you need to see other fields in the structure to correctly set such defaults or perform other transformations.

raphaelcohn commented 3 years ago

@Mingun In theory, you're right. In practice, though, it's more subtle and I agree with @RReverser. What you're advocating can make for inefficient code and is impossible where a type is needed with its original signature elsewhere (eg in libraries). I've long needed either validators per-field, a validator of a final state and the ability to create derived data fields; the inner data structure of an object is different to its external representation to others. Going back to my OOP mentoring days, I'd always be advocating not to use getter / setters for this very reason.

Just today I've had to modify a struct that is mostly configuration data, but now needs a field computing after deserialization but before first use. I can't replace the struct with another, and I don't want to use a lazy-init that pushes the cost into first use - this makes it far more complex when the object should be fully init'd after construction. (I've always been a fan of immutable construction).

Serde's ongoing resistance to validators and 'finalizers' makes its user overly painful. Then again, serde is not for the performance critical path.

Lastly, @Mingun, you're point about guaranties about external data is very valid. If that's a genuine concern, then serde SHOULD NOT BE USED. It's a potential DoS vector, unless somehow one can give it a separate stack and separate malloc with say, 64Kb of memory, which doesn't panic on overflow, but returns a Result (which simply is extremely difficult in today's Rust as all collections expect to panic on memory exhaustion). I use serde extensively, but never for data to be deserialized from unknown or uncontrolled sources.

kangalio commented 1 year ago

Arguably the with attribute does two separate things here. First it generates the normal impl Deserialize into DeserializeImpl and second it implements Deserialize using some_module::deserialize. Could it be worthwhile to split these two actions into two?

@Marwes

Interesting. The latter attribute is redundant because you can just implement Deserialize manually and call some_module::deserialize yourself. So, in the end, we're left with a new attribute that just exchanges Deserialize for DeserializeImpl in the derive-macro output?

The #642 use case is then covered via

#[derive(serde::Deserialize)]
#[serde(deserialize_alternative)]
struct Foo {}

impl<'de> Deserialize<'de> for Foo {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let foo = Foo::deserialize_alternative(deserializer)?;
        /* finalizer here */
        Ok(foo)
    }
}
kangalio commented 1 year ago

But really, having an identical 2nd copy of the Deserialize trait seems like a hack. How about this approach instead:

#[derive(serde::Deserialize)]
#[serde(deserialize_with = "deserialize_foo")]
struct Foo {}

fn deserialize_foo<'de, D: Deserializer<'de>>(
    actual_impl: fn(D) -> Result<Foo, D::Error>,
    deserializer: D,
) -> Result<Foo, D::Error> {
    let foo = actual_impl(deserializer)?;
    /* finalizer here */
    Ok(foo)
}
kangalio commented 1 year ago

@jonasbb pointed out an existing serde feature that covers this issue: #[serde(remote = "Self")].

Adding #[serde(remote = "Self")] to a struct with Deserialize/Serialize derive changes the generated deserialize and serialize functions from trait implementations to inherent methods. You can then add your manual Deserialize/Serialize implementation which calls into Self::deserialize()/Self::serialize() (which dispatches to the inherent method from the derive)

fjarri commented 1 year ago

Isn't the initial post about not implementing Deserialize manually, but having it auto-generated with the use of some_module?

kangalio commented 1 year ago

Isn't the initial post about not implementing Deserialize manually, but having it auto-generated with the use of some_module?

@fjarri

No, the issue is only about not implementing the bulk deserialization logic manually. A manual Deserialization impl that just delegates to the actual auto-generated impl is ok

Implementation might work as follows: the code which is usually implemented for Deserialize is generated for a different but similar trait, let's call it DeserializeImpl. Then Deserialize is implemented using some_module::deserialize.

In jonasbb's workaround case, it's Self::deserialize instead of some_module::deserialize.

fjarri commented 1 year ago

Then Deserialize is implemented using some_module::deserialize.

I assumed that this implementation is supposed to be automatic. Otherwise why even have the attribute?

kangalio commented 1 year ago

Using the original proposed API:

#[derive(Deserialize)]
#[serde(with="some_module")]
enum SomeEnum {
    /* any fields here */
}

mod some_module {
    fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<SomeEnum, D::Error> {
         <SomeEnum as DeserializeImpl>::deserialize(deserializer)
    }
}

With the workaround:

#[derive(Deserialize)]
#[serde(remote="Self")]
enum SomeEnum {
    /* any fields here */
}

impl Deserialize for SomeEnum {
    fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<SomeEnum, D::Error> {
         Self::deserialize(deserializer)
    }
}

Otherwise why even have the attribute?

The point of this attribute is to insert custom pre-deserialization and post-deserialization code without having to reimplement the deserialization code itself. The workaround allows you to do this, too

fjarri commented 1 year ago

The difference is that in the first case some_module may be third-party, but in the second case you must write the impl yourself. Also the second case looks weird from the first glance - you derive(Deserialize), but then write an impl again, and I wouldn't call remote="Self" an intuitive API.

kangalio commented 1 year ago

That is true, the workaround doesn't have optimal API.

I just posted it here because it at least saves you from reimplementing all fields deserialization logic, which was the main reason I wanted this PR.

zmrow commented 1 year ago

@kangalioo

@jonasbb pointed out an existing serde feature that covers this issue: #[serde(remote = "Self")].

Adding #[serde(remote = "Self")] to a struct with Deserialize/Serialize derive changes the generated deserialize and serialize functions from trait implementations to inherent methods. You can then add your manual Deserialize/Serialize implementation which calls into Self::deserialize()/Self::serialize() (which dispatches to the inherent method from the derive)

This is somewhat dangerous. Unless the developer is paying close attention, elsewhere internally they might mistakenly call Self::deserialize()/Self::serialize() which avoids the manual implementation (and it's included validation) altogether.

de-vri-es commented 5 months ago

I understand the argument for flexibility by making the macro generate the same deserialize impl as usual under a different name

But I think there is also a good argument for having a #[serde(finalize = ...)] attribute instead. Namely: this attribute could work for both containers and fields, allowing for a lot of flexibility with a single (arguably simpler) syntax.

If a field has a finalizer and the type of the field also has one, they would simply both run: first the one from the type, since it's embedded in it's Deserialize impl, then the one on the field as part of the container's Deserialize impl.