`Document` type requires multiple serialization / deserialization for usage with `serde_json::Value`-based APIs

andrewbanchich commented 1 year ago

Because Document uses HashMap for objects instead of serde_json::Map, there needs to be a second pass at converting that HashMap to a serde_json::Map when using libraries which work with serde_json::Values, and then another for turning it back to a HashMap / Document for use in Smithy output types.

serde_json::Value is the standard for the Rust ecosystem, so I expect a lot of crates will need to do this.

andrewbanchich commented 1 year ago

It would be really nice if Smithy codegen could use T: Serialize for input types and T: Deserialize for output types instead of concrete values so that users could choose.

jdisanti commented 1 year ago

@thomas-k-cameron has done a lot of work to add Serialize/Deserialize support, and there are a number of PRs open to add it to other types in aws-smithy-types (#2645, #2646, #2647).

The same pattern can be followed to also add them to Document. It just needs to be behind #[cfg_attr(aws-sdk-unstable, feature = "serialize")] or #[cfg_attr(aws-sdk-unstable, feature = "deserialize")] conditional compilation flags for now.

We won't get to this any time soon, but you're welcome to contribute and work with @thomas-k-cameron to get this done.

andrewbanchich commented 1 year ago

To clarify, this is more about the Document type using a HashMap instead of a serde_json::Map. Even if Document did impl Serialize / Deserialize, many users would still need to converted a Document to a Value and back. This conversion is trivial for other variants of documents because their types line up with the same inner types of serde_json::Values, so one can just unwrap them from Document and wrap them in Value. Document::Object wouldn't be able to do that though.

thomas-k-cameron commented 1 year ago

serde_json::Map uses BTreeMap or IndexMap and this is selected by conditional compilation and they used to use other implementation (e.g. LinkedHashMap).

Since they are not making any guarantee, I don't know if it would be a good idea.

thomas-k-cameron commented 1 year ago

Once RFC30 is implemented you should be able to do something like this.

    let doc = aws_smithy_types::Document::Object(HashMap::new());
    let serde_json_value = serde_json::to_value(doc).unwrap();
    let target_datatype = serde_json::from_value(serde_json_value).unwrap();

If your concern is for the UX then I think we could add a function like this.

impl Document {
   pub fn to_serde_json(&self) -> serde_json::Value {
       // convert to Value
   }
}

andrewbanchich commented 1 year ago

My concern isn't about implementing Serialize or Deserialize, and it isn't the API for doing so.

I'm saying that 99.9% of the Rust ecosystem uses serde_json::Value for representing JSON, not Smithy's Document type. For inputs, users will have API handlers which accept a Document and output a Document, but in between the input and output, basically everyone will need to convert this type manually to serde_json::Value.

Converting between map types is not trivial. Users will need to pay a performance cost for arbitrarily large map conversions.

serde_json is flexible enough to allow the user to choose what type they want to deserialize to; if I want a HashMap then I can have it deserialize to a HashMap. If I want a serde_json::Map, that works too.

I'm saying IMO forcing one map type onto all users is not ideal. It would be nice to allow users to specify the type like serde_json does.

I guess my concern also applies to Document::Array - is Smithy going to make users manually map over arrays and call some custom convert_to_x function?

Other than Array and Document, the other variant types are all standard types, so can easily be used throughout the ecosystem. However, Vec and HashMap both use generics, which is why I believe they are problematic, and that you should pass the control of the concrete type through to the user.

jdisanti commented 1 year ago

I understand the concern, but we don't want to depend on any serde libraries out of box. I could be persuaded to represent Document with serde_json::Value behind the scenes #[cfg_attr(aws-sdk-unstable)] switch though, and also add to/from value conversions.

Our concern is with maintainability. The SDK will have to be maintained without breaking changes for many years, so exposing separate libraries in the public API is a risk.

andrewbanchich commented 1 year ago

aws-sdk-unstable sounds great to me! Thank you.

Any thoughts on handling this with a generic and allowing the user to specify the concrete type? I'd think that would be the best of both worlds.

thomas-k-cameron commented 1 year ago

Every data type that uses Document would need to accept generic parameters, so I'm pretty sure you would need to make a lot of changes to the code gen logic.

Also, I'm not quite sure how maintain-able it is to introduce a new interface that needs to be implemented.

I think it's more simpler/practical to do what the serde_json is doing.

andrewbanchich commented 1 year ago

Every data type that uses Document would need to accept generic parameters

Why? I'm suggesting to do what serde_json does here: https://docs.rs/serde_json/latest/serde_json/fn.from_reader.html

This would give users to choice to use Document or Value or HashMap or anything else.

thomas-k-cameron commented 1 year ago

I thought you meant something like this.

enum Document<T: GenericMap = HashMap<String, Document>>

I don't think I understand you correctly, would you mind giving me an example?

andrewbanchich commented 1 year ago

That would be nice to have a default generic parameter to allow users to choose the kind of map, but when I said T: Deserialize I meant that in place of Document altogether.

There are so many parallels to serde_json::Value here, I'm wondering why we couldn't allow users to choose the entire type themselves without a need for a wrapper enum like Document or Value.

If I use a Smithy structure, I get a Smithy type which is a struct. If I use a Smithy enum, I get a Smithy type.

If I choose String or Number or bool, however, I get a standard Rust type.

I'm wondering if a Smithy model Document type could be more of a generic itself than a concrete type, and deserialize to a type I have defined in my own crate rather than one Smithy defined.

This would allow users to choose the specific map implementation used and have the benefit of not being subject to the orphan rule.

E.g. the Smithy model type is generic T (fake keyword I just invented). The codegen for Rust would produce T: Deserialize if used in an input type in the model.

The route handler for the server, for example, would accept a function which accepts an argument of any type which implements Deserialize.

Then, as a user, I can write my handler fn handler(input: Whatever) -> Result<...> and it will be deserialize to a struct Whatever { .. }.

andrewbanchich commented 1 year ago

To phrase it slightly differently, the point of serde_json is to provide polymorphism over anything which can be serialized into or deserialized from JSON.

The Value enum is only one form of this polymorphism. Generics are another, more powerful and flexible form.

If Smithy is providing the enum, could it also provide the generic?

andrewbanchich commented 1 year ago

This would be fixed with #2858 , which is what I was asking for and more 😄

david-perez commented 1 year ago

@andrewbanchich Why does #2858 solve this? In #2858, server operation handlers would take in types that implement FromRequest, but FromRequest should only be implemented by the generated SDK on the Rust struct corresponding to the operation input.

andrewbanchich commented 1 year ago

I had spoken with @hlbarber about this and it sounded like we'd be able to impl FromRequest for our custom types as well. Maybe I misunderstood.

david-perez commented 1 year ago

I don't think we should let users implement FromRequest for their own types. In fact, I think we should doc-hide and seal the trait.

If we allow users to implement FromRequest, and #2858 lands, users could easily violate their Smithy model.

andrewbanchich commented 1 year ago

I think exposing this trait would be very beneficial.

Smithy cannot serve as a source of truth for all validation rules for inputs, so currently our validation is split between our Smithy-generated types and our custom server code, with the server code performing the validation which Smithy cannot express. Because of this, the code we work with has no single source of truth for validation, which is not ideal. I would argue that the inverse is already true: Smithy allows applications to violate their "true" set of validation rules since some cannot be expressed by Smithy.

In this context, I think it would be better to have a single source of truth defined by application code than to have two sources of truth.

Exposing the trait would also mean we can avoid having a ton of code for converting between types, as well as running into the orphan rule when trying to add abstractions over Smithy types which don't exist.

smithy-lang / smithy-rs

`Document` type requires multiple serialization / deserialization for usage with `serde_json::Value`-based APIs #2702