serde-rs / serde

Serialization framework for Rust
https://serde.rs/
Apache License 2.0
8.95k stars 752 forks source link

Simpler deserializer example #1033

Open spease opened 6 years ago

spease commented 6 years ago

Though the current Deserializer example is very complete, it would be nice if there were a simpler example. It might also be useful to provide default implementations or helper functions for things that most formats are going to do the same thing with (eg numeric / floating point types).

It would also be good to explicitly link to or call out the summary information about the trait functions on the Deserializer trait: https://serde.rs/impl-deserializer.html

I'm in the process of writing my first deserializer now.

dtolnay commented 6 years ago

Unfortunately that Deserializer example is about as minimal as it can be. Do you have any parts in mind that you think do not contribute to understanding how a Deserializer works, and should be cut out? I would prefer to keep the example code in a state that you can copy verbatim and compile right away, and the tedious bits are already unimplemented!().

I moved your other two points to #1035 and #1036 so we avoid interleaving the discussions.

spease commented 6 years ago

Here's the format I was working with when I filed these comments. I still need to add more documentation: https://github.com/spease/labview/blob/master/src/lvm.rs

I'm not 100% convinced that serde is the most optimized approach for this, but I liked the aspect of adding metadata to the structure so that it could be serialized to other formats (eg json) relatively trivially. If I have extra time, I might take a stab at redoing the parse functions with nom as well, but I wanted it to work from a reader and my initial implementation was not line-based so I didn't have a string to work with like I do now.

Geal told me that nom on unstable has a generator API that can do this, but nom has seemed pretty opaque each time I've looked at it. I'm not a parsing guru and have historically not enjoyed it (I struggled to write an LALR parser in C) so I'm probably on the lower end of understanding of this stuff, and my parsing implementation above is probably pretty terrible.

The LVM format isn't intended for general storage of information, but it does seem to have some conventions in place...even though when you dig a bit deeper (ie make whitespace visible) you start to see what appear to be undocumented lists or errant separators. So it's tricky to work with, but it has fewer types to deal with.

I think what would be useful is starting with the bare minimum just to get things to compile - just listing all the functions out with unimplemented!. Then on the next page implementing a generic subset (eg parsing all the integers from strings). Then implementing floats. Then maps. Then structs. Then enums. That breaks the problem down into chunks that are a bit less bite-sized and overwhelming, and it would also imply how to break the problem down when writing a new format.

A real obstacle for me was just getting started - I saw a few paragraphs and then hundreds of lines of code, so it looked like writing a format was a massive investment compared to what I expected the problem to be like with C/++ (albeit without reflection).

I don't like copy-pasting big blobs of code because I won't have any idea about what it's doing, a lot of times I'll find such code has a lot of redundancy or suboptimal behavior for what I'm working on, and there's more code to troubleshoot and debug when things don't work the way I want them to.

One page I have earmarked - although this is for nom - is this because it breaks down a relatively simple example: https://fnordig.de/2015/07/16/omnomnom-parsing-iso8601-dates-using-nom/

Another thing that was unclear to me was how everything works together and what the role of each type of function is. A diagram showing Deserialize, Visitor, Deserializer, Serialize, Serializer, and perhaps other associated types would help.

Some of my confusion was in how I read the documentation. I didn't sit down and just read it. I'd used serde serialize/deserialize in the past for JSON and so I looked specifically at the parts that looked relevant to the problem I was trying to solve - adding support for a format. Thus I only stumbled across the data model stuff when I was going through the API documentation.

Last but not least, one thing that might help is adding some type constraints. Eg if something is a map, restricting it somehow to an implementation that provides MapAccess. I'm still a bit unclear on how types get broken down - eg if I use a SeqAccess with a struct instead of a MapAccess, what happens? Or if I use a visit_bool or something? It's not clear to me from the function types themselves what guarantees they provide and expect. It looks to me like there's immense flexibility to override practically anything, but it would be nice to have the guarantees enforced (by default) that the "regular" implementations of Deserialize/Serialize by most users will expect. Even if I could hypothetically write a deserialize for a Rust struct that just used visit_bool.

Alternatively, the Serde type category pages could list out what that type category expects and which visit implementation is expected to get called, etc. But I think having some enforcement of those constraints at compile-time instead of everything providing a visitor would be nice - unfortunately I don't know of a way to flag at compile-time "I know what I'm doing, take off the training wheels".

dtolnay commented 6 years ago

Wow, thanks for the great feedback! It is clear there is a lot we need to improve.

One thing to be careful of is I've seen sometimes the more documentation you add, the less of it people read. That may already have been a factor in your experience, so do you have a sense of where we stand on this continuum? Ideally I would like to explore some ways to improve cross-referencing of our documentation to make it clear what parts are important to understand before reading a particular part, without increasing the aggregate amount of documentation.

spease commented 6 years ago

I feel like a diagram is probably the most efficient way to show the deserialize/visit/deserializer process, how the functions are expected to call each other and how the structs / traits relate to each other.

It's hard for me to compare documentation to other similar projects. Compared to the amount of interface and the complexity of the interface, I feel like the reference documentation for deserializer is lacking (the known hole being things like which *Access, if any, trait is appropraite in a given deserialize_ function). The quick start - although I'm hesitant to call the deserializer example in the main documentation a quick start - seems too long.

I feel like there are at least three different stages of using a library. The first is whether you're going to use it. Even if you make the assumption that serde should be the serialization/deserialization library that you use if you're working with Rust, maybe there are cases where it isn't clear that you should be using any library at all. Maybe it needs to be really optimized or you're on a time budget. In this stage you just want to include the library and see it do something and get your figurative foot in the door. Something useful would be nice, but something representative of what the library is would be most important.

In Serde's case, I'm guessing this would be something like deserialization of a struct with strings only. Maybe just a newtype struct. Stage 1 just introduces the thing.

Stage 2 is starting to do something useful, which would be where example-templates would come in. Say something like what is on the website now. Maybe a self-describing text format, non-self-describing text format, and non-self-describing binary format. A lot of times I'll see this sort of documentation as a blog post or even a stackoverflow post. Stage 2 gives you the context.

Stage 3 would be potentially a part of stage 2, but at the point where you're implementing the actual functions for real for your specific thing and trying to make them full quality. At that point you're probably drilling into the specific API parts that you need to focus on, having already chosen a template or started to build something from scratch. Stage 3 gives you the details and assumes you already have the context.

A lot of times if I can and have some understanding of the programming construct in question, I'll try to start with step 1 and then build things up with step 3. However I think a lot of people favor step 1 and 2 and don't worry to much about step 3 unless they really have to. And that's sort of the approach I'll fall back to if I'm in a hurry, or don't understand the thing.

In terms of placement, I would put stage 1 as prominently as possible or just under "Quick Start" as a heading. Stage 2 where the existing documentation is now, in some sort of manual or website. And stage 3 inline in the documentation where you have to look for it. One thing I see a lot of Rust libraries doing (serde included) is putting some kind of quick start or examples into the front page of the documentation. Eg ssh2-rs: http://alexcrichton.com/ssh2-rs/ssh2/index.html

The other thing I'd add somewhere is a list of libraries and formats using serde and whether they implement serialize, deserialize, ints, floats, bools, strings, tuples, maps, structs, and reading and writing speeds. The last could be controversial, but ideally would be some consistent data that gets benchmarked with a CI and provided as a factor. But the benchmarking could be controversial, so could be left to some other project.

That is all just my $0.02, and I'm not sure I'm directly answering your question.

spease commented 6 years ago

Also, this is a very "deserializer" focused view. I'm assuming that you would keep the "Serde In Action" stuff as is, I'm not sure that the deserializer stuff is 100% important enough for somebody completely new to serde to find it, but it should be easily findable if you dig one level deeper.

In my case, having some small amount of code working can hook me because then I'll want to finish what I've started. And if I have any immediate concerns about performance etc I can mutate the simple example to address those up front. And in a work setting, it's probably seen more positively to have anything working in a very short span of time than to be working for hours with no working code to show for it and still fighting compiler errors.

eedahl commented 6 years ago

I found this issue trying to figure out how to get a CSV of times written as "00:00:00" into i32s wrapped as a Time(i32) struct correctly, since it appears it's trying to naïvely cast the strings to ints and I found the documentation here decidedly maximal :/

spease commented 6 years ago

It sounds like you may want to use version 1.0.0-beta5 of the CSV crate: https://crates.io/crates/csv

And then implement deserialize (not deserializer) for the Time structure. But maybe I’m misunderstanding what you’re doing.

Sent from my iPhone

On May 10, 2018, at 11:41, eedahl notifications@github.com wrote:

I found this issue trying to figure out how to get a CSV of times written as "00:00:00" into i32s wrapped as a Time(i32) struct correctly, since it appears it's trying to naïvely cast the strings to ints and I found the documentation here decidedly maximal :/

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

eedahl commented 6 years ago

That is actually what I'm doing,

#[derive(Deserialize)]
#[serde(remote = "Time")]
struct TimeDef(i32);

#[derive(Debug, Deserialize)]
pub struct Targets {
    #[serde(with = "TimeDef")]
    godlike: Time,
    #[serde(with = "TimeDef")]
    legendary: Time,
    #[serde(with = "TimeDef")]
    world_class: Time,
    #[serde(with = "TimeDef")]
    professional: Time,
    #[serde(with = "TimeDef")]
    good: Time,
    #[serde(with = "TimeDef")]
    ok: Time,
    #[serde(with = "TimeDef")]
    beginner: Time,
}

I want to reduce

    let mut r = csv::Reader::from_path("targets.csv").expect("Could not read file: targets.csv");
    r.records()
        .map(|r| r.unwrap())
        .map(|row| Targets {
            godlike: Time::from(&row[0]),
            legendary: Time::from(&row[1]),
            world_class: Time::from(&row[2]),
            professional: Time::from(&row[3]),
            good: Time::from(&row[4]),
            ok: Time::from(&row[5]),
            beginner: Time::from(&row[6]),
        })
        .collect()

into

    let mut r = csv::Reader::from_path("targets.csv").expect("Could not read file: targets.csv");
    r.deserialize()
        .map(|r| r.unwrap())
        .collect()

but the program panics at unwrap, since it seems to be trying to simply cast "00:00:00" to an integer. I may be lost in the woods here, but this seems like a simple use case for deserialization in serde, but looking at the documentation I'm afraid I'm quite lost.

EDIT: Ah I see that this issue is about the Deserializer doc not Deserialize. I'm still confused I'm afraid, but it's probably better for somewhere else :)

spease commented 6 years ago

Yes. #[derive(Deserialize)] will derive the default deserialize for the wrapped type (i32). You need to implement your own deserialize trait for time. See “implementing a custom deserialize” in the serve documentation.

It would probably also be cleaner to wrap the original Time type in a struct (like you’re doing with the i32), implement the custom deserialize for the wrapper type, and then use shrinkwrapr to make the wrapper transparently evaluate to Time in other code. That way you don’t have to use serde(with) everywhere.

Sent from my iPhone

On May 10, 2018, at 13:27, eedahl notifications@github.com wrote:

That is actually what I'm doing,

[derive(Deserialize)]

[serde(remote = "Time")]

struct TimeDef(i32);

[derive(Debug, Deserialize)]

pub struct Targets {

[serde(with = "TimeDef")]

godlike: Time,
#[serde(with = "TimeDef")]
legendary: Time,
#[serde(with = "TimeDef")]
world_class: Time,
#[serde(with = "TimeDef")]
professional: Time,
#[serde(with = "TimeDef")]
good: Time,
#[serde(with = "TimeDef")]
ok: Time,
#[serde(with = "TimeDef")]
beginner: Time,

} I want to reduce

r.records()
    .map(|r| r.unwrap())
    .map(|row| Targets {
        godlike: Time::from(&row[0]),
        legendary: Time::from(&row[1]),
        world_class: Time::from(&row[2]),
        professional: Time::from(&row[3]),
        good: Time::from(&row[4]),
        ok: Time::from(&row[5]),
        beginner: Time::from(&row[6]),
    })
    .collect()

into

r.deserialize()
    .map(|r| r.unwrap())
    .collect()

but the program panics at unwrap, since it seems to be trying to simply cast "00:00:00" to an integer. I may be lost in the woods here, but this seems like a simple use case for deserialization in serde, but looking at the documentation I'm afraid I'm quite lost.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

eedahl commented 6 years ago

That looks very complicated. When wanting to do this I expected it would be something along the lines of providing a function pointer for the String -> Time conversion. Thanks for pointing me in the right direction though :)

spease commented 6 years ago
extern crate serde;
#[macro_use]
extern crate serde_derive;
#[macro_use]
extern crate shrinkwraprs;

use serde::{Deserialize, Deserializer};

#[derive(Debug)]
struct Time(u64);

#[derive(Debug, Shrinkwrap)]
struct TimeDef(Time);

impl<'de> Deserialize<'de> for TimeDef {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where D: Deserializer<'de> {
        use serde::de::Error;

        // Deserialize the string and get individual components
        let s = String::deserialize(deserializer)?;
        let mut times = s.split(":");

        // Parse individual components
        let hours = times.next()
            .ok_or(D::Error::custom("Missing hours"))?
            .parse::<u64>()
            .map_err(D::Error::custom)?;
        let minutes = times.next()
            .ok_or(D::Error::custom("Missing minutes"))?
            .parse::<u64>()
            .map_err(D::Error::custom)?;
        let seconds = times.next()
            .ok_or(D::Error::custom("Missing seconds"))?
            .parse::<u64>()
            .map_err(D::Error::custom)?;

        // Make sure there's nothing extra left
        if times.next() != None {
            Err(D::Error::custom("Unexpected element"))?
        }

        // Convert to int and wrap in time
        Ok(TimeDef(Time(hours*60*60 + minutes*60 + seconds)))
    }
}

#[derive(Debug, Deserialize)]
pub struct Targets {
    godlike: TimeDef,
    legendary: TimeDef,
    world_class: TimeDef,
    professional: TimeDef,
    good: TimeDef,
    ok: TimeDef,
    beginner: TimeDef,
}
eedahl commented 6 years ago

Wow, thanks a lot. I may just be extraordinarily inexperienced starting with Rust less than a week ago, but this is the sort of minimal non-standard example I'd like to find here and can imagine other people look for as well. It would perhaps be a good starting point for showing when it's not enough and you might have to use a visitor too, but I don't know.