jmcnamara / rust_xlsxwriter

A Rust library for creating Excel XLSX files.
https://crates.io/crates/rust_xlsxwriter
Apache License 2.0
250 stars 23 forks source link

WIP and RFC: Initial support for Serde serialization #61

Closed jmcnamara closed 7 months ago

jmcnamara commented 7 months ago

I've added initial support for serialization of Serde structures to rust_xlsxwriter main under the serde feature flag.

The support works like this:

  1. Prepare a standard Serde #[derive(Serialize)] struct.
  2. Serialize the header to a location in the worksheet. The fields of the struct will become headers.
  3. Call worksheet.serialize() repeatedly to write data, without having to specify the row/col position.

For example:

use rust_xlsxwriter::{Format, Workbook, XlsxError};
use serde::Serialize;

fn main() -> Result<(), XlsxError> {
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    // Add a simple format for the headers.
    let format = Format::new().set_bold();

    // Create a serializable test struct.
    #[derive(Serialize)]
    #[serde(rename_all = "PascalCase")]
    struct Produce {
        fruit: &'static str,
        cost: f64,
    }

    // Create some data instances.
    let item1 = Produce {
        fruit: "Peach",
        cost: 1.05,
    };
    let item2 = Produce {
        fruit: "Plum",
        cost: 0.15,
    };
    let item3 = Produce {
        fruit: "Pear",
        cost: 0.75,
    };

    // Set up the start location and headers of the data to be serialized using
    // any temporary or valid instance.
    worksheet.serialize_headers_with_format(0, 0, &item1, &format)?;

    // Serialize the data.
    worksheet.serialize(&item1)?;
    worksheet.serialize(&item2)?;
    worksheet.serialize(&item3)?;

    // Save the file.
    workbook.save("serialize.xlsx")?;

    Ok(())
}

Which would give this output: worksheet_serialize

The output can be positioned anywhere in the worksheet. For example if we change this line in the previous example:

    worksheet.serialize_headers_with_format(1, 3, &item1, &format)?;

We get this output:

screenshot

It will also serialize vectors in the struct:

    // Create a serializable test struct with vector fields.
    #[derive(Serialize)]
    #[serde(rename_all = "PascalCase")]
    struct Produce {
        fruit: Vec<&'static str>,
        cost: Vec<f64>,
    }

    // Create some data instances.
    let item = Produce {
        fruit: vec!["Peach", "Plum", "Pear"],
        cost: vec![1.05, 0.15, 0.75],
    };

    // Set up the start location and headers of the data to be serialized using
    // any temporary or valid instance.
    worksheet.serialize_headers_with_format(0, 0, &item, &format)?;

    // Serialize the data.
    worksheet.serialize(&item)?;

This gives the same output as the first example.

From the docs:

This method can be used, with some limitations, to serialize (i.e., convert automatically) structs that are serializable by Serde into cells on a worksheet.

The limitations are that the primary data type to be serialized must be a struct and its fields must be either primitive types (strings, chars, numbers, booleans) or vector/array types. Compound types such as enums, tuples or maps aren't supported. The reason for this is that the output data must fit in the 2D cell format of an Excel worksheet. (Note: this limitation is open for debate if it makes sense, see below.)

In order to serialize an instance of a data structure you must first define the fields/headers and worksheet location that the serialization will refer to. You can do this with the Worksheet::serialize_headers() or Worksheet::serialize_headers_with_format() methods. Any subsequent call to serialize() will write the serialized data below the headers and below any previously serialized data.

I am looking for feedback on the workability of this technique for any serialization use case that people may have. In particular I'd appreciate feedback from trying to make it work with existing serialize structures (within reason of what could be applied to a worksheet). Please leave comments below.

The code is on main. You will need to enable it in a local project with the following or similar:

cargo add --git https://github.com/jmcnamara/rust_xlsxwriter.git -F serde

There is some ongoing work to add support for ExcelDateTime and Chrono date/times. I will also be adding a method for adding formatting to each field value, and options to ignore fields apart from Serde #[serde(skip_serializing)], and also to reorder the fields. I may also change the error handling to just ignore unknown structs/fields.

Xydez commented 7 months ago

Wouldn't it be nicer to have serialize_vec(&items) where items is a Vec<Produce>? That is, instead of having the vecs inside the struct.

jmcnamara commented 7 months ago

Wouldn't it be nicer to have serialize_vec(&items) where items is a Vec<Produce>? That is, instead of having the vecs inside the struct.

Thanks for the input. That is probably a common use case in JSON style responses.

The good news is that this would work with the current API once the serialize_headers() step is completed.

For example:

use rust_xlsxwriter::{Format, Workbook, XlsxError};
use serde::Serialize;

fn main() -> Result<(), XlsxError> {
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    // Add a simple format for the headers.
    let format = Format::new().set_bold();

    // Create a serializable test struct.
    #[derive(Serialize)]
    #[serde(rename_all = "PascalCase")]
    struct Produce {
        fruit: &'static str,
        cost: f64,
    }

    // Create some data instances.
    let items = vec![
        Produce {
            fruit: "Peach",
            cost: 1.05,
        },
        Produce {
            fruit: "Plum",
            cost: 0.15,
        },
        Produce {
            fruit: "Pear",
            cost: 0.75,
        },
    ];

    // Set up the start location and headers of the data to be serialized.
    worksheet.serialize_headers_with_format(0, 0, &items.get(0).unwrap(), &format)?;

    // Serialize the data.
    worksheet.serialize(&items)?;

    // Save the file.
    workbook.save("serialize.xlsx")?;

    Ok(())
}

This would also work for a tuple of structs:

    // Create some data instances.
    let items = (
        Produce {
            fruit: "Peach",
            cost: 1.05,
        },
        Produce {
            fruit: "Plum",
            cost: 0.15,
        },
        Produce {
            fruit: "Pear",
            cost: 0.75,
        },
    );

    // Set up the start location and headers of the data to be serialized.
    worksheet.serialize_headers_with_format(0, 0, &items.1, &format)?;

    // Serialize the data.
    worksheet.serialize(&items)?;

Both examples produces the same output as the first example above. Would that work for you?

Mingun commented 7 months ago

Because serialize_headers does not require temporary item for its work, I think, it is better to provide only the type:

worksheet.serialize_headers::<Produce>(0, 0)?;
worksheet.serialize_headers_with_format::<Produce>(0, 0, &format)?;
jmcnamara commented 7 months ago

I think, it is better to provide only the type:

@Mingun That would be nice.

However, I think an instance is required, and not just the type, since it needs to be serialized to find the fields to convert them to worksheet headers. Or am I missing something?

For reference the code of the method is here: https://github.com/jmcnamara/rust_xlsxwriter/blob/1844dfea6645383d55462f387d20232f6ea263ab/src/serializer.rs#L404-L425

jmcnamara commented 7 months ago

I have added support for Serde serialization in v0.57.0. See Working with Serde in the rust_xlsxwriter docs.

Some additional serialisation features and helpers will be added in upcoming releases.