jmcnamara / rust_xlsxwriter

A Rust library for creating Excel XLSX files.
https://crates.io/crates/rust_xlsxwriter
Apache License 2.0
250 stars 23 forks source link

feature request: methods `Worksheet::serialize_headers*` should get headers directly from the struct / enum type instead of from an instance #63

Closed lucatrv closed 6 months ago

lucatrv commented 7 months ago

Feature Request

In almost all practical Worksheet::serialize use cases that I can think of, instances of the defined struct / enum are going to be created within a loop. It is therefore inconvenient to have to treat the first instance differently from all the others, for the only purpose to be able to serialize headers with one of the Worksheet::serialize_headers* methods. A better approach would be to rely on the serde_aux::serde_introspection::serde_introspect method to get the struct field names directly from the defined type. For this to work rust_xlsxwriter would need to depend on the serde-aux crate when the serde feature is enabled.

jmcnamara commented 7 months ago

It is therefore inconvenient to have to treat the first instance differently from all the others, for the only purpose to be able to serialize headers

I agree with this and I did spend a good bit of time looking at alternatives. In particular I wanted to do something like Mingun suggested in the RFC thread (#61) where the type could be specified like this:

worksheet.serialize_headers::<Produce>(0, 0)?;
worksheet.serialize_headers_with_format::<Produce>(0, 0, &format)?;

However, that would only work for Deserialize traits since ser::serialize() needs an actual instance and not just the type. serde_aux uses the Deserialize approach. I considered implementing something similar but that would require rust_xlsxwriter serializable structs to derive both Deserialize and Serialize which seemed like it would be counterintuitive to the end user so in the end I didn't do that.

That may actually be moot in practice since users will probably be serializing structs that they deserialized from somewhere else and be using #[derive(Deserialize, Serialize)] anyway. So maybe I should revisit that.

For my own usage I have been a temp default instance of the struct that I want to serialize (which is a small bit clunky but workable):

use calamine::{open_workbook, RangeDeserializerBuilder, Reader, Xlsx};
use rust_xlsxwriter::Workbook;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, Default)]
#[allow(dead_code)]
struct ExcelRow {
    region: String,
    value: f64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut excel: Xlsx<_> = open_workbook("Book1.xlsx")?;
    let range = excel.worksheet_range("Sheet1")?;

    let iter = RangeDeserializerBuilder::new().from_range::<_, ExcelRow>(&range)?;

    // Create a new Excel file object.
    let mut workbook = Workbook::new();
    let worksheet = workbook.add_worksheet();

    // Set up the serialization headers.
    let excel_row = ExcelRow::default();
    worksheet.serialize_headers(0, 0, &excel_row)?;

    // Serialize the data.
    for result in iter {
        let row = result?;
        worksheet.serialize(&row)?;
    }

    // Save the file to disk.
    workbook.save("Book2.xlsx")?;

    Ok(())
}
jmcnamara commented 7 months ago

@claudiofsr Any thoughts/comments on requiring a Deserialize trait for rust_xlsxwriter serializable structs? I presume it would work for your use case.

lucatrv commented 7 months ago

Yes I also used the default approach but I confirm it is clunky, IMHO the Deserialize trait will be derived / implemented anyway when Worksheet::serialize is used. Otherwise, if you want to cover all possible cases, you could provide both approaches, the first one applicable when the Deserialize trait is available, the second one when it is not available:

worksheet.serialize_headers::<Produce>(0, 0)?;
worksheet.serialize_headers_from_item(0, 0, &item1)?;
jmcnamara commented 7 months ago

Otherwise, if you want to cover all possible cases, you could provide both approaches

Good suggestion. I'll look into that.

lucatrv commented 7 months ago

Thanks, I think this will benefit 99% of use cases. To add my 2 cents, the new method names could be either serialize_headers_from_item, serialize_headers_from_instance, serialize_headers_without_deserialize_trait, etc.

jmcnamara commented 7 months ago

I've pushed an initial version of this to main.

The WIP method names are:

worksheet.serialize_headers_from_type::<Produce>(0, 0)?;
worksheet.serialize_headers_with_format_from_type::<Produce>(0, 0, &format)?;

Here is a working example:

use rust_xlsxwriter::{Format, FormatBorder, Workbook, XlsxError};
use serde::{Deserialize, Serialize};

fn main() -> Result<(), XlsxError> {
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    // Set a header format.
    let header_format = Format::new()
        .set_bold()
        .set_border(FormatBorder::Thin)
        .set_background_color("C6EFCE");

    // Create a deserializable/serializable test struct.
    #[derive(Deserialize, Serialize)]
    struct Produce {
        fruit: &'static str,
        cost: f64,
    }

    // Create some data instances.
    let item1 = Produce {
        fruit: "Peach",
        cost: 1.05,
    };

    let item2 = Produce {
        fruit: "Plum",
        cost: 0.15,
    };

    let item3 = Produce {
        fruit: "Pear",
        cost: 0.75,
    };

    // Set the serialization location and headers.
    worksheet.serialize_headers_with_format_from_type::<Produce>(1, 1, &header_format)?;

    // Serialize the data.
    worksheet.serialize(&item1)?;
    worksheet.serialize(&item2)?;
    worksheet.serialize(&item3)?;

    // Save the file.
    workbook.save("serialize.xlsx")?;

    Ok(())
}

Output:

screenshot

I'll finalize the names and functionality over the next few days.

claudiofsr commented 7 months ago

I changed worksheet.serialize_headers(0, 0, &T::default())?; by worksheet.serialize_headers_from_type::<T>(0, 0)?; and it worked fine.

claudiofsr commented 7 months ago

Another suggestion would be to choose to format the data structure like this:

use serde::{Serialize, Deserialize};

  // Create a deserializable/serializable test struct.
  #[derive(Deserialize, Serialize)]
  struct Produce {
      #[serde(rename = "Fruit FooBar")]
      #[rust_xlsxwriter(set_min_width=8)]
      fruit: &'static str,
      #[serde(rename = "Value")]
      #[rust_xlsxwriter(set_num_format="#,##0.00")] // 2 digits after the decimal point
      cost: f64,
      #[serde(rename = "Date DMY")]
      #[rust_xlsxwriter(set_num_format="dd/mm/yyyy")]
      dmy: Option<NaiveDate>,
      #[serde(rename = "Date MDY",)]
      #[rust_xlsxwriter(set_num_format="mm/dd/yyyy")]
      mdy: NaiveDate,
      #[serde(rename = "Long Description")]
      #[rust_xlsxwriter(set_min_width=8, set_max_width=60)]
      long_description: String
  }

Or

use serde::{Serialize, Deserialize};

  // Create a deserializable/serializable test struct.
  #[derive(Deserialize, Serialize)]
  #[rust_xlsxwriter(set_min_width=8, set_max_width=60, set_date_format="dd/mm/yyyy")]
  struct Produce {
      #[serde(rename = "Fruit FooBar")]
      fruit: &'static str,
      #[serde(rename = "Value A")]
      #[rust_xlsxwriter(set_num_format="#,##0.00")] // 2 digits after the decimal point
      cost_a: f64,
      #[serde(rename = "Value B")]
      #[rust_xlsxwriter(set_num_format="#,##0.0000")] // 4 digits after the decimal point
      cost_b: f64,
      #[serde(rename = "Date DMY A")]
      dmy_a: Option<NaiveDate>,
      #[serde(rename = "Date DMY B",)]
      dmy_b: NaiveDate,
      #[serde(rename = "Long Description")]
      #[rust_xlsxwriter(set_background_color="C6EFCE")]
      long_description: String
  }

Or something else.

This way, we can specify the format of each column/field individually or globally and also choose the maximum and minimum widths to improve auto_fit().

The specific formatting of each column/field overrides the global formatting.

Just suggestions for implementations ("brainstorm")! I don't know if these implementations would be viable.

I am currently formatting each column using Regex and serde_aux: read_xml / excel

lucatrv commented 7 months ago

Another suggestion would be to choose to format the data structure like this:

use serde::{Serialize, Deserialize};

  // Create a deserializable/serializable test struct.
  #[derive(Deserialize, Serialize)]
  struct Produce {
      #[serde(rename = "Fruit FooBar")]
      #[rust_xlsxwriter(set_min_width=8)]
      fruit: &'static str,
      #[serde(rename = "Value")]
      #[rust_xlsxwriter(set_num_format="#,##0.00")] // 2 digits after the decimal point
      cost: f64,
      #[serde(rename = "Date DMY")]
      #[rust_xlsxwriter(set_num_format="dd/mm/yyyy")]
      dmy: Option<NaiveDate>,
      #[serde(rename = "Date MDY",)]
      #[rust_xlsxwriter(set_num_format="mm/dd/yyyy")]
      mdy: NaiveDate,
      #[serde(rename = "Long Description")]
      #[rust_xlsxwriter(set_min_width=8, set_max_width=60)]
      long_description: String
  }

It would also be useful to specify a header format.

jmcnamara commented 7 months ago

@lucatrv and @claudiofsr Thanks for the feedback.

I've pushed an updated (but still WIP) version to main. I've settled on the following method names:

pub fn serialize_headers<T>()
pub fn serialize_headers_with_format<T>()
pub fn serialize_headers_with_options<T>()

pub fn deserialize_headers<'de, T>()
pub fn deserialize_headers_with_format<'de, T>()
pub fn deserialize_headers_with_options<'de, T>()

The _with_options() variants are the interfaces that I currently envisage for dealing with additional formatting. It works like this:

    // Set custom field formatting.
    let custom_headers = [CustomSerializeHeader::new("cost").set_cell_format(&money_format)];

    let header_options = SerializeHeadersOptions::new()
        .set_custom_headers(&custom_headers)
        .set_header_format(&header_format);

    worksheet.deserialize_headers_with_options::<Produce>(1, 1, &header_options)?;

Output:

screenshot

Full code ```rust use rust_xlsxwriter::{ CustomSerializeHeader, Format, FormatBorder, SerializeHeadersOptions, Workbook, XlsxError, }; use serde::{Deserialize, Serialize}; fn main() -> Result<(), XlsxError> { let mut workbook = Workbook::new(); // Add a worksheet to the workbook. let worksheet = workbook.add_worksheet(); // Set some formats. let header_format = Format::new() .set_bold() .set_border(FormatBorder::Thin) .set_background_color("C6EFCE"); let money_format = Format::new().set_num_format("$0.00"); // Create a deserializable/serializable test struct. #[derive(Deserialize, Serialize)] struct Produce { fruit: &'static str, cost: f64, } // Create some data instances. let item1 = Produce { fruit: "Peach", cost: 1.05, }; let item2 = Produce { fruit: "Plum", cost: 0.15, }; let item3 = Produce { fruit: "Pear", cost: 0.75, }; // Set custom field formatting. let custom_headers = [CustomSerializeHeader::new("cost").set_cell_format(&money_format)]; let header_options = SerializeHeadersOptions::new() .set_custom_headers(&custom_headers) .set_header_format(&header_format); worksheet.deserialize_headers_with_options::(1, 1, &header_options)?; // Serialize the data. worksheet.serialize(&item1)?; worksheet.serialize(&item2)?; worksheet.serialize(&item3)?; // Save the file. workbook.save("serialize.xlsx")?; Ok(()) } ```

However, I like the suggestion to encode this functionality in struct attributes. I'll look into that to see how practical it is to implement.

This way, we can specify the format of each column/field individually or globally and also choose the maximum and minimum widths to improve auto_fit().

That is becoming a requested feature for the Python library as well so I'll look into that in a future version.

Just suggestions for implementations ("brainstorm")!

That's fine. That is what I'm looking for. :-)

lucatrv commented 7 months ago

However, I like the suggestion to encode this functionality in struct attributes.

Being able to define field and header formats in a single place would be very convenient, and IMHO using the Serde struct attributes is a great idea. This way nothing else would be needed to serialize to xlsx apart from defining the struct properly.

jmcnamara commented 7 months ago

Being able to define field and header formats in a single place would be very convenient, and IMHO using the Serde struct attributes is a great idea.

I agree, and I spent the last few days looking into it. However, as far as I can see, it isn't feasible. There are at least two issues, the main ones being:

So, unfortunately, I don't think the high degree of configurability within the struct attributes is currently feasible. I'm happy to be corrected since it would be a cleaner scheme.

For now I will continue with the de/serialize_headers_with_options() approach shown above.

lucatrv commented 7 months ago

@jmcnamara here are my 2 cents on this, I hope this is feasible:

jmcnamara commented 7 months ago
  • I would suggest to add to CustomSerializeHeader a new method called set_column_format which could be used to set the full column format before writing headers and values.

That is a good suggestion. I'll implement that. Note, I've already added a similar set_column_width() method to CustomSerializeHeader that mirrors the worksheet method. That change is on main.

  • Then to achieve a similar result of what suggested by @claudiofsr, I guess that a macro could be defined to construct the proper CustomSerializeHeader starting from custom struct attributes notation.

That may be feasible. I was also considering implementing a trait (and helper attributes) that could be added to the #[derive(). I plan to look at that to see if it is feasible (that may be after the next release though). That could be used by the de/serialize_header methods to set up a CustomSerializeHeader transparently to the user (in line with your suggestion).

jmcnamara commented 6 months ago

This feature has been pushed to crates.io in rust_xlsxwriter v0.60.0.

I will work on derive and attribute macros for (hopefully) the next release.