jmcnamara / rust_xlsxwriter

A Rust library for creating Excel XLSX files.
https://crates.io/crates/rust_xlsxwriter
Apache License 2.0
250 stars 23 forks source link

WIP: Custom derive macro for `rust_xlsxwriter` specific serialization options #66

Closed jmcnamara closed 6 months ago

jmcnamara commented 6 months ago

In a comment on #63 @claudiofsr said:

Another suggestion would be to choose to format the data structure like this:

use serde::{Serialize, Deserialize};

  // Create a deserializable/serializable test struct.
  #[derive(Deserialize, Serialize)]
  struct Produce {
      #[serde(rename = "Fruit FooBar")]
      #[rust_xlsxwriter(set_min_width=8)]
      fruit: &'static str,
      #[serde(rename = "Value")]
      #[rust_xlsxwriter(set_num_format="#,##0.00")] // 2 digits after the decimal point
      cost: f64,
      #[serde(rename = "Date DMY")]
      #[rust_xlsxwriter(set_num_format="dd/mm/yyyy")]
      dmy: Option<NaiveDate>,
      #[serde(rename = "Date MDY",)]
      #[rust_xlsxwriter(set_num_format="mm/dd/yyyy")]
      mdy: NaiveDate,
      #[serde(rename = "Long Description")]
      #[rust_xlsxwriter(set_min_width=8, set_max_width=60)]
      long_description: String
  }

I've started work on implementing a custom derive macro to enable this functionality on the derive branch.

Here is the feature set to be implemented

SerializeFieldOptions:

CustomSerializeField

Serde Container attributes:

Serde Field attributes:

Packaging:

As a start, you can now do this:

use derive::ExcelSerialize;
use rust_xlsxwriter::{ExcelSerialize, Workbook, XlsxError};
use serde::Serialize;

fn main() -> Result<(), XlsxError> {
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    #[derive(ExcelSerialize, Serialize)]
    struct Produce {
        fruit: &'static str,
        cost: f64,
        in_stock: bool,
    }

    // Create some data instances.
    let items = [
        Produce {
            fruit: "Peach",
            cost: 1.05,
            in_stock: true,
        },
        Produce {
            fruit: "Plum",
            cost: 0.15,
            in_stock: false,
        },
        Produce {
            fruit: "Pear",
            cost: 0.75,
            in_stock: true,
        },
    ];

    worksheet.set_serialize_headers::<Produce>(0, 0)?;
    worksheet.serialize(&items)?;

    // Save the file.
    workbook.save("serialize.xlsx")?;

    Ok(())
}

Output:

screenshot

Note the ExcelSerialize derived trait. The set_serialize_headers() doesn't use Serde serialization or deserialization (for the headers). Instead the macro generates code a custom impl for the type inline like this:

    impl ExcelSerialize for Produce {
        fn to_rust_xlsxwriter() -> rust_xlsxwriter::SerializeFieldOptions {
            let mut custom_headers: Vec<rust_xlsxwriter::CustomSerializeField> 
                = ::alloc::vec::Vec::new();
            custom_headers.push(rust_xlsxwriter::CustomSerializeField::new("fruit"));
            custom_headers.push(rust_xlsxwriter::CustomSerializeField::new("cost"));
            custom_headers.push(rust_xlsxwriter::CustomSerializeField::new("in_stock"));
            rust_xlsxwriter::SerializeFieldOptions::new()
                .set_struct_name("Produce")
                .set_custom_headers(&custom_headers)
        }
    }

It should be straight forward to support attributes like #[rust_xlsxwriter(set_num_format="dd/mm/yyyy")]. However, interacting (correctly) with Serde attributes will be a little trickier.

I'll continue to work on this for the next few weeks and post updates when there is enough functionality to try it out with a more realistic use case.

@lucatrv for information.

jmcnamara commented 6 months ago

I asked a question about this on users.rust-lang.org. Some interesting suggestions there.

jmcnamara commented 6 months ago

Just an update on progress on this. You can now mix and match rust_xlsxwriter and some (currently only 1) serde attributes.

Like this:

se rust_xlsxwriter::{ExcelSerialize, Workbook, XlsxError};
use rust_xlsxwriter_derive::ExcelSerialize;
use serde::Serialize;

fn main() -> Result<(), XlsxError> {
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    #[derive(ExcelSerialize, Serialize)]
    #[allow(dead_code)]
    struct Produce {
        #[rust_xlsxwriter(rename = "Item")]
        fruit: &'static str,

        #[rust_xlsxwriter(rename = "Price")]
        #[rust_xlsxwriter(num_format = "$0.00")]
        cost: f64,

        #[serde(skip)]
        in_stock: bool,
    }

    // Create some data instances.
    let items = [
        Produce {
            fruit: "Peach",
            cost: 1.05,
            in_stock: true,
        },
        Produce {
            fruit: "Plum",
            cost: 0.15,
            in_stock: false,
        },
        Produce {
            fruit: "Pear",
            cost: 0.75,
            in_stock: true,
        },
    ];

    worksheet.set_serialize_headers::<Produce>(0, 0)?;
    worksheet.serialize(&items)?;

    // Save the file.
    workbook.save("serialize.xlsx")?;

    Ok(())
}

Output:

screenshot

There is still quite a bit of work to make this robust but at least it demonstrates that it is feasible.

zjp-CN commented 6 months ago

Can we make the attribute name rust_xlsxwriter shorter as xlsxwriter?

jmcnamara commented 6 months ago

Can we make the attribute name rust_xlsxwriter shorter as xlsxwriter?

That is a good suggestion. rust_xlsxwriter is a bit long for an attribute.

There is a different xlsxwriter library that I didn't write but wraps the libxlxwriter library which I did write. I would have preferred to use just xlsxwriter for the rust crate but that wasn't possible.

So the shorted attribute "path" of xlsxwriter would be better but I wonder if it would be confusing. Presumably not to the person writing the code but maybe to a reader.

I'll definitely consider it though.

@lucatrv your 2 cents?

lucatrv commented 6 months ago

I think that using xlsxwriter as a shorted attribute path, while keeping the rust_xlsxwriter library name (also considering that another xlsxwriter library already exists), would be confusing. What if someone uses both libraries in the same code? Instead (but maybe it is too late to suggest this...) one could consider a shorter name for the library itself. I know that many *xlsx* names are already reserved, but for instance xlsxw is available and it sounds good to me.

lucatrv commented 6 months ago
#[rust_xlsxwriter(rename = "Item")]
fruit: &'static str,

Is it also possible to rely on #[serde(rename = "Item")], for instance in case it is already declared for other serializers?

jmcnamara commented 6 months ago

Is it also possible to rely on #[serde(rename = "Item")], for instance in case it is already declared for other serializers?

Yes. That is the goal. But it is also the hard part. :-) Hence my question on users.rust-lang.org.

jmcnamara commented 6 months ago

Instead (but maybe it is too late to suggest this...) one could consider a shorter name for the library itself. I know that many *xlsx* names are already reserved, but for instance xlsxw is available and it sounds good to me.

Naming things is hard. I had also registered excelwriter and made an earlier attempt to move the code to that but in the end I didn't.

lucatrv commented 6 months ago

Naming things is hard. I had also registered excelwriter and made an earlier attempt to move the code to that but in the end I didn't.

That's a nice name indeed, and there's already an eminent homonymous ;)

zjp-CN commented 6 months ago

Is it also possible to rely on #[serde(rename = "Item")], for instance in case it is already declared for other serializers?

Yes. That is the goal. But it is also the hard part. :-)

Well, actually it's not that hard... Here's the MVP/POV code: https://github.com/zjp-CN/parse-serde-macro/blob/main/_impl/src/lib.rs

// key implementation
            if ident == "rust_xlsxwriter" || ident == "xlsxwriter" || ident == "serde" {
                // e.g. #[ident(one)] or #[ident(one, two)] or #[ident(one, two, ...)]
                let parsed = Punctuated::<AttributeTypes, Token![,]>::parse_separated_nonempty
                    .parse2(list.tokens)?;
                return Ok(parsed.into_iter().collect());
            }

user interface :

// field `b` => [Rename(LitStr { token: "serde rename for b" })]
// field `c` => [Skip]
// field `d` => [Rename(LitStr { token: "rust_xlsxwriter rename for d" }), NumFormat(LitStr { token: "$0.00" })]
// field `e` => [Rename(LitStr { token: "xlsxwriter rename for d" }), NumFormat(LitStr { token: "$0.00" })]
// field `f` => [Skip]
#[derive(_impl::ExcelSerialize, serde::Serialize)]
pub struct A {
    #[serde(rename = "serde rename for b")]
    b: (),
    #[serde(skip)]
    c: (),

    #[rust_xlsxwriter(rename = "rust_xlsxwriter rename for d", num_format = "$0.00")]
    d: (),

    #[xlsxwriter(rename = "xlsxwriter rename for d", num_format = "$0.00")]
    e: (),
    #[xlsxwriter(skip)]
    f: (),
}

// error: `not_exist` is not supported by ExcelSerialize derive macro
//   --> src/main.rs:40:18
//    |
// 40 |     #[xlsxwriter(not_exist)]
//    |                  ^^^^^^^^^
#[derive(_impl::ExcelSerialize, serde::Serialize)]
pub struct B {
    #[xlsxwriter(not_exist)]
    f: (),
}
lucatrv commented 6 months ago

Yes. That is the goal. But it is also the hard part. :-) Hence my question on users.rust-lang.org.

IMHO once the integration with Serde is completed, so that Serde's rename, skip, etc are supported, then the corresponding rust_xlsxwriter implementations should be removed, because they would represent overlapping and practically useless annotations which would only create confusion and possible bugs. Only the additional features specific for Excel writing should be left, such as set_header_format, set_column_format, etc.

lucatrv commented 6 months ago

Naming things is hard. I had also registered excelwriter and made an earlier attempt to move the code to that but in the end I didn't.

That's a nice name indeed, and there's already an eminent homonymous ;)

@jmcnamara, the more I think about this, the more I would suggest to change name into excelwriter, for two reasons. The first one is to reduce library and path name lengths, as suggested by @zjp-CN. The second one, in my opinion even more important, is to make this project much more reachable and to avoid confusion with the other existing *xlsxwriter libraries. I remember the first time I was looking for a library to use in my code, I started my research on crates.io with the excel keyword, before the xlsx keyword. Then, it was quite difficult to orient myself among the *xlsxwriter libraries, and to finally understand that rust_xlsxwriter was the best one. Even nowadays, when I google rust_xlsxwriter docs, the link xlsxwriter - Rust comes before rust_xlsxwriter - Rust, and this is an additional source of confusion. Now I know the difference, but I can imagine new users getting confused among these two Rust libraries, and libxlsxwriter. You now have 88k downloads, it is unfortunate to loose them but soon you will have much more, and I think changing name will help that. It may actually be the last opportunity to think about this. You are now implementing these Serde integration features which will be a game changer. You could freeze the rust_xlsxwriter development at release v0.6.0, and notify that all new features will be implemented in excelwriter. I think this is still possible, and if you need some help in changing name references I could help.

jmcnamara commented 6 months ago

Well, actually it's not that hard... Here's the MVP/POV code

@zjp-CN Thank you. I really appreciate that.

I actually got the #[serde(rename = "...")] part working late yesterday. The code is here.

In terms of difficultly I was thinking more about the container attribute #[serde(rename_all = "...")] but even that probably won't be too difficult.

Your code looks much cleaner and more generic. Could you have a look at my implementation and make any suggestions that you have.

One issue that I have and haven't managed to come up with a solution for is that I would like to an rust_xlsxwriter Format object as an attribute parameter. Something like this:

    #[derive(ExcelSerialize, Serialize)]
    struct Produce {
        fruit: &'static str,

        #[rust_xlsxwriter(value_format = Format::new().set_bold())]
        cost: f64,

        in_stock: bool,
    }

Is that possible to handle non-primitives as an attribute value? Anything that I have tried so far end up with a literal string or a parse error.

I would also need to modify the token stream to get rust_xlsxwriter::Format::new().set_bold() (and maybe some other qualifiers).

Any thoughts on whether that is possible and if so how?

jmcnamara commented 6 months ago

the more I think about this, the more I would suggest to change name into

@lucatrv I think you are right. Could you open a separate Feature Request for that so we can have the discussion separately to this thread. I think it is worth doing.

zjp-CN commented 6 months ago

One issue that I have and haven't managed to come up with a solution for is that I would like to an rust_xlsxwriter Format object as an attribute parameter.

You can simply add it as follows:

@@ -57,6 +57,7 @@ enum AttributeTypes {
     Skip,
     Rename(LitStr),
     NumFormat(LitStr),
+    FormatObj(Expr),
 }

@@ -73,6 +74,9 @@ fn parse(input: syn::parse::ParseStream) -> syn::Result<Self> {
         } else if ident == "num_format" {
             let _ = input.parse::<Token![=]>()?;
             Ok(Self::NumFormat(input.parse()?))
+        } else if ident == "value_format" {
+            let _ = input.parse::<Token![=]>()?;
+            Ok(Self::FormatObj(input.parse()?))
         } else {
             Err(syn::Error::new(
                 ident.span(),

Could you have a look at my implementation and make any suggestions that you have.

Yes, and I've already wrote the code partially based on yours. I can open a PR for your derive branch, but I don't know if there are tests for the feature.

jmcnamara commented 6 months ago

@zjp-CN

You can simply add it as follows:

Thank you. I'll try that.

I can open a PR for your derive branch, but I don't know if there are tests for the feature.

There are integration test to cover it. I normally run the integration tests like the following to avoid the long compilation and test time for the examples:

cargo test --test '*'

Currently the following tests cover this functionality. I'll add more once I get the Format support working.

tests/integration/serde01.rs
tests/integration/serde12.rs
tests/integration/serde07.rs

Just some notes:

Thanks once more.

zjp-CN commented 6 months ago
cargo test --test '*'

Thanks. I'll go with cargo test --test integration serde to only run tests the name of which contains serde in tests/integration.

There are subtle differences so they should be handled differently.

I know they bring different meanings, but I think we need both of #[serde(skip)] and #[rust_xlsxwriter(skip)] (same for #[serde(rename] and #[rust_xlsxwriter(rename] ), because

#[serde(skip)] = #[rust_xlsxwriter(skip)] in ExcelSerialize + original meaning in serde::Serialize

I don't know if it works in my current code but if it doesn't then don't put effort into it.

I checked the code in parse_nested_meta, and it indeed handles multiple arguments separated by a comma with an optional trailing comma: it basically parses a path first for you, and offers you the responsibility in the callback to push the token cursor forward when the tokens follows the checked path but without thinking about the comma separator. So it works really well, and you don't need my code, as both versions almost do the same thing (but parse_nested_meta is more powerful).

jmcnamara commented 6 months ago

I've updated the code to handle all of the Serde rename variants including one nested variant.

I'm moving on to the Serde skip_serializing_if and the various rust_xlsxwriter format variants.

Update 1: I have never used this option before but it turns out that skip_serializing_if is used to skip individual values and not the field. This means that it doesn't have to be handled by the rust_xlsxwriter proc macros but it also means that using this will cause mismatched output in Excel. That is probably a bug so I'll handle that in a separate Bug tracker after this work.

jmcnamara commented 6 months ago

Good news for a small audience. I got all of the major functionality working including the format support.

Here is an example:

use rust_xlsxwriter::{ExcelSerialize, Workbook, XlsxError};
use serde::Serialize;

fn main() -> Result<(), XlsxError> {
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    // Create a serializable struct.
    #[derive(ExcelSerialize, Serialize)]
    #[rust_xlsxwriter(header_format = Format::new()
        .set_bold()
        .set_border(FormatBorder::Thin)
        .set_background_color("C6EFCE"))]
    #[serde(rename_all = "PascalCase")]
    struct Produce {
        fruit: &'static str,

        #[rust_xlsxwriter(value_format = Format::new().set_num_format("$0.00"))]
        cost: f64,
    }

    // Create some data instances.
    let item1 = Produce {
        fruit: "Peach",
        cost: 1.05,
    };

    let item2 = Produce {
        fruit: "Plum",
        cost: 0.15,
    };

    let item3 = Produce {
        fruit: "Pear",
        cost: 0.75,
    };

    // Set the serialization location and headers.
    worksheet.set_serialize_headers::<Produce>(0, 0)?;

    // Serialize the data.
    worksheet.serialize(&item1)?;
    worksheet.serialize(&item2)?;
    worksheet.serialize(&item3)?;

    // Save the file to disk.
    workbook.save("serialize.xlsx")?;

    Ok(())
}

Which generates this output:

screenshot

The header format is a bit long in this example but I wanted to replicate the previous example. The diff with the previous example looks like this:

diff --git a/examples/app_serialize.rs b/examples/app_serialize.rs
index 1b4810a..3e7980b 100644
--- a/examples/app_serialize.rs
+++ b/examples/app_serialize.rs
@@ -5,10 +5,8 @@
 //! Example of serializing Serde derived structs to an Excel worksheet using
 //! `rust_xlsxwriter`.

-use rust_xlsxwriter::{
-    CustomSerializeField, Format, FormatBorder, SerializeFieldOptions, Workbook, XlsxError,
-};
-use serde::{Deserialize, Serialize};
+use rust_xlsxwriter::{ExcelSerialize, Workbook, XlsxError};
+use serde::Serialize;

 fn main() -> Result<(), XlsxError> {
     let mut workbook = Workbook::new();
@@ -16,19 +14,17 @@ fn main() -> Result<(), XlsxError> {
     // Add a worksheet to the workbook.
     let worksheet = workbook.add_worksheet();

-    // Set some formats.
-    let header_format = Format::new()
+    // Create a serializable struct.
+    #[derive(ExcelSerialize, Serialize)]
+    #[rust_xlsxwriter(header_format = Format::new()
         .set_bold()
         .set_border(FormatBorder::Thin)
-        .set_background_color("C6EFCE");
-
-    let value_format = Format::new().set_num_format("$0.00");
-
-    // Create a serializable struct.
-    #[derive(Deserialize, Serialize)]
+        .set_background_color("C6EFCE"))]
     #[serde(rename_all = "PascalCase")]
     struct Produce {
         fruit: &'static str,
+
+        #[rust_xlsxwriter(value_format = Format::new().set_num_format("$0.00"))]
         cost: f64,
     }

@@ -48,13 +44,8 @@ fn main() -> Result<(), XlsxError> {
         cost: 0.75,
     };

-    // Set the custom headers.
-    let header_options = SerializeFieldOptions::new()
-        .set_header_format(&header_format)
-        .set_custom_headers(&[CustomSerializeField::new("Cost").set_value_format(&value_format)]);
-
     // Set the serialization location and headers.
-    worksheet.deserialize_headers_with_options::<Produce>(0, 0, &header_options)?;
+    worksheet.set_serialize_headers::<Produce>(0, 0)?;

     // Serialize the data.
     worksheet.serialize(&item1)?;

The nice thing about it is that you get standard rustc error messages:

error[E0599]: no method named `set_scold` found for struct `Format` in the current scope
  --> examples/app_serialize.rs:20:10
   |
19 |       #[rust_xlsxwriter(header_format = Format::new()
   |  _______________________________________-
20 | |         .set_scold()
   | |         -^^^^^^^^^ help: there is a method with a similar name: `set_bold`
   | |_________|
   | 

// or

error: unknown rust_xlsxwriter attribute: `skipper`
  --> examples/app_serialize.rs:27:27
   |
27 |         #[rust_xlsxwriter(skipper)]
   |                           ^^^^^^^

In case anyone is interested, here is what the expanded derive output for the example above looks like:

    #[doc(hidden)]
    const _: () = {
        #[allow(unused_imports)]
        use rust_xlsxwriter::{
            Color, Format, FormatAlign, FormatBorder, FormatDiagonalBorder,
            FormatPattern, FormatScript, FormatUnderline,
        };
        impl ExcelSerialize for Produce {
            fn to_serialize_field_options() -> rust_xlsxwriter::SerializeFieldOptions {
                let mut custom_headers: Vec<rust_xlsxwriter::CustomSerializeField> = ::alloc::vec::Vec::new();
                custom_headers.push(rust_xlsxwriter::CustomSerializeField::new("Fruit"));
                custom_headers
                    .push(
                        rust_xlsxwriter::CustomSerializeField::new("Cost")
                            .set_value_format(Format::new().set_num_format("$0.00")),
                    );
                rust_xlsxwriter::SerializeFieldOptions::new()
                    .set_header_format(
                        Format::new()
                            .set_bold()
                            .set_border(FormatBorder::Thin)
                            .set_background_color("C6EFCE"),
                    )
                    .set_struct_name("Produce")
                    .set_custom_headers(&custom_headers)
            }
        }
    };

I stole the const _: () = {} wrapper idea from serde.

Overall I'm very happy with the way this turned out. I may drop/deprecate/hide the existing serialize/deserialize way of setting up the headers and only document this.

I now have a heartfelt appreciation for the functionality of the syn, quote and serde crates. There is some impressive software engineering in there. All the above functionality comes from around 500 lines of code. (500 hard fought lines of code but nevertheless.)

I'll move this back onto main shortly and fix up the feature flags so that folks can try it out. Thanks @zjp-CN and @lucatrv for the help and feedback.

lucatrv commented 6 months ago

I'll move this back onto main shortly and fix up the feature flags so that folks can try it out. Thanks @zjp-CN and @lucatrv for the help and feedback.

Thanks @jmcnamara , I'm looking forward to testing this out!

jmcnamara commented 6 months ago

I've merged this back to main. The rust_xlsxwriter_derive macro is a dependency on serde.

You should add the following to your Cargo.toml for testing.

[dependencies]
rust_xlsxwriter = { git = "https://github.com/jmcnamara/rust_xlsxwriter.git", version = "0.60.0", features = ["serde"] }
serde = { version = "1.0.195", features = ["derive"] }

I seem to have broken installation via cargo add:

$ cargo add --git https://github.com/jmcnamara/rust_xlsxwriter.git -F serde
    Updating git repository `https://github.com/jmcnamara/rust_xlsxwriter.git`
error: multiple packages found at `https://github.com/jmcnamara/rust_xlsxwriter.git`:
    rust_xlsxwriter, rust_xlsxwriter_derive
To disambiguate, run `cargo add --git https://github.com/jmcnamara/rust_xlsxwriter.git <package>`

$ cargo add --git https://github.com/jmcnamara/rust_xlsxwriter.git -F serde rust_xlsxwriter
    Updating git repository `https://github.com/jmcnamara/rust_xlsxwriter.git`
      Adding rust_xlsxwriter (git) to dependencies.
error: unrecognized feature for crate rust_xlsxwriter: serde
no features available for crate rust_xlsxwriter

I'm not sure if this is a cargo add issue or if I need to set up a workspace since there are now to packages/crates in the repo. If anyone knows let me know. (It works when installing via cargo add --path ...).

zjp-CN commented 6 months ago

To disambiguate, run cargo add --git https://github.com/jmcnamara/rust_xlsxwriter.git <package>

That tells us how to do it right by sepecifying the package in the workshop:

cargo add --git https://github.com/jmcnamara/rust_xlsxwriter.git rust_xlsxwriter -F serde
jmcnamara commented 6 months ago

That tells us how to do it right by sepecifying the package in the workshop:

I had tried that but it can't seem to process the feature flag:

$ cargo add --git https://github.com/jmcnamara/rust_xlsxwriter.git rust_xlsxwriter -F serde
    Updating git repository `https://github.com/jmcnamara/rust_xlsxwriter.git`
      Adding rust_xlsxwriter (git) to dependencies.
error: unrecognized feature for crate rust_xlsxwriter: serde

I'll see if moving to a workspace improves things.

zjp-CN commented 6 months ago

I had tried that but it can't seem to process the feature flag:

How come??? It worked on my machine with rustc 1.77.0-nightly (595bc6f00 2024-01-05)

$ cargo add --git https://github.com/jmcnamara/rust_xlsxwriter.git rust_xlsxwriter -F serde
    Updating git repository `https://github.com/jmcnamara/rust_xlsxwriter.git`
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 10 (delta 6), reused 8 (delta 6), pack-reused 0
Unpacking objects: 100% (10/10), 991 bytes | 20.00 KiB/s, done.
From https://github.com/jmcnamara/rust_xlsxwriter
   9465a21..1f648f6             -> origin/HEAD
      Adding rust_xlsxwriter (git) to dependencies.
             Features:
             + serde
             - chrono
             - js-sys
             - polars
             - test-resave
             - wasm
             - wasm-bindgen
             - zlib
zjp-CN commented 6 months ago

Oh, it was a bug in cargo, and it was fixed two weeks ago. But it will land on stable Rust in 1.77.0, which will be released on 21 March, 2024.

jmcnamara commented 6 months ago

Oh, it was a bug in cargo, and it was fixed two weeks ago. But it will land on stable Rust in 1.77.0, which will be released on 21 March, 2024.

Thanks. I am on Rust v1.75.0. :-)

lucatrv commented 6 months ago

Overall I'm very happy with the way this turned out. I may drop/deprecate/hide the existing serialize/deserialize way of setting up the headers and only document this.

Maybe you should keep both ways, in case someone uses Serde without the derive feature.

zjp-CN commented 6 months ago

Maybe you should keep both ways, in case someone uses Serde without the derive feature.

Likewise. And imagine someone would like to use the same struct to generate different columns: if you kill the way to customize headers, the only way is to duplicate structs with different derive macros on them and verbose conversion code...

lucatrv commented 6 months ago

I hit the following issue:

#[derive(Deserialize, Serialize, ExcelSerialize)]
#[serde(rename_all = "UPPERCASE")]
struct Record<'a> {
    articolo: &'a str,
    descrizione: &'a str,
    stato: u8,
}
error[E0726]: implicit elided lifetime not allowed here
   --> src\lib.rs:200:12
    |
200 |     struct Record<'a> {
    |            ^^^^^^ expected lifetime parameter
    |
help: indicate the anonymous lifetime
    |
200 |     struct Record<'_><'a> {
    |                  ++++

For more information about this error, try `rustc --explain E0726`.
jmcnamara commented 6 months ago

I hit the following issue:

Thanks. The macro currently only catches the struct name and not the lifetime or other (generics)? bounds. I'll look into that.

jmcnamara commented 6 months ago

The macro currently only catches the struct name and not the lifetime or other (generics)? bounds. I'll look into that.

@lucatrv I've pushed a fix to main for lifetime, impl generics, type generics and where.

lucatrv commented 6 months ago

OK now it works.

Do you think it would be possible to refer somehow to defined formats, or is this not feasible at all? Something like:

let center_format = Format::new().set_align(FormatAlign::Center);

#[derive(Deserialize, Serialize, ExcelSerialize)]
#[serde(rename_all = "UPPERCASE")]
#[rust_xlsxwriter(header_format = center_format)]
struct Record<'a> {
    #[rust_xlsxwriter(column_format = center_format)]
    articolo: &'a str,
    descrizione: &'a str,
    #[rust_xlsxwriter(column_format = center_format)]
    stato: u8,
}
zjp-CN commented 6 months ago

Do you think it would be possible to refer somehow to defined formats, or is this not feasible at all?

You can already do it as

    #[derive(Serialize, ExcelSerialize)]
    #[serde(rename_all = "UPPERCASE")]
    #[rust_xlsxwriter(header_format = Format::new().set_align(FormatAlign::Center))]
    struct Record<'a> {
        #[rust_xlsxwriter(column_format = Format::new().set_align(FormatAlign::Center))]
        articolo: &'a str,
        descrizione: &'a str,
        #[rust_xlsxwriter(column_format = Format::new().set_align(FormatAlign::Center))]
        stato: u8,
    }

It's a little verbose to repeat Format::new().set_align(FormatAlign::Center), but that's how macros work in Rust, by pasting tokens. Some improvements can be

lucatrv commented 6 months ago

It's a little verbose to repeat Format::new().set_align(FormatAlign::Center), but that's how macros work in Rust, by pasting tokens. Some improvements can be

* (not feasible today) make methods on Format const: replace these tokens with a const `FORMAT` where there is `const FORMAT: Format = ...;` in scope

* encode all the cases as attribute parameters: `#[rust_xlsxwriter(align = "center")]` or something

I would consider this point, because formats can get pretty long and they may need to be applied to several struct fields.

jmcnamara commented 6 months ago

Do you think it would be possible to refer somehow to defined formats, or is this not feasible at all? Something like

As @zjp-CN points out the format parts need to be const/non-dynamic since they will end up in a fn block within the impl.

However you could use functions to give you re-use and reduce verbosity. Something like this:

use rust_xlsxwriter::{ExcelSerialize, Format, Workbook, XlsxError, FormatBorder};
use serde::Serialize;

fn main() -> Result<(), XlsxError> {
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    // Set up some formats.
    fn my_header_format() -> Format {
        Format::new()
            .set_bold()
            .set_border(FormatBorder::Thin)
            .set_background_color("C6EFCE")
    }

    fn my_red_num_format(s: &str) -> Format {
        Format::new().set_font_color("FF0000").set_num_format(s)
    }

    // Create a serializable struct.
    #[derive(ExcelSerialize, Serialize)]
    #[rust_xlsxwriter(header_format = my_header_format())]
    #[serde(rename_all = "PascalCase")]
    struct Produce {
        fruit: &'static str,

        #[rust_xlsxwriter(value_format = my_red_num_format("$0.00"))]
        cost: f64,
    }

    // Create some data instances.
    let item1 = Produce {
        fruit: "Peach",
        cost: 1.05,
    };

    let item2 = Produce {
        fruit: "Plum",
        cost: 0.15,
    };

    let item3 = Produce {
        fruit: "Pear",
        cost: 0.75,
    };

    // Set the serialization location and headers.
    worksheet.set_serialize_headers::<Produce>(0, 0)?;

    // Serialize the data.
    worksheet.serialize(&item1)?;
    worksheet.serialize(&item2)?;
    worksheet.serialize(&item3)?;

    // Save the file to disk.
    workbook.save("serialize.xlsx")?;

    Ok(())
}

screenshot

lucatrv commented 6 months ago

This works, hurrah!

#![feature(lazy_cell)]

use std::sync::LazyLock;

static CENTER_FORMAT: LazyLock<Format> = LazyLock::new(|| {
    Format::new().set_align(FormatAlign::Center)
});

#[derive(Deserialize, Serialize, ExcelSerialize)]
#[serde(rename_all = "UPPERCASE")]
#[rust_xlsxwriter(header_format = &*CENTER_FORMAT)]
struct Record<'a> {
    #[rust_xlsxwriter(column_format = &*CENTER_FORMAT)]
    articolo: &'a str,
    descrizione: &'a str,
    #[rust_xlsxwriter(column_format = &*CENTER_FORMAT)]
    stato: u8,
}

I think this is the best way to apply formats, so I would document it in the examples.

You need rustc nightly to compile this, waiting for LazyCell / LazyLock to be stabilized: https://blog.rust-lang.org/2023/06/01/Rust-1.70.0.html#oncecell-and-oncelock https://doc.rust-lang.org/std/cell/struct.LazyCell.html https://doc.rust-lang.org/std/sync/struct.LazyLock.html

zjp-CN commented 6 months ago

I think this is the best way to apply formats, so I would document it in the examples.

I don't think so... I think I should withdraw the first improvement because defining a custom format function is an awesome solution: function is flexible, IDE friendly, and repeatable.

As for your static variable + &*CENTER_FORMAT solution here, it's actually based on the impl From<&Format> for Format fact which fundamentally calls Format::clone(). That is you end up with a new Format instance anyway, so why not use a trivial function returning a new Format instance to do this. Plus, IMHO LazyLock is really abused here.


For my second suggestion

  • encode all the cases as attribute parameters: #[rust_xlsxwriter(align = "center")] or something

I would like to implement it if you guys like the idea. The API in design

// column format: header + value format
#[derive(ExcelSerialize, Serialize)]
#[column_format(align = "center", font_name = "...", border = "thin")] // applies for all columns
struct Data { ... }
#[derive(ExcelSerialize, Serialize)] 
struct Data { #[column_format(align = "center", border = "thin")] field: i32 } // applies only for one field column

// header format
#[derive(ExcelSerialize, Serialize)]
#[header_format(align = "center", font_size = 45, bold, italic)] // applies for all headers
struct Data { ... }
#[derive(ExcelSerialize, Serialize)]
struct Data { #[header_format(bold) field: i32 } // applies only for one field header

// value format 
#[derive(ExcelSerialize, Serialize)]
#[value_format(num_format = "0.00")] // applies for all values of all fields
struct Data { ... }
#[derive(ExcelSerialize, Serialize)]
struct Data { #[value_format(num_format = "0.00")] field: i32 } // applies only for values of one field

// combined formats
#[derive(ExcelSerialize, Serialize)]
struct Data {
    #[header_format(border = "thick")]
    #[value_format(border = "thin", num_format = "0.00")]
    field: i32
}

// overridden formats
#[derive(ExcelSerialize, Serialize)]
#[column_format(background_color= "#123456")]
struct Data {
    #[header_format(background_color= "#7890AB")]
    #[value_format(background_color= "#ABCDEF")]
    field: i32
}
jmcnamara commented 6 months ago

I would like to implement it if you guys like the idea. The API in design

@zjp-CN Could you hold off on this. I'll admit that is what I thought the API would look like initially because I wasn't sure if it was possible to parse an expression. However I'd prefer not to get into having to document and test a secondary interface to Format. There are currently 40 public set_xxx() APIs on that struct.

I think a better overall formatting solution would be to add support for Worksheet Tables to get something like this:

screenshot 1

That is what the Polars folks use as the default format on output dataframes.

I am also currently working on some of the refactoring to make that happen as part of the fix for #71.

zjp-CN commented 6 months ago

That is what the Polars folks use as the default format on output dataframes.

Sorry, I'm not familiar with polars, what does it mean?

lucatrv commented 6 months ago

As for your static variable + &*CENTER_FORMAT solution here, it's actually based on the impl From<&Format> for Format fact which fundamentally calls Format::clone(). That is you end up with a new Format instance anyway, so why not use a trivial function returning a new Format instance to do this. Plus, IMHO LazyLock is really abused here.

I agree with you if in my example above CENTER_FORMAT is actually cloned three times.

For my second suggestion

  • encode all the cases as attribute parameters: #[rust_xlsxwriter(align = "center")] or something

I would like to implement it if you guys like the idea. The API in design

I think the best of both words would be to define a format! macro to construct new Formats, which takes for instance:

let custom_format = format![align = "center", font_name = "...", border = "thin"];

Then the same macro could also be used within the Serde struct annotations. Finally the documentation examples should suggest the best way to define a format in one place and apply it to several fields, for instance using functions:

fn custom_format() -> Format {
    format![align = "center", font_name = "...", border = "thin"]
}
zjp-CN commented 6 months ago

Finally the documentation examples should suggest the best way to define a format in one place and apply it to several fields, for instance using functions:

Great! I like the design. Why not make format! generate the function and return it 😃

let custom_format = format![align = "center", font_name = "...", border = "thin"];

// expansion
let custom_format = {
    fn custom_format() -> Format { ... }
    custom_format
};
// then custom_format is copyable ~

Update: this results in #[rust_xlsxwriter(column_format = custom_format())], but we can make it shorter as #[rust_xlsxwriter(column_format = custom_format)]:

Update2: wait, since most structs are defined under modules instead of scopes of functions, it would be better for format! to generate a Format value as @lucatrv suggests.

jmcnamara commented 6 months ago

Sorry, I'm not familiar with polars, what does it mean?

@zjp-CN Sorry, I should have added a link there. Polars is a Rust/Python Dataframe library.

They implemented Xlsx output based on the Python version of rust_xlsxwriter and use a default table format for each dataframe.

There are some visual examples here: https://github.com/pola-rs/polars/issues/5568 and here.

jmcnamara commented 6 months ago

I think the best of both words would be to define a format! macro to construct new Formats, which takes for instance:

Folks, before anyone gets too far into this I want to say that I'm not in favour of a secondary interface to Format. Not at this point in time. If someone wants to open a Feature Request for it we can look at it once the basic serde support is complete. For now I think the function wrapper suggested above is a reasonable workaround.

At the same time I'm not completely against the format!() idea. That is basically what I implemented in the Python version of Format. In addition to the various setters you can also use an anonymous Dict like this:

header_format = workbook.add_format(
    {
        "bold": True,
        "text_wrap": True,
        "valign": "top",
        "fg_color": "#D7E4BC",
        "border": 1,
    }
)

However, that will need a fair amount of documentation and tests and it loses enum validation for the property values. At the moment that feels like one too many rabbit holes. So let's punt that out until later.

lucatrv commented 6 months ago

This does not seem to be working:

#[rust_xlsxwriter(header_format = Format::new().set_bold(), column_width = 10.0)]

It only applies the last annotation. In this case it applies only column_width = 10.0, while if I swap the two annotations it applies only header_format = Format::new().set_bold().

jmcnamara commented 6 months ago

This does not seem to be working:

Good catch. Fixed on main, with a test.

lucatrv commented 6 months ago

Fixed on main, with a test.

Now it works, thanks!

lucatrv commented 6 months ago

@jmcnamara, please notice that something in main broke the serialization support for Result<T, E>, which works in v0.60.0.

jmcnamara commented 6 months ago

please notice that something in main broke the serialization support for Result<T, E>, which works in v0.60.0.

@lucatrv The tests for that feature are still passing. Could you add a small working sample application that demonstrates the break in the #64 thread.

lucatrv commented 6 months ago

Sorry yesterday it was too late... now I see that my code does not work with v0.60.0 neither, I'll try to dig into this.

lucatrv commented 6 months ago

This code does not work to me:

use rust_xlsxwriter::{Format, FormatBorder, Workbook, XlsxError};
use serde::{Serialize, Deserialize};

fn main() -> Result<(), XlsxError> {
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    // Add some formats to use with the serialization data.
    let header_format = Format::new()
        .set_bold()
        .set_border(FormatBorder::Thin)
        .set_background_color("C6E0B4");

    // Create a serializable struct.
    #[derive(Deserialize, Serialize)]
    #[serde(rename_all = "PascalCase")]
    struct Student<'a> {
        name: &'a str,
        age: Result<f64, String>,
        id: Result<f64, String>,
    }

    let students = [
        Student {
            name: "Aoife",
            age: Ok(1.0),
            id: Err(String::from("564351")),
        },
        Student {
            name: "Caoimhe",
            age: Err(String::new()),
            id: Ok(443287.0),
        },
    ];

    // Set up the start location and headers of the data to be serialized.
    worksheet.deserialize_headers_with_format::<Student>(1, 3, &header_format)?;

    // Serialize the data.
    worksheet.serialize(&students)?;

    // Save the file.
    workbook.save("serialize.xlsx")?;

    Ok(())
}

I get empty fields for age and id.