Closed lucatrv closed 6 months ago
It is therefore inconvenient to have to treat the first instance differently from all the others, for the only purpose to be able to serialize headers
I agree with this and I did spend a good bit of time looking at alternatives. In particular I wanted to do something like Mingun suggested in the RFC thread (#61) where the type could be specified like this:
worksheet.serialize_headers::<Produce>(0, 0)?;
worksheet.serialize_headers_with_format::<Produce>(0, 0, &format)?;
However, that would only work for Deserialize
traits since ser::serialize()
needs an actual instance and not just the type. serde_aux
uses the Deserialize
approach. I considered implementing something similar but that would require rust_xlsxwriter
serializable structs to derive both Deserialize
and Serialize
which seemed like it would be counterintuitive to the end user so in the end I didn't do that.
That may actually be moot in practice since users will probably be serializing structs that they deserialized from somewhere else and be using #[derive(Deserialize, Serialize)]
anyway. So maybe I should revisit that.
For my own usage I have been a temp default instance of the struct that I want to serialize (which is a small bit clunky but workable):
use calamine::{open_workbook, RangeDeserializerBuilder, Reader, Xlsx};
use rust_xlsxwriter::Workbook;
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Serialize, Default)]
#[allow(dead_code)]
struct ExcelRow {
region: String,
value: f64,
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut excel: Xlsx<_> = open_workbook("Book1.xlsx")?;
let range = excel.worksheet_range("Sheet1")?;
let iter = RangeDeserializerBuilder::new().from_range::<_, ExcelRow>(&range)?;
// Create a new Excel file object.
let mut workbook = Workbook::new();
let worksheet = workbook.add_worksheet();
// Set up the serialization headers.
let excel_row = ExcelRow::default();
worksheet.serialize_headers(0, 0, &excel_row)?;
// Serialize the data.
for result in iter {
let row = result?;
worksheet.serialize(&row)?;
}
// Save the file to disk.
workbook.save("Book2.xlsx")?;
Ok(())
}
@claudiofsr Any thoughts/comments on requiring a Deserialize
trait for rust_xlsxwriter
serializable structs? I presume it would work for your use case.
Yes I also used the default
approach but I confirm it is clunky, IMHO the Deserialize
trait will be derived / implemented anyway when Worksheet::serialize
is used. Otherwise, if you want to cover all possible cases, you could provide both approaches, the first one applicable when the Deserialize
trait is available, the second one when it is not available:
worksheet.serialize_headers::<Produce>(0, 0)?;
worksheet.serialize_headers_from_item(0, 0, &item1)?;
Otherwise, if you want to cover all possible cases, you could provide both approaches
Good suggestion. I'll look into that.
Thanks, I think this will benefit 99% of use cases. To add my 2 cents, the new method names could be either serialize_headers_from_item
, serialize_headers_from_instance
, serialize_headers_without_deserialize_trait
, etc.
I've pushed an initial version of this to main.
The WIP method names are:
worksheet.serialize_headers_from_type::<Produce>(0, 0)?;
worksheet.serialize_headers_with_format_from_type::<Produce>(0, 0, &format)?;
Here is a working example:
use rust_xlsxwriter::{Format, FormatBorder, Workbook, XlsxError};
use serde::{Deserialize, Serialize};
fn main() -> Result<(), XlsxError> {
let mut workbook = Workbook::new();
// Add a worksheet to the workbook.
let worksheet = workbook.add_worksheet();
// Set a header format.
let header_format = Format::new()
.set_bold()
.set_border(FormatBorder::Thin)
.set_background_color("C6EFCE");
// Create a deserializable/serializable test struct.
#[derive(Deserialize, Serialize)]
struct Produce {
fruit: &'static str,
cost: f64,
}
// Create some data instances.
let item1 = Produce {
fruit: "Peach",
cost: 1.05,
};
let item2 = Produce {
fruit: "Plum",
cost: 0.15,
};
let item3 = Produce {
fruit: "Pear",
cost: 0.75,
};
// Set the serialization location and headers.
worksheet.serialize_headers_with_format_from_type::<Produce>(1, 1, &header_format)?;
// Serialize the data.
worksheet.serialize(&item1)?;
worksheet.serialize(&item2)?;
worksheet.serialize(&item3)?;
// Save the file.
workbook.save("serialize.xlsx")?;
Ok(())
}
Output:
I'll finalize the names and functionality over the next few days.
I changed
worksheet.serialize_headers(0, 0, &T::default())?;
by
worksheet.serialize_headers_from_type::<T>(0, 0)?;
and it worked fine.
Another suggestion would be to choose to format the data structure like this:
use serde::{Serialize, Deserialize};
// Create a deserializable/serializable test struct.
#[derive(Deserialize, Serialize)]
struct Produce {
#[serde(rename = "Fruit FooBar")]
#[rust_xlsxwriter(set_min_width=8)]
fruit: &'static str,
#[serde(rename = "Value")]
#[rust_xlsxwriter(set_num_format="#,##0.00")] // 2 digits after the decimal point
cost: f64,
#[serde(rename = "Date DMY")]
#[rust_xlsxwriter(set_num_format="dd/mm/yyyy")]
dmy: Option<NaiveDate>,
#[serde(rename = "Date MDY",)]
#[rust_xlsxwriter(set_num_format="mm/dd/yyyy")]
mdy: NaiveDate,
#[serde(rename = "Long Description")]
#[rust_xlsxwriter(set_min_width=8, set_max_width=60)]
long_description: String
}
Or
use serde::{Serialize, Deserialize};
// Create a deserializable/serializable test struct.
#[derive(Deserialize, Serialize)]
#[rust_xlsxwriter(set_min_width=8, set_max_width=60, set_date_format="dd/mm/yyyy")]
struct Produce {
#[serde(rename = "Fruit FooBar")]
fruit: &'static str,
#[serde(rename = "Value A")]
#[rust_xlsxwriter(set_num_format="#,##0.00")] // 2 digits after the decimal point
cost_a: f64,
#[serde(rename = "Value B")]
#[rust_xlsxwriter(set_num_format="#,##0.0000")] // 4 digits after the decimal point
cost_b: f64,
#[serde(rename = "Date DMY A")]
dmy_a: Option<NaiveDate>,
#[serde(rename = "Date DMY B",)]
dmy_b: NaiveDate,
#[serde(rename = "Long Description")]
#[rust_xlsxwriter(set_background_color="C6EFCE")]
long_description: String
}
Or something else.
This way, we can specify the format of each column/field individually or globally and also choose the maximum and minimum widths to improve auto_fit().
The specific formatting of each column/field overrides the global formatting.
Just suggestions for implementations ("brainstorm")! I don't know if these implementations would be viable.
I am currently formatting each column using Regex and serde_aux: read_xml / excel
Another suggestion would be to choose to format the data structure like this:
use serde::{Serialize, Deserialize}; // Create a deserializable/serializable test struct. #[derive(Deserialize, Serialize)] struct Produce { #[serde(rename = "Fruit FooBar")] #[rust_xlsxwriter(set_min_width=8)] fruit: &'static str, #[serde(rename = "Value")] #[rust_xlsxwriter(set_num_format="#,##0.00")] // 2 digits after the decimal point cost: f64, #[serde(rename = "Date DMY")] #[rust_xlsxwriter(set_num_format="dd/mm/yyyy")] dmy: Option<NaiveDate>, #[serde(rename = "Date MDY",)] #[rust_xlsxwriter(set_num_format="mm/dd/yyyy")] mdy: NaiveDate, #[serde(rename = "Long Description")] #[rust_xlsxwriter(set_min_width=8, set_max_width=60)] long_description: String }
It would also be useful to specify a header format.
@lucatrv and @claudiofsr Thanks for the feedback.
I've pushed an updated (but still WIP) version to main. I've settled on the following method names:
pub fn serialize_headers<T>()
pub fn serialize_headers_with_format<T>()
pub fn serialize_headers_with_options<T>()
pub fn deserialize_headers<'de, T>()
pub fn deserialize_headers_with_format<'de, T>()
pub fn deserialize_headers_with_options<'de, T>()
The _with_options()
variants are the interfaces that I currently envisage for dealing with additional formatting. It works like this:
// Set custom field formatting.
let custom_headers = [CustomSerializeHeader::new("cost").set_cell_format(&money_format)];
let header_options = SerializeHeadersOptions::new()
.set_custom_headers(&custom_headers)
.set_header_format(&header_format);
worksheet.deserialize_headers_with_options::<Produce>(1, 1, &header_options)?;
Output:
However, I like the suggestion to encode this functionality in struct attributes. I'll look into that to see how practical it is to implement.
This way, we can specify the format of each column/field individually or globally and also choose the maximum and minimum widths to improve auto_fit().
That is becoming a requested feature for the Python library as well so I'll look into that in a future version.
Just suggestions for implementations ("brainstorm")!
That's fine. That is what I'm looking for. :-)
However, I like the suggestion to encode this functionality in struct attributes.
Being able to define field and header formats in a single place would be very convenient, and IMHO using the Serde struct attributes is a great idea. This way nothing else would be needed to serialize to xlsx
apart from defining the struct properly.
Being able to define field and header formats in a single place would be very convenient, and IMHO using the Serde struct attributes is a great idea.
I agree, and I spent the last few days looking into it. However, as far as I can see, it isn't feasible. There are at least two issues, the main ones being:
Passing parameters to serialize_with
functions isn't supported by Serde. I had envisaged something like this:
However, passing parameters like that was rejected in https://github.com/serde-rs/serde/issues/1059.
The suggested workaround in that thread wouldn't work for the the type of formatting/parameters required by rust_xlsxwriter
for the reasons pointed here: https://github.com/serde-rs/serde/issues/1059#issuecomment-1126593534.
Even if Format
parameters could be passed there isn't (as far as I can see) a way to pass them separately to the worksheet serializer. Most data conversion examples (like this one in the Serde docs) use a scheme of reformatting the data within the serialize
method. For example to take a date and reformat it to an RFC 3339 string and then use serializer.serialize_str()
to forward on the formatted string. However, Excel/rust_xlsxwriter
require both the data (unmodified) and the format to be stored.
So, unfortunately, I don't think the high degree of configurability within the struct attributes is currently feasible. I'm happy to be corrected since it would be a cleaner scheme.
For now I will continue with the de/serialize_headers_with_options()
approach shown above.
@jmcnamara here are my 2 cents on this, I hope this is feasible:
I would suggest to add to CustomSerializeHeader
a new method called set_column_format
which could be used to set the full column format before writing headers and values. Sometimes it is desirable to apply a common format to the full column, not only to cells with values. Of course this format would be applied before any other possible custom format defined for header or values. To make things clearer I would also rename set_cell_format
into set_value_format
. Then , formats could be applied either using set_column_format
, set_header_format
, or set_value_format
. With this addition, I think that practically all possible formatting needs could be satisfied by defining a CustomSerializeHeader
.
Then to achieve a similar result of what suggested by @claudiofsr, I guess that a macro could be defined to construct the proper CustomSerializeHeader
starting from custom struct attributes notation. This way, instead of first defining the Serde struct and then defining a CustomSerializeHeader
paying attention to use same headers names, one could define all needed options within the struct, and then use deserialize_headers_with_options
to apply the formats before starting to serialize. Note that I am not a Rust macro expert and I would need to verify whether this is feasible or not. I am not even sure if a macro would be needed, or otherwise if deserialize_headers_with_options
could get the custom struct attributes notation similarly to how it gets the header names.
- I would suggest to add to
CustomSerializeHeader
a new method calledset_column_format
which could be used to set the full column format before writing headers and values.
That is a good suggestion. I'll implement that. Note, I've already added a similar set_column_width()
method to CustomSerializeHeader
that mirrors the worksheet method. That change is on main.
- Then to achieve a similar result of what suggested by @claudiofsr, I guess that a macro could be defined to construct the proper
CustomSerializeHeader
starting from custom struct attributes notation.
That may be feasible. I was also considering implementing a trait (and helper attributes) that could be added to the #[derive()
. I plan to look at that to see if it is feasible (that may be after the next release though). That could be used by the de/serialize_header
methods to set up a CustomSerializeHeader
transparently to the user (in line with your suggestion).
This feature has been pushed to crates.io in rust_xlsxwriter
v0.60.0.
I will work on derive and attribute macros for (hopefully) the next release.
Feature Request
In almost all practical
Worksheet::serialize
use cases that I can think of, instances of the defined struct / enum are going to be created within a loop. It is therefore inconvenient to have to treat the first instance differently from all the others, for the only purpose to be able to serialize headers with one of theWorksheet::serialize_headers*
methods. A better approach would be to rely on theserde_aux::serde_introspection::serde_introspect
method to get the struct field names directly from the defined type. For this to workrust_xlsxwriter
would need to depend on theserde-aux
crate when theserde
feature is enabled.