jmcnamara / rust_xlsxwriter

A Rust library for creating Excel XLSX files.
https://crates.io/crates/rust_xlsxwriter
Apache License 2.0
316 stars 25 forks source link

Proposal: generic write() method #16

Closed jmcnamara closed 1 year ago

jmcnamara commented 1 year ago

It would be possible to have a generic write() method using a trait that could call the type specific write_*() method such as write_number(). This is a feature in the Python version of the library.

I'll leave this here as a reminder to myself and to see if it gets an +1s for usefulness or desirability.

moutikabdessabour commented 1 year ago

Can I take a shot at this?

jmcnamara commented 1 year ago

@moutikabdessabour At this stage I'm mainly asking if people think this would be useful. Do you think it would be useful?

I'm currently 50/50 on this, for the reasons below, but I'm willing to take input.

Pros:

Cons:

moutikabdessabour commented 1 year ago

For the second con I think implementing a Formula/Url type would remove the overhead. I wrote an app using this library and while it is great the only issue I had is the verbosity of the function calls. I am a 100% for this feature

jmcnamara commented 1 year ago

I wrote an app using this library and while it is great the only issue I had is the verbosity of the function calls.

Yes, that is my feeling too. There are currently 24 write_*() methods. That is just too many.

It could be mapped down to 5 like this:

Current New
write_number() write()
write_string() write()
write_url() write()
write_boolean() write()
write_formula() write()
write_rich_string() write()
write_blank() write()
write_time() write()
write_date() write()
write_datetime() write()
write_dynamic_formula() write()
write_number_with_format() write_with_format()
write_string_with_format() write_with_format()
write_url_with_format() write_with_format()
write_boolean_with_format() write_with_format()
write_formula_with_format() write_with_format()
write_rich_string_with_format() write_with_format()
write_dynamic_formula_with_format() write_with_format()
write_array_formula() write_array_formula()
write_dynamic_array_formula() write_array_formula()
write_array_formula_with_format() write_array_formula_with_format()
write_dynamic_array_formula_with_format() write_array_formula_with_format()
write_url_with_options() Same
write_url_with_text() Same, or write_url_with_options()

With some caveats/additions:

Also a IntoExcelType (or similar) trait coupled with a generic write() would have the advantage of allowing mapping of other arbitrary user types in a clean way.

Now, that I've mapped it out like this it seems like the right thing to do (even if it is another major API break). To avoid breakage the type specific methods could stay but be undocumented/hidden. They would be needed for the trait mapping anyway.

I might add it after I finish the current work on charts.

moutikabdessabour commented 1 year ago

Gonna take a jab at this later today. WIll probably have something ready tomorrow night.

jmcnamara commented 1 year ago

Gonna take a jab at this later today. WIll probably have something ready tomorrow night.

There is no need. I will get to it myself after the current release.

jmcnamara commented 1 year ago

BTW that isn't meant to deter you from doing a prototype. I just want to flag that it will needs a lot of doc and example refactoring and I'd prefer to do that myself.

GlennMm commented 1 year ago

i would love the feature as it would enable me for example to write one function to write json data in array format (structured differently) with having to write many functions for a specific struct to write json to excel. If there is any way i can archive this can please show me how to and also you dim this feature as unnecessary then can you link me a rs lib that enable me todo so if its available.

jmcnamara commented 1 year ago

@GlennMm At this point I'm definitely going to add it. Either in the next release or a separate release just after.

GlennMm commented 1 year ago

@GlennMm At this point I'm definitely going to add it. Either in the next release or a separate release just after.

Thank you @jmcnamara that would awesome, looking forward to this feature

jmcnamara commented 1 year ago

I've added an initial version of this feature to main. It needs some additional docs and examples before it is ready for release.

It allows 2 new important features. The first is a generic write() method to simplify code:

use rust_xlsxwriter::{Workbook, XlsxError};

fn main() -> Result<(), XlsxError> {
    // Create a new Excel file object.
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    // Write a string to cell (0, 0) = A1.
    worksheet.write(0, 0, "Hello")?;

    // Write a number to cell (1, 0) = A2.
    worksheet.write(1, 0, 12345)?;

    // Save the file to disk.
    workbook.save("hello.xlsx")?;

    Ok(())
}

Output: screenshot

The second feature is that you can now use the same mechanism to extend write() for your own data types:

//! Example of how to extend the the rust_xlsxwriter write() method using the
//! IntoExcelData trait to handle arbitrary user data that can be mapped to one
//! of the main Excel data types.

use rust_xlsxwriter::*;

fn main() -> Result<(), XlsxError> {
    // Create a new Excel file object.
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    // Make the first column wider for clarity.
    worksheet.set_column_width(0, 12)?;

    // Write user defined type instances that implement the IntoExcelData trait.
    worksheet.write(0, 0, UnixTime::new(0))?;
    worksheet.write(1, 0, UnixTime::new(946598400))?;
    worksheet.write(2, 0, UnixTime::new(1672531200))?;

    // Save the file to disk.
    workbook.save("write_generic.xlsx")?;

    Ok(())
}

// For this example we create a simple struct type to represent a Unix time.
// This is the number of elapsed seconds since the epoch of January 1970 (UTC).
// See https://en.wikipedia.org/wiki/Unix_time. This type isn't handled by
// default by rust_xlsxwriter.
pub struct UnixTime {
    seconds: u64,
}

impl UnixTime {
    pub fn new(seconds: u64) -> UnixTime {
        UnixTime { seconds }
    }
}

// Implement the IntoExcelData trait to map our new UnixTime struct into an
// Excel type.
//
// The relevant Excel type is f64 which is used to store dates and times (along
// with a number format). The Unix 1970 epoch equates to a date/number of
// 25569.0. For Unix times beyond that we divide by the number of seconds in the
// day (24 * 60 * 60) to get the Excel serial date.
//
// We must also supply a number format if one isn't specified.
//
impl IntoExcelData for UnixTime {
    fn write(
        self,
        worksheet: &mut Worksheet,
        row: RowNum,
        col: ColNum,
    ) -> Result<&mut Worksheet, XlsxError> {
        // Create a default date format.
        let format = Format::new().set_num_format("yyyy-mm-dd");

        // Convert the Unix time to an Excel datetime.
        let datetime = 25569.0 + (self.seconds / (24 * 60 * 60)) as f64;

        // Write the date as a number with a format.
        worksheet.write_number_with_format(row, col, datetime, &format)
    }

    fn write_with_format<'a>(
        self,
        worksheet: &'a mut Worksheet,
        row: RowNum,
        col: ColNum,
        format: &'a Format,
    ) -> Result<&'a mut Worksheet, XlsxError> {
        // Convert the Unix time to an Excel datetime.
        let datetime = 25569.0 + (self.seconds / (24 * 60 * 60)) as f64;

        // Write the date with the user supplied format.
        worksheet.write_number_with_format(row, col, datetime, format)
    }
}

Output:

screenshot

Comments/reviews welcome.

jmcnamara commented 1 year ago

Also, if the Format was treated as an Option it would be possible to have just one method that needs to be implemented for the trait:

impl IntoExcelData for UnixTime {
    fn write<'a>(
        self,
        worksheet: &'a mut Worksheet,
        row: RowNum,
        col: ColNum,
        format: Option<&'a Format>,
    ) -> Result<&'a mut Worksheet, XlsxError> {
        // Convert the Unix time to an Excel datetime.
        let datetime = 25569.0 + (self.seconds / (24 * 60 * 60)) as f64;

        match format {
            Some(format) => {
                // Write the date with the user supplied format.
                worksheet.write_number_with_format(row, col, datetime, format)
            }
            None => {
                // Create a default date format.
                let format = Format::new().set_num_format("yyyy-mm-dd");
                worksheet.write_number_with_format(row, col, datetime, &format)
            }
        }
    }
}

This may, or may not, be more confusing to the end user. Internally it works out a lot cleaner. This code is on the generic_with_option branch for comparison.

jmcnamara commented 1 year ago

@Fight-For-Food Any thoughts on the interface or implementation of this feature? You had some good input on other generic/non-generic code.

Fight-For-Food commented 1 year ago

@Fight-For-Food Any thoughts on the interface or implementation of this feature? You had some good input on other generic/non-generic code.

I think that fn write(self, worksheet: &mut Worksheet, row: RowNum, col: ColNum) -> Result<&mut Worksheet, XlsxError>{....} is ok. It is convenient to use. And there is no need to make in more complex with extra-argument format: Option<&'a Format> For some more comprehensive thoughts i need more time. Maybe i will have some at this week. But can't make a promise about it.

jmcnamara commented 1 year ago

@Fight-For-Food

And there is no need to make in more complex with extra-argument format: Option<&'a Format>

It may not be clear from the snippet above but the addition of the Option<> means that there is only one trait funtion that needs to be implemented by the end user. However, it would probably be confusing and the version on main with IntoExcelData::write() and IntoExcelData::write_with_format() mimics the worksheet APIs may be simpler.

Maybe i will have some at this week. But can't make a promise about it.

Thanks. No problem if you can't get to it.

GlennMm commented 1 year ago

thank you for implementing this, it really help make my work much easier. looking to forward to be start contributing to this repo. so have started using this feature i haven't seen any problems with it so far

moutikabdessabour commented 1 year ago

@jmcnamara Looks great, You could allow the users to define only one function but get access to both write and write_with_format, the trait definition will have three functions the write, write_with_format and an internal function that would actually implement the logic with the format arg passed in as an option.

trait IntoExcelData {
    fn internal_write<'a>(
        self,
        worksheet: &'a mut Worksheet,
        row: RowNum,
        col: ColNum,
        format: Option<&'a Format>,
    ) -> Result<&'a mut Worksheet, XlsxError>;
    fn write(
        self,
        worksheet: &mut Worksheet,
        row: RowNum,
        col: ColNum,
    ) -> Result<&mut Worksheet, XlsxError>{
        self.internal_write(&mut worksheet, row, column, None)
    }

    fn write_with_format<'a>(
        self,
        worksheet: &'a mut Worksheet,
        row: RowNum,
        col: ColNum,
        format: &'a Format,
    ) -> Result<&'a mut Worksheet, XlsxError>{
        self.internal_write(&mut worksheet, row, column, Some(format))
    }

}
jmcnamara commented 1 year ago

Just a heads up that the initial version of this is upstream in crates.io in version 0.27.0: https://crates.io/crates/rust_xlsxwriter

I'll iterate and improve on this in the next few releases.

jmcnamara commented 1 year ago

Having taken a small detour into charts I've come back to complete this.

Some tasks, not in order:

jmcnamara commented 1 year ago

Implement generic write_row() and write_column() methods.

I've added a first pass at this to main. You can now write arrays of data types that implement IntoExcelData like this:

use rust_xlsxwriter::{Workbook, XlsxError};

fn main() -> Result<(), XlsxError> {
    // Create a new Excel file object.
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    // Some array data to write.
    let numbers = [1, 2, 3, 4, 5];
    let words = ["Hello"; 5];
    let matrix = [
        [10, 11, 12, 13, 14],
        [20, 21, 22, 23, 24],
        [30, 31, 32, 33, 34],
    ];

    // Write the array data as columns.
    worksheet.write_column(0, 0, numbers)?;
    worksheet.write_column(0, 1, words)?;

    // Write the array data as rows.
    worksheet.write_row(0, 3, numbers)?;
    worksheet.write_row(1, 3, words)?;

    // Write the matrix data as an array or rows and as an array of columns.
    worksheet.write_row_matrix(3, 3, matrix)?;
    worksheet.write_column_matrix(6, 0, matrix)?;

    // Save the file to disk.
    workbook.save("arrays.xlsx")?;

    Ok(())
}

Output:

screenshot

It is just some syntactic sugar around a loop but it is somewhat popular in the Python version.

It currently has an additional deference/copy and I'm not sure if that can be avoided. Feedback welcome on that or anything else.

adriandelgado commented 1 year ago

I gave some feedback on the last commit. I think that feedback applies on more parts of the codebase.

jmcnamara commented 1 year ago

@adriandelgado

I think the Copy trait bound could be avoided by using the IntoIterator trait.

I initially tried, and failed, to get that working. :-( With my morning brain and 2 cups of coffee I managed to understand/fix it. You can have a look.

I gave some feedback on the last commit. I think that feedback applies on more parts of the codebase.

I think I understood those comments but maybe not so let's check. What I think you are saying is that the API/library should avoid taking any references to data that will then be cloned and should instead use interfaces like Into<String> or IntoInterator. Is that correct or am I missing something else?

jmcnamara commented 1 year ago

I've also added write_row_matrix() and write_column_matrix() methods for 2D arrays/iterators of data. See the updated example above.

adriandelgado commented 1 year ago

I think I understood those comments but maybe not so let's check. What I think you are saying is that the API/library should avoid taking any references to data that will then be cloned and should instead use interfaces like Into<String> or IntoInterator. Is that correct or am I missing something else?

That's right. The caller of the function has the responsability to clone if needed. Sometimes the caller already has owned data and knows it wont need it anymore, no clone needed in that case.

jmcnamara commented 1 year ago

I've implemented a Formula type so that it can be used in the generic write():

    worksheet.write(0, 0, Formula::new("=1+2+3"))?;
    worksheet.write(1, 0, Formula::new("=SIN(PI())"))?;

However, I'm not sure how much of a help this is yet.

jmcnamara commented 1 year ago

I had hoped to be able to use Trait Objects and the newer array writing methods to simulate something like writing a heterogeneous vec of vectors like a dataframe.

Something like this:

    let data: Vec<Vec<Box<dyn IntoExcelData>>> = vec![
        vec![
            Box::new("East"),
            Box::new("West"),
            Box::new("North"),
            Box::new("South"),
        ],
        vec![
            Box::new(40),
            Box::new(40),
            Box::new(20),
            Box::new(30)],
        vec![
            Box::new(30),
            Box::new(30),
            Box::new(50),
            Box::new(30)],
        vec![
            Box::new(Formula::new("=B1+C1")),
            Box::new(Formula::new("=B2+C2")),
            Box::new(Formula::new("=B3+C3")),
            Box::new(Formula::new("=B4+C4")),
        ],
    ];

However, I've failed in every attempt so far. Examples like this with a simpler data structure:

use rust_xlsxwriter::{Formula, IntoExcelData, Workbook, XlsxError};

fn main() -> Result<(), XlsxError> {
    // Create a new Excel file object.
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    let data: Vec<Box<dyn IntoExcelData>> = vec![
        Box::new("East"),
        Box::new(50),
        Box::new(30),
        Box::new(Formula::new("=B1+C1")),
    ];

    /* The following doesn't work.

        --> examples/app_trait_array.rs:39:9
        |
     40 |         item.write(worksheet, row, 1);
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the size of `dyn IntoExcelData`
                                                cannot be statically determined

    */
    let mut row = 1;
    for item in data {
        item.write(worksheet, row, 1);
        row += 1;
    }

    // Save the file to disk.
    workbook.save("arrays.xlsx")?;

    Ok(())
}

Or this:

    let mut row = 1;
    for item in data {
        worksheet.write(row, 0, item)?;
        row += 1;
    }

Which gives errors like this:

   --> examples/app_trait_array3.rs:28:33
    |
28  |         worksheet.write(row, 0, item)?;
    |                   -----         ^^^^ the trait `IntoExcelData` is not implemented 
                                          for `Box<dyn IntoExcelData>`
    |                   |
    |                   required by a bound introduced by this call
    |
    = help: the following other types implement trait `IntoExcelData`:

Anyone see a way that trait objects for the IntoExcelData trait might be handled?

Update: I'm still interested in this if there is a solution (there may not be a clean solution) but I'm going to shift my focus to working with real Dataframes from Polars.

jmcnamara commented 1 year ago

See also this example of writing a Polars dataframe to Excel using rust_xlsxwriter: #39

jmcnamara commented 1 year ago

These generic methods and types are now available in release v0.39.0.