jmcnamara / rust_xlsxwriter

A Rust library for creating Excel XLSX files.
https://crates.io/crates/rust_xlsxwriter
Apache License 2.0
316 stars 25 forks source link

`impl IntoExcelData for String` (and optionally `&String`) #35

Closed LeoniePhiline closed 1 year ago

LeoniePhiline commented 1 year ago

Feature Request

Examples:

                // Assume `path: PathBuf`
                worksheet.write(
                    0,
                    0,
                    path.display().to_string(),
                )?;

                worksheet.write(
                    0,
                    0,
                    format!("Something {formatted:#?}"),
                )?;

-> "the trait IntoExcelData is not implemented for std::string::String"

Must work around with &* (as impl IntoExcelData for &str is the only stringy type available), which is quite cumbersome.

https://docs.rs/rust_xlsxwriter/latest/rust_xlsxwriter/trait.IntoExcelData.html#foreign-impls

jmcnamara commented 1 year ago

There should be a IntoExcelData impl for &String. That was an omission and I'll fix that.

However, there aren't any write*() APIs (or other APIs) in rust_xlsxwriter that take String so I will leave that one out.

jmcnamara commented 1 year ago

Added to main.

LeoniePhiline commented 1 year ago

Awesome, thank you! 🤍

However, there aren't any write*() APIs (or other APIs) in rust_xlsxwriter that take String so I will leave that one out.

Would you accept PRs to add these?

A more user-friendly api could accept impl AsRef<str> and simply use .as_ref() internally.

This would cover pretty much all string types.

(IIRC you use references. Otherwise impl Into<String> would be more appropriate.)

jmcnamara commented 1 year ago

For APIs that take a string-like object I thought that &str would be the best option and there is a certain amount of consensus around that usage in Rust APIs.

However impl Into<String> is more generic in that it also allows String in addition to &str and &String. In addition, in almost all cases the rust_xlsxwriter library takes an owned copy of the string so impl Into<String> is a better fit for that case too.

@adriandelgado could I get a second opinion from you on the use of impl Into<String> versus &str in the APIs.

LeoniePhiline commented 1 year ago

For APIs that take a string-like object I thought that &str would be the best option and there is a certain amount of consensus around that usage in Rust APIs.

Only if you do not need to take ownership. Taking a &str and then cloning / copying the data in some way may create lots of unnecessary overhead.

However impl Into<String> is more generic in that it also allows String in addition to &str and &String.

This is not entirely accurate. Both AsRef<str> and Into<String> are equally generic and allow String, &str, &String as well as Cow<'_, str>.

The difference lies in if you are consuming the data, or only reading from borrowed slices:

If you must move data, then Into<String> is imperative, as it can avoid cloning, while AsRef<str> avoids cloning if you only read from borrowed slices but never move the string data.

jmcnamara commented 1 year ago

This is not entirely accurate.

I meant it was more generic than &str. Your other points I agree with.

jmcnamara commented 1 year ago

I've converted write_string() from &str to impl Into<String> on main and I've extended write() to support &str, &String, String, and Cow<'_, str>.

It is probably slightly slower (<1%) but within the measurable bounds so that is probably acceptable/ignorable for the added genericism.

LeoniePhiline commented 1 year ago

To reduce compile times due to monomorphization, you can additionally use this technique:

Make private store_string non-generic, so that it only accepts String.

In the generic functions write_string (_with_format and without), only call .into() on the generic argument. The same for IntoExcelData macro implementations.

Use the resulting String as argument for calling the non-generic store_string.

The latter, non-generic method then contains the business logic and requires no monomorphization.

The generic functions on the other hand remain minimal in size and thus fast to compile for multiple concrete types.

Edit: see comments added to https://github.com/jmcnamara/rust_xlsxwriter/commit/c318cdba3252c4327e0a36ac3ec7947c874e9ac6

jmcnamara commented 1 year ago

To reduce compile times due to monomorphization, you can additionally use this technique:

Thanks. Done.

jmcnamara commented 1 year ago

These changes have been rolled up into v0.37.0. Thanks for the input.

adriandelgado commented 1 year ago

Sorry I didn't respond, I was on vacation at that time and I just checked my notifications.

I think the suggestions from @LeoniePhiline are accurate. In general, if a function needs an owned string its better that the caller of the function is the one making the allocations. If a function takes only a &str it usually means the function doesn't need to allocate internally. In general, its idiomatic to be explicit about what the function needs.

Checkout the API guidelines: C-CALLER-CONTROL

jmcnamara commented 1 year ago

Thanks @adriandelgado and @LeoniePhiline for the input.

There are ~30 more APIs in rust_xlsxwriter that take &str references but convert the str to String almost immediately. I don't know if it is worth converting them to impl Into<String> as well. Most won't be used very commonly by end users. What do you think?

adriandelgado commented 1 year ago

If you were creating a new crate from scratch, then I would insist that you follow the Rust API guidelines to keep it consistent and idiomatic. However, the way it is right now doesn't seem like a big deal so its your call if you want to change it.

LeoniePhiline commented 1 year ago

There was considerable investment into this crate's performance.

The performance at the api boundaries might not have been part of these measurements and improvements.

I could imagine that there is a lot to gain by accepting allocated strings and preventing unnecessary clones.

Accepting Into<String> could allow strings to be passed while not breaking backwards compatibility with existing &str users.

I think this is a good idea, with focus on the APIs with the most bytes throughput in common crate usage.

LeoniePhiline commented 1 year ago

PS: People might be very willing to help. Just create “good first issue”s for the tasks. :)

jmcnamara commented 1 year ago

Let's continue this discussion in #16 "Proposal: generic write() method"