tafia / calamine

A pure Rust Excel/OpenDocument SpreadSheets file reader: rust on metal sheets
MIT License
1.6k stars 155 forks source link

Feature Request: Reading (and writing) formatting #404

Open MyTotoro opened 4 months ago

MyTotoro commented 4 months ago

It would be very helpful to read the formatting and also write formatting out to excel files

I have the code to get string values out! 6,0: Some(String("Revenue Build"))

Would be curious if there was a way to extract the formatting info:

image

I've done some browsing under the hood and it seems like formats are driven from read_styles in https://github.com/tafia/calamine/blob/master/src/xlsx/mod.rs#L221

Is the right thing to do to read and parse more styles, and expose this data upwards?

wpf375516041 commented 3 months ago

I understand that the code you're referring to pertains to some formatting properties of the table itself rather than individual cells. If we need to extend functionality for specific cell attributes, I am not well-versed in parsing XLSX files but would like to contribute. Can someone provide me with more detailed suggestions on how to proceed with extending cell functionality?

noctuid commented 2 months ago

My team works with large amounts of data in excel files. We report various errors about this data and need to be strict about types to avoid ambiguity. For example, we expect columns with percentage values and either require the inputs to be numbers in the excel percent format or strings ending in "%". Decimals like 0.2 are ambiguous (was this meant to be 20% or 0.2%?), so we report errors for cells like that. I listed some more examples here.

Currently we are using openpyxl, but it is unusable for larger files. Calamine, on the other hand, does not currently allow detecting issues like this. After reading with calamine, we get decimals and know nothing about the original type. With openpyxl, we can get not just a cell value but also the data type and number format (e.g. .data_type == "n" and "%" in cell.number_format to check for percentage-type cells).

It would be nice if calamine could also read/provide type/format information or if it could have more DataTypes so that we could distinguish percentage decimals from plain decimals.