marcschwartz / WriteXLS

CRAN Package WriteXLS: Cross-platform Perl based R function to create Excel 2003 (XLS) and Excel 2007 (XLSX) files from one or more data frames. Each data frame will be written to a separate named worksheet in the Excel spreadsheet. The worksheet name will be the name of the data frame it contains or can be specified by the user.
GNU General Public License v2.0
19 stars 9 forks source link

Special sequence generates error #15

Closed ferroao closed 5 years ago

ferroao commented 5 years ago

I came across an imported xlsx file, as dataframe, that showed the sequence: \r\r\n inside several cells (together with other text). In R it is shown without problem and also saving as .csv is normal.

When writing a .xls, this causes in several rows a loss of information - several columns completely lost * but in some rows the problem seems not to appear.-.

I solved it eliminating that string: df[] <- lapply(df, gsub, pattern = "\r\r\n", replacement = " ", fixed = TRUE)

marcschwartz commented 5 years ago

Hi,

This is essentially the same issue as reported in #12 and #13 previously.

More than likely, the original Excel file had embedded newlines and carriage returns in the cell content, and may have also had the text wrapping attribute set within the worksheet cell. The original Excel file could have contained content from yet another source, such as a database query, as opposed to manual data entry.

When writing out the content to a CSV file from R, the newlines ("\n") and carriage returns ("\r") get written as those characters creating new physical lines in the CSV file, not as escaped characters ("\r" and "\n") within the field. That results in truncated lines mid-field in the CSV file, which are then improperly parsed in the Perl code. See the second comment from me in #12.

The exact handling of newline and carriage return characters, strictly speaking, may be operating system dependent to an extent. See https://en.wikipedia.org/wiki/Newline for more information on that point.

The results may also be influenced by how you imported the Excel file into R (e.g. via a saved CSV file from Excel, or via some other CRAN package that supports reading Excel files directly) and how they interpret the cell content during saves and reads.

As noted in my previous replies, the solution for WriteXLS would require numerous steps, which are beyond the scope of what WriteXLS is intended to do.

Thus, pre-processing the data frame content, as you noted above, would be one approach. If you need more flexibility, XLConnect or one of the other Excel file reading/writing packages on CRAN may be more suitable for you.

Thanks!

ferroao commented 5 years ago

I understand, maybe something good for the user would be to get an alert of the type: Warning, some rows may have not been correctly parsed. I myself notice the problem just by chance. Sometimes you send the file to other person without imaging something could be missing.

marcschwartz commented 5 years ago

Hi,

Apologies for the delay in my reply, as this past week was crazy at work.

Your suggestion is entirely reasonable and I will consider options, likely on the R side of the workflow, for implementation to address this further.

marcschwartz commented 4 years ago

Hi Fernando,

I wanted to let you know that version 6.0.0 of the WriteXLS package was just released.

I was able to spend a fair amount of time recently, addressing this issue and related issues with embedded special character sequences. These are now better preserved in the Excel file that is generated and as an indirect consequence of the more robust handling, a multiline cell will be created where newlines are present.