marcschwartz / WriteXLS

CRAN Package WriteXLS: Cross-platform Perl based R function to create Excel 2003 (XLS) and Excel 2007 (XLSX) files from one or more data frames. Each data frame will be written to a separate named worksheet in the Excel spreadsheet. The worksheet name will be the name of the data frame it contains or can be specified by the user.
GNU General Public License v2.0
19 stars 9 forks source link

Issue with "\n" inside characters #12

Closed carlganz closed 8 years ago

carlganz commented 8 years ago

Hello,

When data.frames contain character vectors that contain the element "\n" it breaks the table in excel instead of creating a cell with multiple rows of text.

library(WriteXLS)

x <- data.frame(stringsAsFactors=FALSE,
                v1 = 1:3,
                v2 = c("Hello", "This will \n break", "World"))

WriteXLS(x, "test.xlsx")

Thanks

marcschwartz commented 8 years ago

Hi Carl,

Thanks for your report.

This is a known issue. The problem being that the data frame is written out by R using write.table() to an intermediate CSV file. That CSV file is then parsed by the Perl script to create the Excel file.

In your example, you end up with the CSV file containing the following:

"v1","v2"
1,"Hello"
2,"This will 
 break"
3,"World"

such that the newline character actually creates a new line in the CSV file, which is parsed separately from the prior line, thus breaking the parsing by Perl when the row is created in the worksheet.

In order to be able to support multiline cells, I would have to add the ability to specifically format those cells to support text/line wrapping cell formats before writing the entry to the cell in Perl.

Since that functionality is outside of the scope of what WriteXLS is designed to do, I have no plans to add that capability.

It would also add a substantial amount of overhead to check cell by cell, and you would still have to pre-process the entries in your data frame, such that the newline character is pre-fixed by additional backslashes before writing that row to the CSV file, so that they are not treated as literals.

For example (note the doubling of the backslashes in the regular expressions used):

x$v2 <- gsub("\\n", "\\\\\\\\n", x$v2)

> x
  v1                    v2
1  1                 Hello
2  2 This will \\\\n break
3  3                 World

Now you end up with a CSV file that looks like this, when the double backslashes are parsed on output:

"v1","v2"
1,"Hello"
2,"This will \\n break"
3,"World"

and you then end up with an Excel file like this, after using WriteXLS:

v1  v2
1   Hello
2   This will \n break
3   World

but the second line is not wrapped within the cell.

Thus, for now, your options are to use gsub() to modify newline or other backslashed sequences in your source data frame to add the additional backslashes so that the characters are not output as literals in the intermediate CSV file, or use gsub() to remove them:

> x$v2 <- gsub("\\n", "", x$v2)

> x
  v1               v2
1  1            Hello
2  2 This will  break
3  3            World

Note that I left an additional space in the replacement between 'will' and 'break' above.

Hope that provides some insight into what is happening here.

If you need multiline cell support in your Excel files, you might want to look at the XLConnect package or similar, which provide more flexible cell formatting functionality.

Regards,

Marc

marcschwartz commented 4 years ago

Hi Carl,

I wanted to let you know that version 6.0.0 of the WriteXLS package was just released.

I was able to spend a fair amount of time recently, addressing this issue and related issues with embedded special character sequences. These are now better preserved in the Excel file that is generated and as an indirect consequence of the more robust handling, a multiline cell will be created where newlines are present.