Closed KingAl closed 2 years ago
Thanks for the investigation! I've also looked into this and here are some additional notes
When copying from excel the html in the clipboard represents the table. When viewing the html as a file the table looks like in excel (except excel itself not wrap lines, i.e., >40 characters are still on one line even if the we have multiple columns).
Handsontable uses the xml/html parser of the browser to get the data out of the excel xml, something like tempElem.innerHTML = <excel html/xml>
and then uses .innerHtml
and .innerText
on the cell elements.
This means the browser is "to blame" for this behaviour (or the excel past behaviour)...
(as it turns out ASCII 194 is html
)
Handsontable in the latest version fixed this: https://github.com/handsontable/handsontable/blob/2dbc605c14b4f13eb25b77679f1c1cab98a274ad/handsontable/src/utils/parseTable.js#L158 It seems it is a bit more complicated but the code also handles other cases, e.g., multiple tbodies, headers, footers, ... it does not work with the currently used handsontable version 6.2.2 (or there has some more changed needed).
Luckily excel also puts plain text data into the clipboard which appears to be working for spaces and multi line values. So I think it would be best to completely disabled/skip html parsing and only use the plain text data.
Maybe you can have a look to for text/plain
clipboard data if it really would fix it?
Rows should be separated by \n
, new lines encoded as \n
, multi line values escaped with \" ... \"
Next week I'll check what OpenOffice & LibreOffice behave when copy & pasting. If they work well with the plain text in clipboard, I'll just use the plain text data...
Here are some of my findings while testing with Excel, LibreOffice & OpenOffice
text/plain is different for all of them….
Excel: adds uses quotes for cells containing new lines LibreOffice: adds a simple whitespace (as new line character) for cells containing new lines OpenOffice: just adds them as separate lines
So…. plain text is no solution.
I think text/html is the only way to go. However, I have to do more testing with new lines and multiple whitespace in a cell and how the different xml is produced by the programs.
At last the programs add a <meta name="generator" content="APP">
tag.
I ended up using most code from handsontable because it works and is tested... Also when users want to switch to handsontable, they get the same result.
I need to do some final testing but I thing it will be ready next week.
Thanks so much for looking into this!
Hmmm, maybe you have to try again with your excel version... I think they changed the excel copy/paste formatting (but I might be wrong).
Anyway, the update is live and it should be fixed now.
What OS?
Version?
Description
When a column of cells containing text are copied from Excel, some cells can unexpectedly wrap, with newlines inserted into the data.
Expected behavior
Pasted text data should be "the same" as copied text data.
Steps to reproduce
User Workarounds
For single-line textual data, the lines can be pasted into a text editor and recopied, to strip the HTML format from the Windows clipboard. Then data will be pasted into cells based on the newlines present in the textual data.
For multiline textual data, you can re-save your data into a CSV format, and then open that CSV with VSCode Edit CSV. Then, copying and pasting from one Edit CSV document into another works as expected.
Technical considerations & Notes
This appears to be related to the HTML format stored on the clipboard. Specifically, the HTML doesn't faithfully represent whitespace from the source string, as visible in the attached screenshot from NirSoft's InsideClipboard:
The 'incorrect' wrapping of the pasted data matches exactly the wrapping present on the HTML format clipboard data, treating all present whitespace as significant.
The HTML encoding seems to treat newlines in text fields as nonsignificant, and assumes that runs of whitespace should be treated as a single space. Interestingly, when there are multiple spaces in a row. a sequence of ASCII 194 characters (
'Â'
) representing each space are wrapped in a<span style='mso-spacerun:yes'>
span:This also pastes somewhat unusually in VSCode Edit CSV, showing up with four spaces (presumably the four literal spaces present within the span, rather than 'interpreting' the encoding.)
I don't yet know enough about WebView clipboard handling and how VSCode Edit CSV interacts with it to know how viable a fix is, but I thought it would be useful to document the issue. The "XML Spreadsheet" clipboard format in contrast uses a more sensible ~URLEncoded string with no nonsignificant whitespace and linefeeds encoded as
, but I imagine the WebView is making decisions about how VSCode Edit CSV should interact with the Windows clipboard.My use case for the Edit CSV plugin has specifically been editing CSVs without worrying about the mangling that would be caused by opening and saving from Excel, when I need to alter a file with data from another tabular source (like, as in this case, XLSX files.) As such, encountering this issue was a natural result of that style of usage.
The workarounds are manageable but the behaviour is surprising and others may run into it, so as above I thought it was worthwhile to document.