ycphs / openxlsx

openxlsx - a fast way to read and write complex xslx files
https://ycphs.github.io/openxlsx/
Other
223 stars 74 forks source link

writeFormula with HYPERLINK creates unparseable XML: read.xlsx doesn't read "Text to display" from a file created by openxlsx #335

Closed michael-m52 closed 1 year ago

michael-m52 commented 2 years ago

Describe the bug This happens when read.xslx encounters a HYPERLINK formula. If the xlsx being read is a normal Excel file, the corresponding cell in the data.frame has the "friendly name" imported. If the xlsx being read was created with openxlsx, however, nothing is imported.

To Reproduce This will outline how to produce 3 test files: 1. an openxslx produced file; 2. a copy of that file saved by Excel application; 3. a new Excel file with the same HYPERLINK formulas as file 1.

  1. Refer to the documentation for makeHyperlinkString and run the code under Examples to make "internalHyperlinks.xlsx".
  2. Open the file "internalHyperlinks.xlsx" in Excel, then File > Save A Copy, name the new file "internalHyperlinks-Save-A-Copy.xlsx"
  3. Create a new Excel file. Manually add tabs named "Sheet1", "Sheet2", "Sheet 3". Copy the contents of "Sheet1" from "internalHyperlinks.xlsx" into this new file (select Sheet1, Ctrl-A, Ctrl-C then in new file Sheet1 Ctrl-V). Save file as "internalHyperlinks-New-File.xlsx"
  4. We now have 3 files to test. Run the following code to show how read.xlsx behaves:
file1 = 'internalHyperlinks.xlsx'
file2 = 'internalHyperlinks-Save-A-Copy.xlsx'
file3 = 'internalHyperlinks-New-File.xlsx'

table1 = read.xlsx(file1,1,colNames=F,skipEmptyRows=F,skipEmptyCols=F)
table2 = read.xlsx(file2,1,colNames=F,skipEmptyRows=F,skipEmptyCols=F)
table3 = read.xlsx(file3,1,colNames=F,skipEmptyRows=F,skipEmptyCols=F)

print("table1")
print(table1)
print("table2")
print(table2)
print("table3")
print(table3)

Expected behavior table1 should have same contents as table2 and table 3

Screenshots image

Example files internalHyperlinks.xlsx internalHyperlinks-Save-A-Copy.xlsx internalHyperlinks-New-File.xlsx

Additional context This looks like an issue with the original XML being written by writeFormula and saveWorkbook, rather than an issue with read.xlsx parsing the XML. Also, when internalHyperlinks.xlsx is opened, there is a warning about the XML being corrupted:

image image

JanMarvin commented 2 years ago

Hello @michael-m52 , thanks for the example. I can confirm the problem. Actually there are two issues, one of which I have fixed and the other I probably won't fix. The fixable thing: Our hyperlink creation was incorrect. This has been fixed. The other problem should be fixable, but I'm not going to try to do it.

The problem is related to https://github.com/ycphs/openxlsx/issues/188. The hyperlink is a formula, and when we write formulas, we don't usually evaluate them. Therefore, we know the f mark of a cell, but not the v mark. (We can write some weird Excel formula, but unless we implement a library to evaluate the formula, we don't know the value of the cell.) There is only one small difference: for hyperlinks we know the v-tag:

<c r="A1" s="1" t="str">
  <f>HYPERLINK("#'Sheet 3'!B1")</f>
  <v>#'Sheet 3'!B1</v>
</c>
<c r="C1" s="1" t="str">
  <f>HYPERLINK("#Sheet2!B3", "Text to Display - Link to Sheet2")</f>
  <v>Text to Display - Link to Sheet2</v>
</c>

Therefore, we read the file correctly, we just have the nasty mentality of skipping empty cells and not returning formulas. To fix this, you have to tell the function that writes the f tag that there is another v tag. Unfortunately for you, this is too much effort for me to fix the error for too little result.

JanMarvin commented 2 years ago

Just realised, there might be a rather simple way. Somehow openxlsx reads the names from the hyperlink class object and uses this as v tag. We could simply use this in makeHyperlinkString. It should be rather straight forward, let me know if you want to give it a shot.

michael-m52 commented 2 years ago

Hi @JanMarvin, thanks so much for you efforts. So your suggestion means that we would pass the display text into the hyperlink class object as "name" when creating it with makeHyperlinkString? That sounds feasible. If it's not too much trouble, this is worth a shot.

Otherwise, with your 1st fix in place, would read.xlsx just display the hyperlink formula? (i.e. the f tag)

michael-m52 commented 2 years ago

Hi @JanMarvin, on further reflection, I am wondering what the Excel application does to those hyperlinks when you "Save A Copy", because after this step, the hyperlinks are perfectly readable by read.xlsx. This implies that (1) the original workbook saved by openxlsx has all the necessary info already (2) "Save A Copy" seems to make the hyperlink compliant (3) read.xlsx is then capable of reading this info, not sure if this is a v-tag or not. I'll unzip all three samples that I attached to the original issue and see if there is any change in the f-tag and v-tag.

JanMarvin commented 2 years ago

Hi @JanMarvin, thanks so much for you efforts. So your suggestion means that we would pass the display text into the hyperlink class object as "name" when creating it with makeHyperlinkString? That sounds feasible. If it's not too much trouble, this is worth a shot.

Haha, sorry, I was just trying to push you into development. If you want to, you could give it a try. Sorry, I'm busy with other things at the moment and like I've said, we have similar well known (wontfix) issues with every other formula in the workbook. The single case where it behaves differently, was just the exception to the rule.

Otherwise, with your 1st fix in place, would read.xlsx just display the hyperlink formula? (i.e. the f tag)

Unfortunately it wont. There is ongoing development behind the scenes (to be released soon), where I have implemented an option to show the formulas (similar to Excel), but it will not switch dynamically. The logic for openxlsx is differently and I am not going to try to add such a thing.

michael-m52 commented 2 years ago

Thanks @JanMarvin I understand these are not high priority issues and I'll try fixing some things myself. I'm not confident with GitHub but I'll attempt a pull request if I am happy with my bug fix.

JanMarvin commented 2 years ago

This implies that (1) the original workbook saved by openxlsx has all the necessary info already (2) "Save A Copy" seems to make the hyperlink compliant (3) read.xlsx is then capable of reading this info, not sure if this is a v-tag or not.

To answer your question, it's late at night so I'll do it quick: In the hyperlink case Excel takes parts of the formula and makes this the value. If you look at https://github.com/ycphs/openxlsx/issues/335#issuecomment-1065402201 this is how Excel evaluates the hyperlink. When the file is written by opexlsx, there is no <v> node. We write only the <f> node. Excel creates the <v> node after evaluation of the formula.

This is not different to every other formula inside xlsx sheets. If you write a formula SUM(A1:B1) with writeFormula, only the <f> node is written. Once the file is opened by Excel the formula will be evaluated and the <v> node is created. Evaluation of the formulas is what we need Excel for. Like I've written before in other issues: xlsx files are simply xml files in a zip folder. There is no magic evaluation of formulas. For the evaluation you need a library, like Excel or Libreoffice.

michael-m52 commented 2 years ago

Thanks @JanMarvin

Is there a way to create an internal hyperlink using openxlsx without using the HYPERLINK formula?

In VBA it would look like this: ActiveSheet.Hyperlinks.Add Anchor:=Selection, Address:="", SubAddress:= "Sheet2!A1", TextToDisplay:="First cell of sheet 2"

JanMarvin commented 2 years ago

There are multiple examples in the hyperlink help page for various links. You could simply write it out with writeFormula.

michael-m52 commented 2 years ago

Thankyou, I have a workaround now that uses writeData instead.

JanMarvin commented 2 years ago

Glad that you've got something working! The issue will remain open until we merge development with master

sch56 commented 2 years ago

Not sure whether my issue is directly related to this, or maybe some by-product. A very simple hyperlink, linking to a cell on the same sheet appears very strangely in Excel and doesn't give a usable hyperlink.

library(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, "HL")
hls <- makeHyperlinkString("HL", row = 1, col = 2, text = "Cell in next column")
writeFormula(wb, "HL", hls, startCol = 1, startRow = 5)
saveWorkbook(wb, "TestHL.xlsx", overwrite = TRUE, returnValue = TRUE)

image TestHL.xlsx

JanMarvin commented 2 years ago

Hi @sch56 , could you please check if the issue remains in our development branch? I've pushed a commit that is related a while ago https://github.com/ycphs/openxlsx/commit/be9afdeba25bf335064bc7a8186248cf71ecb2b2

sch56 commented 2 years ago

Hi Jan

Thanks for the response.

Yes, I have downloaded the development branch and is has fixed the problem.

Would be great to hear when this is available through CRAN so that I can deploy to production users.

Regards Simon Harrison

From: Jan Marvin Garbuszus @.> Sent: Friday, 10 June 2022 4:36 PM To: ycphs/openxlsx @.> Cc: Simon Harrison @.>; Mention @.> Subject: Re: [ycphs/openxlsx] writeFormula with HYPERLINK creates unparseable XML: read.xlsx doesn't read "Text to display" from a file created by openxlsx (Issue #335)

Hi @sch56https://github.com/sch56 , could you please check if the issue remains in our development branch? I've pushed a commit that is related a while ago be9afdehttps://github.com/ycphs/openxlsx/commit/be9afdeba25bf335064bc7a8186248cf71ecb2b2

— Reply to this email directly, view it on GitHubhttps://github.com/ycphs/openxlsx/issues/335#issuecomment-1151929574, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD367HH62LT4QGIRDANUKOTVOLA37ANCNFSM5QLRKTTQ. You are receiving this because you were mentioned.Message ID: @.**@.>>

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 7 days with no activity.