roo-rb / roo

Roo provides an interface to spreadsheets of several sorts.
MIT License
2.79k stars 502 forks source link

Some formulas not being recognized and extracted #208

Open pmneve opened 9 years ago

pmneve commented 9 years ago

Am converting a couple xlsx workbooks to xml so the content can be versioned in git (also convert the other way using write-xlsx) I had to patch 1.13.2 to get capture of formulas to work correctly (wouldn't capture them at all) 2.0.0 captures the formulas but not all of them.

Here is xdiff3 compare of conversions of the same xlsx file using 1.13.2 (top) and 2.0.0. Red box shows missing formulas, green shows those captured by both.

2015-04-29 17_59_39-wf2_automation_references_679ad9e xml - kdiff3

Any thoughts? Need anything else?

stevendaniels commented 9 years ago

Can you create a gist for this issue (sample gist)? Here are some instructions for creating such a gist.

  1. Create a gist with code that creates the error.
  2. Clone the gist repo locally, add a stripped down version of the offending spreadsheet to the gist repo, and push the gist's changes master.
  3. Paste the gist url here.
pmneve commented 9 years ago

https://gist.github.com/pmneve/4f3cc8e3b3cfa9f454d2

trimmed xlsx: roo_formula.xlsx script: xlsx2xml2.rb xml output: roo_formula_679ad9e_roo2.xml first example of error is at line 4198

stevendaniels commented 9 years ago

Thanks, I'm looking into the issue. That gist was a bit heftier than I was hoping for.

stevendaniels commented 9 years ago

A look at your xlsx file's XML shows the following:

<row>
  ...
  <c r="D2" s="3">
    <f>COUNTIFS('Test Case References'!$A:$A,$A2,'Test Case References'!$F:$F,$AC$2,'Test Case References'!$M:$M,$AG$2)</f>
    <v>45</v>
  </c>
  <c r="E2" s="4">
    <f t="shared" ref="E2:E3" si="0">IF(B2&gt;0,(D2/B2)*100,0)</f>
    <v>78.94736842105263</v>
  </c>
  ...
</row>
<row>
  ...
  <c r="D3" s="3">
    <f>COUNTIFS('Test Case References'!$A:$A,$A3,'Test Case References'!$F:$F,$AC$2,'Test Case References'!$M:$M,$AG$2)</f>
    <v>47</v>
  </c>
  <c r="E3" s="4">
    <f t="shared" si="0"/>
    <v>94</v>
  </c>
  ...
</row>

E2/E3 are using a shared formula. Shared formulas are used to optimize load performance by sharing formulas. The "si" attribute is a reference to the cell with the master formula. If a cell inside the range of the shared formula doesn't have an si attribute or doesn't have the t="shared" attribute/value, then that cell's formula overrides the master formula (source).

So in your case, the master formula is being stored, but the cell's that share the formula aren't storing anything.

sheet.formula 2, 'e'
# => "IF(B2>0,(D2/B2)*100,0)"

sheet.formula 3, 'e'
# => ""

That's why your XML file is missing the formula on line 4198.

Roo will need to keep track of shared_formulas and calculate the formula for cells that refer to the master formula. Will need to make modifications to SheetDoc#cell_to_xml

      cell_formula = cell.content

      if cell.has_attribute?('t') && cell.attribute("t").value == 'shared'
        if cell.has_attribute('ref')
          ref_start, ref_end = cell.attribute('ref').value.split(':')
          @shared_formula[(ref_start..ref_end)] = cell_formula
        else
          master_formula = @shared_formula[(ref_start..ref_end)].select { |f| f.include? cell_xml['r'] }.first
          # find the formula and calculate it's value
        end
      end
      formula = cell_formula

I'll need to think about how a cell's formula should be calculated from the master formula.

pmneve commented 9 years ago

Steven, Thanks! Sorry about the gigantic gist. I have work around for now (either use 1.13.2 or merge the summary sheet from xml dump using 1.13.2 that developers send me). Appreciate your taking up the mantle for roo! Take care pat

pmneve commented 9 years ago

Steven, have you been able to look at this? Just checking....

stevendaniels commented 9 years ago

I know it's been a while, but I have been working on a some internal changes that will address this and other issues (like the type-casting issues).

pmneve commented 9 years ago

Not a problem!

pmneve commented 9 years ago

@stevendaniels Have been using 2.1.1 and some formulas (SUMIF, COUNTIF, and COUNTIFS) seem to work fine. But IF, as in "IF(B3>0,(D3/B3)*100,0)", doesn't make the cut...

Keep up the great work! pat

pmneve commented 9 years ago

(That gist I sent in May was down from 55,000 lines of xml with the full data)

stevendaniels commented 8 years ago

Thanks. I'll take another look and see what I can find.

pmneve commented 8 years ago

Still having the issue with 2.3.2. Am going to try your .to_xml method and see what happens there.

pmneve commented 7 years ago

Have left the project where I used the xml for version control of data repositories in Excel but still look forward to this enhancement. Once again, THANK YOU! for taking on Roo. You've done a superb job!