nacnudus / tidyxl

Read untidy Excel files in R https://nacnudus.github.io/tidyxl/
https://nacnudus.github.io/tidyxl/
Other
246 stars 20 forks source link

Missing Row #48

Closed bradford61 closed 5 years ago

bradford61 commented 5 years ago

I'm having a similar issue to that of #43 which was closed. I have highly unstructured .xlsx files which I am trying to parse and a good percentage of them have missing cells in the xlsx_cells tibble. I have attached a stripped down file that reproduces the error. In fact none of row 17 is present in the xlsx_cells tibble. missingRow.xlsx

nacnudus commented 5 years ago

Hi @bradford61, the reason that they don't appear is because, believe it or not, they don't exist. Excel files don't store cells unless they contain a value, a formula, or some formatting. The cells in row 17 are completely blank, so they aren't stored. Row 17 itself has formatting -- a thick bottom border.

I found this out by using Jenny Bryan's package excelgesis and running xlgesis::xg_inspect("./missingRow.xlsx"). Here is part the part of the file xl/worksheets/sheet1.xml for rows 16, 17 and 18:

<row r="16" spans="1:13" s="42" customFormat="1" ht="12" x14ac:dyDescent="0.2">
<c r="A16" s="30"/>
<c r="B16" s="29"/>
<c r="C16" s="29"/>
<c r="D16" s="29"/>
<c r="E16" s="29"/>
<c r="F16" s="29"/>
<c r="G16" s="29"/>
<c r="H16" s="29"/>
<c r="I16" s="29"/>
<c r="J16" s="29"/>
<c r="K16" s="29"/>
</row>
<row r="17" spans="1:12" ht="19.5" customHeight="1" thickBot="1" x14ac:dyDescent="0.25"/>
<row r="18" spans="1:12" s="20" customFormat="1" ht="13.5" thickBot="1" x14ac:dyDescent="0.25">
<c r="A18" s="27"/>
<c r="B18" s="28"/>
<c r="C18" s="28"/>
<c r="D18" s="28"/>
<c r="E18" s="28"/>
<c r="F18" s="28"/>
<c r="G18" s="28"/>
<c r="H18" s="28"/>
<c r="I18" s="28"/>
<c r="J18" s="28"/>
<c r="K18" s="46"/>
</row>

If you need those cells to exist with placeholder values, look at ?tidyr::expand()

bradford61 commented 5 years ago

Weird. Thanks for your help!

gregrs-uk commented 2 years ago

Thanks again for this highly useful package. I wonder whether it would be worth changing the documentation for the xlsx_cells parameter include_blank_cells to make it clear that not all blank cells can necessarily be included. I personally found this a bit confusing.