nacnudus / unpivotr

Unpivot complex and irregular data layouts in R
https://nacnudus.github.io/unpivotr/

Annotate cells with structure for unpivoting #33

Closed: nacnudus closed this pull request 4 years ago

nacnudus commented 4 years ago

@ianmoran11 Thanks again for this huge contribution. I have made it a pull request so that we can easily comment on the code.

This PR adds a new workflow for unpivoting.

  1. Annotate cells as 'data'
  2. Carry forward all the cells
  3. Annotate cells as 'header'
  4. Carry forward all the cells
  5. Etc.
  6. Unpivot in one go at the end.
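A minimal sketch of the annotate-then-unpivot idea follows. It is written in Python rather than R, so it does not reflect unpivotr's actual functions. Cells are plain dicts with `row`, `col` and `value`. The helpers `from_grid`, `annotate` and `unpivot` and the role names `data`, `col_header` and `row_header` are all hypothetical. Each annotation step returns every cell, tagged or not, and nothing is unpivoted until the final call.

```python
def from_grid(grid):
    """Turn a list of rows into cell dicts, skipping empty (None) cells."""
    return [{"row": r, "col": c, "value": v}
            for r, line in enumerate(grid, start=1)
            for c, v in enumerate(line, start=1)
            if v is not None]


def annotate(cells, predicate, role):
    """Return ALL cells, tagging those that match `predicate` with `role`."""
    return [{**c, "role": role} if predicate(c) else c for c in cells]


def unpivot(cells):
    """Resolve each data cell against the header in its column and its row."""
    col_headers = {c["col"]: c["value"] for c in cells if c.get("role") == "col_header"}
    row_headers = {c["row"]: c["value"] for c in cells if c.get("role") == "row_header"}
    return [{"row_header": row_headers.get(c["row"]),
             "col_header": col_headers.get(c["col"]),
             "value": c["value"]}
            for c in cells if c.get("role") == "data"]


cells = from_grid([[None, "2019", "2020"],
                   ["apples", 1, 2],
                   ["pears", 3, 4]])
cells = annotate(cells, lambda c: c["row"] > 1 and c["col"] > 1, "data")  # annotate data
cells = annotate(cells, lambda c: c["row"] == 1, "col_header")             # annotate headers
cells = annotate(cells, lambda c: c["col"] == 1, "row_header")
tidy = unpivot(cells)                                                      # unpivot once, at the end
print(tidy[0])  # {'row_header': 'apples', 'col_header': '2019', 'value': 1}
```

Because `annotate` never drops cells, the order of the annotation steps does not matter, and every annotation stays visible until `unpivot` consumes them.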

Contrast this with the existing workflow:

  1. Strip one layer of headers from an outside edge
  2. Carry forward the remaining cells
  3. Strip another layer of headers from the newly exposed edge
  4. Carry forward the remaining cells
  5. Etc.
  6. The remaining cells are the 'data' cells
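For comparison, the existing strip-a-layer workflow can be sketched the same way. unpivotr's own function for this step is `behead()`; the Python below does not reproduce its API, and `from_grid`, `strip_top` and `strip_left` are hypothetical names. Each call removes one outer edge and attaches its values to the cells that remain, so whatever survives the last call is the data.

```python
def from_grid(grid):
    """Turn a list of rows into cell dicts, skipping empty (None) cells."""
    return [{"row": r, "col": c, "value": v}
            for r, line in enumerate(grid, start=1)
            for c, v in enumerate(line, start=1)
            if v is not None]


def strip_top(cells, name):
    """Peel off the topmost row and attach its values, under `name`,
    to the remaining cells in the same column."""
    top = min(c["row"] for c in cells)
    headers = {c["col"]: c["value"] for c in cells if c["row"] == top}
    return [{**c, name: headers.get(c["col"])} for c in cells if c["row"] != top]


def strip_left(cells, name):
    """Peel off the leftmost column and attach its values, under `name`,
    to the remaining cells in the same row."""
    left = min(c["col"] for c in cells)
    headers = {c["row"]: c["value"] for c in cells if c["col"] == left}
    return [{**c, name: headers.get(c["row"])} for c in cells if c["col"] != left]


cells = from_grid([[None, "2019", "2020"],
                   ["apples", 1, 2],
                   ["pears", 3, 4]])
# One call per header layer: the caller must know in advance how many there are.
data = strip_left(strip_top(cells, "year"), "fruit")
print(data[0])  # {'row': 2, 'col': 2, 'value': 1, 'year': '2019', 'fruit': 'apples'}
```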

The new workflow also makes it possible for clever functions to guess what the annotations ought to be:

  1. Automatically annotate cells as 'data' and 'header' with a clever function
  2. Carry forward all the cells
  3. Modify the annotations if necessary
  4. Unpivot
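The sketch below illustrates the guess-then-correct loop under a deliberately naive, hypothetical heuristic (numbers are data, text above the data is a column header, text to its left is a row header). It is not the heuristic used in this PR; `guess_roles`, `override` and `is_number` are illustrative names only.

```python
def from_grid(grid):
    """Turn a list of rows into cell dicts, skipping empty (None) cells."""
    return [{"row": r, "col": c, "value": v}
            for r, line in enumerate(grid, start=1)
            for c, v in enumerate(line, start=1)
            if v is not None]


def is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def guess_roles(cells):
    """Hypothetical heuristic: numbers are data, text above the first data row
    is a column header, and text left of the first data column is a row header."""
    numbers = [c for c in cells if is_number(c["value"])]
    first_row = min(c["row"] for c in numbers)
    first_col = min(c["col"] for c in numbers)
    guessed = []
    for c in cells:
        if is_number(c["value"]):
            role = "data"
        elif c["row"] < first_row:
            role = "col_header"
        elif c["col"] < first_col:
            role = "row_header"
        else:
            role = None  # no guess; left for the user to annotate
        guessed.append({**c, "role": role})
    return guessed


def override(cells, predicate, role):
    """Let the user correct any annotation the heuristic got wrong."""
    return [{**c, "role": role} if predicate(c) else c for c in cells]


# The years are stored as numbers, so the heuristic mistakes them for data.
cells = from_grid([[None, 2019, 2020],
                   ["apples", 1, 2],
                   ["pears", 3, 4]])
guessed = guess_roles(cells)
print([c["role"] for c in guessed if c["row"] == 1])  # ['data', 'data']
fixed = override(guessed, lambda c: c["row"] == 1, "col_header")
print([c["role"] for c in fixed if c["row"] == 1])    # ['col_header', 'col_header']
```

The mistake with numeric year headers is exactly the kind of case the "modify the annotations" step is there to catch.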

Automatic annotation spares the programmer from having to know in advance how many layers of headers there are. This is useful when a file contains many tabs, each with a different number of header layers but arranged in a similar hierarchy. The automatic annotator can recurse through as many layers of headers as necessary.
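A hedged sketch of that recursion, again in Python with hypothetical names: `peel_top` labels text-only rows at the top as header layers `col_header_0`, `col_header_1`, and so on, and stops at the first row containing a number. The stopping rule is an assumption for illustration (a text-only data row would fool it), and the left edge would be handled the same way.

```python
def from_grid(grid):
    """Turn a list of rows into cell dicts, skipping empty (None) cells."""
    return [{"row": r, "col": c, "value": v}
            for r, line in enumerate(grid, start=1)
            for c, v in enumerate(line, start=1)
            if v is not None]


def is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def peel_top(cells, depth=0):
    """Recursively label text-only top rows as column-header layers and stop
    at the first row that holds a number. Returns (header_cells, remaining)."""
    if not cells:
        return [], []
    top = min(c["row"] for c in cells)
    row = [c for c in cells if c["row"] == top]
    if any(is_number(c["value"]) for c in row):
        return [], cells  # reached the data: no more header layers
    rest = [c for c in cells if c["row"] != top]
    headers, remaining = peel_top(rest, depth + 1)
    return [{**c, "role": f"col_header_{depth}"} for c in row] + headers, remaining


# Two "tabs" with the same hierarchy but different numbers of header layers.
two_layers = from_grid([[None, "Fruit", None],
                        [None, "2019", "2020"],
                        ["shop A", 1, 2],
                        ["shop B", 3, 4]])
one_layer = from_grid([[None, "2019", "2020"],
                       ["shop A", 1, 2]])
for sheet in (two_layers, one_layer):
    headers, remaining = peel_top(sheet)
    print(sorted({c["role"] for c in headers}), len(remaining))
# ['col_header_0', 'col_header_1'] 6
# ['col_header_0'] 3
```

The same call handles both sheets, which is the point: the number of layers is discovered rather than hard-coded.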

The annotations can be inspected graphically, which may make debugging easier for users.
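As a rough text-only stand-in for a graphical view (not the plotting this PR provides), a hypothetical `render` helper can draw each annotated cell as a letter in its grid position, so a misplaced header shows up at a glance.

```python
def render(cells):
    """Draw annotations as a character grid: C = column header,
    R = row header, D = data, ? = unannotated, . = empty cell."""
    code = {"col_header": "C", "row_header": "R", "data": "D"}
    n_rows = max(c["row"] for c in cells)
    n_cols = max(c["col"] for c in cells)
    grid = [["." for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"] - 1][c["col"] - 1] = code.get(c.get("role"), "?")
    return "\n".join("".join(line) for line in grid)


def role_of(r, c):
    """Illustrative annotation for the small example table below."""
    return "col_header" if r == 1 else "row_header" if c == 1 else "data"


table = [[None, "2019", "2020"],
         ["apples", 1, 2],
         ["pears", 3, 4]]
cells = [{"row": r, "col": c, "value": v, "role": role_of(r, c)}
         for r, line in enumerate(table, start=1)
         for c, v in enumerate(line, start=1)
         if v is not None]
print(render(cells))
# .CC
# RDD
# RDD
```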

Because sets of cells often have to be identified by their formatting, a suite of functions is provided to extract particular formats into their own columns. This requires the formats to be available alongside the cells, hence the xlsx_cells_fmt() function, which stores the formats in an attribute. This was always the intention for tidyxl, but it only became possible relatively recently, when dplyr and related packages began to preserve attributes.
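To show why keeping formats in an attribute helps, here is a Python sketch of the idea. In tidyxl each cell carries a `local_format_id` that indexes into the format lists returned by `xlsx_formats()`; the `CellTable` class, its `filter` method and `append_bold` below are hypothetical stand-ins and are not the functions added in this PR. Format ids are 0-based here, whereas R indexes from 1.

```python
class CellTable(list):
    """A list of cell dicts that carries the workbook formats as an attribute."""

    def __init__(self, cells, formats):
        super().__init__(cells)
        self.formats = formats

    def filter(self, predicate):
        # Keep the formats attached, standing in for dplyr preserving attributes.
        return CellTable([c for c in self if predicate(c)], self.formats)


def append_bold(table):
    """Extract one format (bold font) into its own column via local_format_id."""
    bold = table.formats["font"]["bold"]
    return CellTable([{**c, "bold": bold[c["local_format_id"]]} for c in table],
                     table.formats)


formats = {"font": {"bold": [False, True]}}  # list index = format id
table = CellTable([
    {"row": 1, "col": 2, "value": "2019", "local_format_id": 1},
    {"row": 1, "col": 3, "value": "2020", "local_format_id": 1},
    {"row": 2, "col": 1, "value": "apples", "local_format_id": 0},
    {"row": 2, "col": 2, "value": 1, "local_format_id": 0},
], formats)
headers = append_bold(table).filter(lambda c: c["bold"])
print([c["value"] for c in headers])  # ['2019', '2020']
```

If filtering dropped the `formats` attribute, a second format lookup after the filter would fail, which is the problem attribute preservation solves.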

nacnudus commented 4 years ago

I've merged the latest changes from master, which will hopefully fix the build.