@ianmoran11 Thanks again for this huge contribution. I have made it a pull request so that we can easily comment on the code.
This PR adds a new workflow for unpivoting.
Annotate cells as 'data'
Carry forward all the cells
Annotate cells as 'header'
Carry forward all the cells
Etc.
Unpivot in one go at the end.
Contrast this with the existing workflow:
Strip one layer of headers from an outside edge
Carry forward the remaining cells
Strip another layer of headers from the newly exposed edge
Carry forward the remaining cells
Etc.
The remaining cells are the 'data' cells
The new workflow allows for clever functions to guess what the annotations ought to be:
Automatically annotate cells as 'data' and 'header' with a clever function
Carry forward all the cells
Modify the annotations if necessary
Unpivot
The automatic annotation can spare the programmer from having to know in advance how many layers of headers there are. This is useful when a file contains many tabs, each of which has different numbers of layers of headers, but arranged in a similar hierarchy. The automatic annotator can recurse through as many layers of headers as necessary.
The annotations can be inspected graphically. This might be easier for users to debug.
Because you must often refer to formatting to identify sets of cells, a suite of functions is provided to extract particular formats into their own columns. This means the formats have to be available alongside the cells, hence there is an xlsx_cells_fmt() function, which stores the formats in an attribute. This was always the intention for tidyxl, but only became possible relatively recently when dplyr et al began to preserve attributes.
@ianmoran11 Thanks again for this huge contribution. I have made it a pull request so that we can easily comment on the code.
This PR adds a new workflow for unpivoting.
Contrast this with the existing workflow:
The new workflow allows for clever functions to guess what the annotations ought to be:
The automatic annotation can spare the programmer from having to know in advance how many layers of headers there are. This is useful when a file contains many tabs, each of which has different numbers of layers of headers, but arranged in a similar hierarchy. The automatic annotator can recurse through as many layers of headers as necessary.
The annotations can be inspected graphically. This might be easier for users to debug.
Because you must often refer to formatting to identify sets of cells, a suite of functions is provided to extract particular formats into their own columns. This means the formats have to be available alongside the cells, hence there is an
xlsx_cells_fmt()
function, which stores the formats in an attribute. This was always the intention fortidyxl
, but only became possible relatively recently whendplyr
et al began to preserve attributes.