insightsengineering / rtables

Reporting tables with R
https://insightsengineering.github.io/rtables/
Other
227 stars 49 forks source link

section_div in single var analyze call puts divider between each row generated by afun, rather than only after the last one #863

Open gmbecker opened 6 months ago

gmbecker commented 6 months ago
library(rtables)
lyt <- basic_table() %>% analyze("SEX", section_div = " ")

gives us

> build_table(lyt, ex_adsl)
                   all obs
——————————————————————————
F                    222  

M                    166  

U                     9   

UNDIFFERENTIATED      3   

Instead of what it should generate, which is:

                   all obs
——————————————————————————
F                    222  
M                    166  
U                     9   
UNDIFFERENTIATED      3   

Note when analyze is given multiple variables, we get the desired behavior:

lyt <- basic_table() %>% analyze(c("SEX", "STRATA1"), section_div = " ")

gives us

> build_table(lyt, ex_adsl)
                     all obs
————————————————————————————
SEX                         
  F                    222  
  M                    166  
  U                     9   
  UNDIFFERENTIATED      3   

STRATA1                     
  A                    122  
  B                    135  
  C                    143  

Multiple calls to analyze also result in the buggy behavior:

lyt  <- basic_table() %>% analyze("SEX", section_div = " ") %>% analyze("STRATA1")

gives

> build_table(lyt, ex_adsl)
                     all obs
————————————————————————————
SEX                         
  F                    222  

  M                    166  

  U                     9   

  UNDIFFERENTIATED      3   

STRATA1                     
  A                    122  
  B                    135  
  C                    143  

Its a little less clear what the correct behavior is here but only in terms of whether the second analyze call should inherit the section_div or not. Either way, section div should only place lines after sections.

I was going to prepare a patch but it appears the meaning of section divider has deviated drastically from its original usage so more discussion may be needed

Melkiades commented 6 months ago

Thank you, Gabe, for your observation. We have patched section_div behavior in https://github.com/insightsengineering/rtables/issues/762. Here you can find the refinement of its expected behavior with a user: https://github.com/insightsengineering/rtables/issues/835.

Hope everything is going well on your side! Please let me know if there is anything I can help you with

gmbecker commented 6 months ago

Hi @Melkiades @shajoezhu,

I'm not sure who originally put in the sometimes section_div in analyze means a divider between every row behavior. If it was me, it is a bug, not intended behavior, and I'd like to revert/fix it. If it wasn't me, please read below for my case why it should be changed and that behavior should be offered to uses in a different way.

First off, let me say that wanting to put dividers between rows generated by an afun is a good feature. That said, please consider reverting this behavior for the reasons below. After that I will lay out a couple of ways to give that ability to users which I think are better.

Case for changing the behavior

section_div in an analyze call behaving as it does after the linked PR is problematic in a number of ways:

  1. Simply put, individual rows are not sections (the new section_div<- function shares this problem, for the record). Thus section_div is no longer "doing what it says on the tin", which is bad from a UX/API design perspective.
  2. Passing the same argument to the same function (ie analyze( , section_div="+")) should not have radically different behavior depending on the values of other arguments, ie sometimes that means "+++++..." between each row and sometimes it means "+++++..." between blocks of (potentially) multiple rows as emitted by the afun(s).
  3. Users can't get the commonly desired (including by @iaugusty in #835) behavior in common layout patterns (ie multiple calls to analyze) without changing their layout defining patterns
  4. The current behavior is not documented anywhere that I can see.

Proposed new behavior

For these reasons I feel that analyze( , section_div = "+") should always mean putting the divider after the block generated by the(/each) afun. That is what was intended by the original design, and is the most intuitively understandable to the user (since, again, the argument name has "section" in it).

Feasibility

Davide you mentioned that getting the desired behavior requires analyze calls to know about eachother. Thats not entirely true if the meaning of section_div is fixed as proposed above. That said, analyze calls do "know about eachother" in that repeated analyze calls (without nested = FALSE) are combined at the layout stage into an AnalyzeMultiVars at https://github.com/insightsengineering/rtables/blob/b16e0be75f7d5c2dac98bd5e8c1ef4256b47b0cc/R/colby_constructors.R#L1118-L1119

So to the extent we need to unify the divider behavior we can. That said, I think it would be entirely reasonable to just let each analyze call control only the divider that directly follows the block of rows it creates, so if you have 3 analyze calls you'd need at least the first 2 of them to set section_div (the third would be unnecessary if the split they are within has section_div set...). This gives the user complete control over even the corner cases where they want section dividers between some but not all of the sibling analyze sections, at a mild cost to convenience.

I think it would also be reasonable to take the first non-NA section_div value from one of the individual analyze calls and propogate it up to the AnalyzeMultivars, with a warning if more than one of the analyze calls had different non-NA values for section_div. This gives the user less control (but only in a corner case that is "probably", famous last words, unlikely to come up in practice), but buys us back a bit of that convenience.

Row divider behavior

In as much as dividers between each row are desirable (again I'm not certain if that behavior is originally a bug by me that you assumed was intentional, or was intentionally implemented by you), I'd propose two ways of allowing the user to get that behavior:

  1. add a .trailing_divs argument to in_rows, so that afuns can define exactly which rows should have dividers after them and which shouldn't (even within an analyze block). This information would then be carried forward in exactly the same way as label information is during the row construction process within tabulation.
  2. If (1) is deemed insufficient for some reason, add a separate row_divider argument to analyze which sets it for each row (And is documented to be overridden by thesection_divider for the last row if set).

The one or both of above will retain the ability to set dividers between individual rows without having the problems I described above.

As an aside here, it is likely a good idea to change the name of the trailling_section_div slot on TableRow objects to something like trailing_div to clarify this ambiguity. I'm not sure if that slot even has exported accessors but even if it does, we can change the slot name and then alter the accessors (that ability being the entire point always going through accessors in the first place).

If we allow row dividers, we just have to make sure that for the last row, the row divider is overridden by the relevant section divider (probably even if said section_div value is even if it is NA) for the last row in an analyze section, which is inline with section_div behavior elsewhere, where the highest level section divider that applies is the only one that is used (e.g., if you split by SEX and STRATA1, and the row in question is the last row in both the male overall section and the male->strata b sections, the section div for SEX is used)

Let me know if you anything is unclear or if you want to discuss more. Happy to meet if thats easier.

gmbecker commented 5 months ago

ping @Melkiades @shajoezhu