update the doc Derivation Concepts

duniek commented 2 years ago

What are the guiding principles to split or combine functions together. This needs to be clear before open source contributors start challenging our design choices.

bundfussr commented 2 years ago

@Roche-GSK/admiral-devs , we have discussed the issue and decided that we want to create an overview of the existing functions and future derivations and group them into admiral derivation concepts. This should help to develop guiding principles for splitting and combining functions.

I have created a document (admiral Derivation Concepts) for this.

Please feel free to add to and comment on the document.

bms63 commented 2 years ago

I had a go at this. Basically just dumped all the functions at the bottom of the document and trying to work through each one. If I classify one, then I strike it out.

I got the list of all the functions from the Reference section for 0.6.0 release.

bms63 commented 2 years ago

@thomas-neitmann @rossfarrugia Do we still want to develop this further?

rossfarrugia commented 2 years ago

I've not been part of all these background discussions here but as you tagged me @bms63 i'll do my best to comment with my understanding. i think the history here comes from the fact we tried to start with the principle of making admiral as flat as possible so each function would create a dedicated variable/parameter. but then i think we got external testing feedback saying we had redundant or very similar functions that could be grouped. so now we're trying to find ways in which to justify when to group and when not to. correct me if any of this understanding is wrong!

in my mind i do think we're moving towards the idea of grouping common derivations around an "analysis concept", e.g. i totally understand why derive_vars_dy() means we could deprecate derive_var_ady(), derive_var_astdy() & derive_var_aendy(). the balance there though is that we don't want to make overly generic functions that are difficult to test and harder for the users to pick up and use. this all links to this related issue https://github.com/pharmaverse/admiral/issues/954 as i see it. as Stefan points out there we would need to be transparent on such analysis concepts to justify our decision making.

p.s. i do think wherever we decide for more general functions it makes our ADaM vignettes and templates even more critical for users, as it becomes that one step more difficult to find functions when they are not named exactly the same as the variable you are trying to derive.

Tagging @koegerr in case he wants to add any more.

bms63 commented 2 years ago

Thanks Ross. I have had this issue tagged to me for a while. I just wanted to make sure it was still relevant. This takes time to do, but hopefully we can make some more headway on it.

rossfarrugia commented 2 years ago

hi @bms63 , I’ve reflected on this one and I would suggest to help new developers and users understand our strategy, we should add an extra bullet to the Usage section of our homepage (so that it’s one of the first things people see) that says something like:

In our effort to create simple modular functions, we sometimes group similar derivations under one “analysis concept” that can be achieved by a single more generic function. Most commonly, this would be for ADaM variable conventions that use the same suffix. An example would be study day variables (---DY) where we have a derive_vars_**dy** function that can be used for all, instead of maintaining separate derive_var_ady, derive_var_astdy, etc. This is only done for highly similar derivations that does not result in overly generic functions with many different arguments that would become difficult to use. For example, ONTRTFL and TRTEMFL may both use the same suffix (---FL) but the derivations are sufficiently different to warrant their own functions.

Also I don’t think we need this in the above paragraph, but as a development team we have to consider findability of our functions as well. If we end up with lots of overly generic functions with inconsistent namings then people will struggle to find what they need, and will end up relying too heavily on the ADaM vignettes/templates. Maybe it justifies adding to https://pharmaverse.github.io/admiral/articles/programming_strategy.html#function-names-1 an extra naming convention as follows:

derivevardy – for grouped functions under one “analysis concept” use the same ADaM variable naming convention suffix, e.g. ---DY in this case

I'll start drafting something for #954 as well, as I see now how it's difficult for new developers to make these calls without a clear documented strategy for function design choices. I had added the Expectations section to the homepage before, but that's too high level I see.

bundfussr commented 2 years ago

@rossfarrugia , the --DY suffix is a nice example but I wonder if it can be generalized. As you have already pointed out it does not work for the --FL suffix. Do you have other examples where the suffix approach works?

Furthermore, variable derivations are just one half of the derivations. The other one are the parameter derivations. But CDISC does not define any standard parameter codes. Thus we can not use them to group functions.

I doubt that there is a quick solution. I think we need to analyses some hundreds derivations and try to group them based on the steps performed within the derivation. If the steps are similar, the derivations can be done by a single function like derive_vars_dy() or a class of similar functions like the derive_vars_merged_* functions. I started this in https://exbpbox.box.com/s/hxngtc941diiocghoaoez3r34salkq77. However, this will be a lot of work and take some time.

rossfarrugia commented 2 years ago

@bundfussr i've drafted something for #954 this morning and shared with @koegerr for a quick sanity check. if he agrees then will add to our box tomorrow for the core team to check over.

my personal thinking is we shouldn't go overboard with grouping our existing functions (beyond the obvious examples like DY) as it just makes it that one step harder for users to find what they're looking for - and generic functions (although less in number) are not always easy to maintain, test and use, especially for the more complex derivations. for your derive_varsmerged* I'm OK with including these as extra offerings as they open up flexibility to cover more company-specific needs, but i wouldn't like to see now all our more targeted functions start to get deprecated because of these, especially for the very common needed variables.

anyway, once we agree 954, we can go more into things like parameters and admiralonco will be a good exercise for this. we learn as we go!

pharmaverse / admiral

update the doc Derivation Concepts #684