Who is responsible for data quality? App dev or app user?

insightsengineering / teal.osprey

Community efforts to collect teal modules for TLGs defined in the osprey package

https://insightsengineering.github.io/teal.osprey/

Other

5 stars 2 forks source link

Who is responsible for data quality? App dev or app user? #198

Open chlebowa opened 2 years ago

chlebowa commented 2 years ago

These validation statements in tm_g_ae_oview prevent the plot from being displayed.

validate(need(is.factor(ANL[[input$arm_var]]), "Selected arm variable must be a factor." ))
validate(need(nlevels(ANL[[input$arm_var]]) > 1, "Arm needs to have at least 2 levels"))
validate_has_data(ANL, min_nrow = 10)

An app user may not even know what a factor or a level is. They don't know the data types of variables (at least in the example app). If the app doesn't work because there is not enough rows in ANL, they can't change anything to obtain their desired visualization.

Are these safeguards not the responsibility of the app/module developer?

BLAZEWIM commented 2 years ago

I would say that in this particular case:

validate(need(is.factor(ANL[[input$arm_var]]), "Selected arm variable must be a factor." )) - agree, this depends on an app developer + the info could be more user-friendly

validate(need(nlevels(ANL[[input$arm_var]]) > 1, "Arm needs to have at least 2 levels")) - ok, someone could filter out one level and be surprised why the output is not appearing

validate_has_data(ANL, min_nrow = 10) - again one might filter out too much, but why 10 is minimum, not 1?

So while introducing shinyvalidate, we could do a re-evaluation why some tests are needed in the first place. On the other hand, some of them might have some history that we are not aware of, so some supervision on the process would be needed. Anyway, we do like reducing the number of lines of code.

gogonzo commented 1 year ago

This is a good issue. I thought I know the answer and I was sure of my opinion but not anymore. Since module can be included in any app, where user could have a control over various thing (imagine something like a teal). Then nobody can be sure what kind of data will be passed reactively to the module. What we can assume:

App developer knows the data
App developer knows that in teal, user might use filter-panel.
Filter panel filters rows only, doesn't do any changes in column classes not it doesn't drop factor levels.
Unique character/numeric values might be different than expectations of the app-developer as app-user might use any filter.
What about other than teal "frameworks"?

Considering above, I'm not necessary sure whether we can use stopifnot() in the tlg-modules or should we stick to the validate always.

I'd like to continue this discusssion