maelstrom-research / tests-functions-dev


avant_apres_harmo() #1

Open GuiFabre opened 1 year ago

GuiFabre commented 1 year ago

testing of a function:

Run tests until consensus. Change life cycle if needed ('stable')

a-trottier commented 11 months ago

Documents updated from the test for CaG on the Canpath server, in folder #To test avant-apres#. It blocks on the same variable, I think.

GuiFabre commented 11 months ago

Line 83 in the DPE has an error. In input_variables, the variable SY09Bis_SQ01 does not exist, so the code fails.

Update: the errors are caught, so the code can keep processing. It does not say what the error is, but passes through it.
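For the point about errors being passed through without a message, here is a minimal base R sketch of one way to keep processing while still recording what went wrong. It is not the function's actual code: `process_rule()`, `safe_process_rule()` and the column `SY09_SQ01` are hypothetical names used only for illustration.

```r
# Hypothetical stand-in for one harmonization step: it fails when the
# requested input variable is missing from the dataset.
process_rule <- function(dataset, input_variable) {
  if (!input_variable %in% names(dataset)) {
    stop("variable '", input_variable, "' does not exist in the input dataset")
  }
  dataset[[input_variable]]
}

# Wrapper that lets processing continue but keeps the error message
# instead of silently dropping it.
safe_process_rule <- function(dataset, input_variable) {
  tryCatch(
    list(result = process_rule(dataset, input_variable), error = NA_character_),
    error = function(e) list(result = NULL, error = conditionMessage(e))
  )
}

# Toy dataset (assumed column name) with one valid and one missing variable.
dataset <- data.frame(SY09_SQ01 = c(1, 2, 3))
ok  <- safe_process_rule(dataset, "SY09_SQ01")
bad <- safe_process_rule(dataset, "SY09Bis_SQ01")
```

With something like this, `bad$error` could be written into an extra column of the output so the user sees which rule failed and why.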


a-trottier commented 11 months ago

(Following the discussion on variables that were "impossible" but that failed.) This is resolved after updating to last Friday's version. It works well right now.

a-trottier commented 7 months ago

batch 1 of improvements:

GuiFabre commented 7 months ago

@a-trottier

Adding the column -> OK

Separate by type of harmo_rule -> must be done on the user end. Some might separate by dataset, others by dataschema variable, others by rule or by harmo status (to check the ones that have failed). If you want to separate them:

OPTION 1: use avant_apres_output %>% group_split(harmo_rule), or filter your data processing elements on a specific rule, or run lapply, sapply or walk across grouped data processing elements.

OPTION 2: add an additional parameter split_by, which can be either NULL or dataset by default (because the input is a list of datasets). That allows flexibility. With option 2, we must decide whether each of the tibbles has the same structure or not.
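As a rough illustration of OPTION 1, the sketch below splits a toy output on the user side. The tibble is a made-up stand-in: the real columns of avant_apres_output may differ, and the `harmo_status` values shown are assumptions.

```r
library(dplyr)

# Toy stand-in for the function output; real column names may differ.
avant_apres_output <- tibble(
  dataschema_variable = c("sex", "age", "bmi", "smoker"),
  harmo_rule   = c("direct_mapping", "direct_mapping", "operation", "case_when"),
  harmo_status = c("complete", "complete", "complete", "error")
)

# One tibble per rule, named after the group keys (same order as the split).
grouped <- avant_apres_output %>% group_by(harmo_rule)
by_rule <- grouped %>% group_split()
names(by_rule) <- group_keys(grouped)$harmo_rule

# Or keep only the rules that failed (status value is an assumption).
failed <- avant_apres_output %>% filter(harmo_status == "error")
```

The same pattern works for any other column (dataset, dataschema variable, status), which is the flexibility argument for leaving the split to the user.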

In a nutshell, the generated tibble has column names that depend on the input provided. Additional columns are created after all of the variables are analysed. Hence, the output for direct_mapping variables only differs from the output for case_when (involving more than one input variable). We must decide whether the output is different for each group. In that case, the underlying function will be:

tbl %>% group_split(var) %>% lapply(fun)
## `fun` is a placeholder for the summarising function.
## Each tibble is (possibly) different, with dedicated columns. bind_rows() (possibly) does not work.

or whether it is the same. In that case, the underlying function will be:

tbl %>% fun() %>% group_split(var)
## `fun` is a placeholder for the summarising function.
## Each tibble is the same, with (possibly) useless columns. bind_rows() works.
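To make the two orderings concrete, here is a small sketch on toy data. `summarise_inputs()` is a hypothetical stand-in for the real summarising step, and the columns are illustrative only; it drops columns that are entirely NA so that its output depends on the rows it receives.

```r
library(dplyr)

# Toy input: the case_when row uses two input variables, the others one.
tbl <- tibble(
  var     = c("direct_mapping", "direct_mapping", "case_when"),
  input_1 = c("a", "b", "c"),
  input_2 = c(NA, NA, "d")
)

# Hypothetical summarising step: removes columns that contain only NA.
summarise_inputs <- function(x) {
  x %>% select(where(~ !all(is.na(.x))))
}

# Split first, then apply: each tibble keeps only the columns it needs.
split_then_apply <- tbl %>% group_split(var) %>% lapply(summarise_inputs)

# Apply first, then split: every tibble shares the same columns.
apply_then_split <- tbl %>% summarise_inputs() %>% group_split(var)

# Number of columns per piece under each strategy.
cols_split_first <- sapply(split_then_apply, ncol)
cols_apply_first <- sapply(apply_then_split, ncol)
```

In this toy case the split-first pieces have different widths while the apply-first pieces all have the same. Note that bind_rows() fills missing columns with NA when shared columns have compatible types, so recombining the split-first pieces fails only when a shared column has conflicting types across groups.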

Add label when available (or different from name actually) -> OK

a-trottier commented 7 months ago

Then we have another question pending: does it make sense to compare datasets with this tool?

When we first talked about it, the goal was to compare datasets, not dossiers. But it is a good question. To discuss.