maelstrom-research / Rmonize


Suggestion: More explanation of output of dataset_summarize() #21

Open twey2 opened 11 months ago

twey2 commented 11 months ago

The output of dataset_summarize() has some confusing elements. The general issue is that it produces many different tables with no explanation of their contents. I'm not sure what the best solution is, but in the future even a brief definition of each table, printed in the console, would help. The screenshot is from "Data dictionary assessment", but the questions about the messages in "Quality assessment comment" also apply to other parts of the report. Some specific examples are below.

[Screenshot: "Data dictionary assessment" table from the dataset_summarize() report]

In "Data dictionary assessment", it took me a while to understand the two rows generated with NA in "name_var". This seems to mix levels of assessment: part of the assessment is about the whole column, and part is about individual values.

It is unclear what the "value" column means in "Data dictionary assessment".

Quality assessment comment "[INFO] - Incompatible variable names with usual standards, including Maelstrom":

Quality assessment comment "[INFO] - Possible duplicated columns": Not very informative. It would be clearer if there was a message like "All values identical to values in another column", or something that explains the potential issue.
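To illustrate the kind of message suggested above, here is a minimal, language-agnostic sketch in Python. It is not how Rmonize or madshapR implement the check; the function name `duplicated_column_messages` and the input format (a dict mapping column names to equal-length lists of values) are hypothetical. It compares every pair of columns and, when all values match, names the column that the duplicate matches.

```python
from itertools import combinations


def duplicated_column_messages(table):
    """Return one explicit message per pair of columns with identical values.

    Hypothetical input format: `table` maps column names to equal-length
    lists of values. This is an illustration, not the Rmonize/madshapR code.
    """
    messages = []
    # Compare each unordered pair of columns exactly once, in insertion order.
    for left, right in combinations(table, 2):
        if table[left] == table[right]:
            messages.append(
                f"[INFO] - Possible duplicated columns: all values in '{right}' "
                f"are identical to the values in '{left}'"
            )
    return messages


dataset = {
    "id": [1, 2, 3],
    "age": [34, 51, 27],
    "age_copy": [34, 51, 27],
}
for message in duplicated_column_messages(dataset):
    print(message)
```

Naming both columns in the message tells the user which column to compare against, which is the information missing from the current "[INFO] - Possible duplicated columns" comment.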

GuiFabre commented 11 months ago

Thank you for your feedback 👍 These comments will be taken into account and will be part of a larger batch of modifications to all of them:

They are not perfect, but at least they all share the same structure. They need to be discussed as a whole so that they stay consistent across madshapR and Rmonize.