maelstrom-research / Rmonize

Histograms for date variables display invalid ranges #37

Open twey2 opened 8 months ago

twey2 commented 8 months ago

In visual reports, histograms for date variables display invalid ranges that don't match the Summary statistics or the Span date representation. Two examples are below. In the first, N3RO shows dates that are both too early and too late (and MOBYDIck shows dates that are too early). In the second, MOBYDIck has the same issue.

image

image

GuiFabre commented 7 months ago

Hello @twey2,

The date graphs have some issues that will be addressed and corrected in the next version. Temporarily, assuming that the first graph (the span) works, it will be kept, while the second (the histogram) will be replaced by a whisker plot. Even though this solution is a bit redundant (the span is included in both graphs), it will allow us to have two functioning graphs without breaking the code.

This solution is temporary.

image image

twey2 commented 7 months ago

The function in the updated package now produces the whisker plot, and it looks accurate to me.

I don't completely understand the issue with having 2 plots, but if two are required and we can't make a histogram, why not have a whisker plot and a pie chart with % valid/% missing, like for some other variable types?

I think the histogram would still be better than the boxplot if we could just change the units to something smaller (weeks or days instead of years) and/or find a way to pick a more logical number of bins to display.
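
To make the binning idea concrete, here is a minimal sketch of one way it could work. It is not Rmonize code and is written in Python purely for illustration. The helper names `choose_bin_days` and `date_histogram`, the candidate widths, and the cap of 30 bins are all assumptions. The bin width is picked from the observed span, and the bins start at the earliest observed date, so the plotted range cannot extend before the minimum or past the maximum reported in the summary statistics.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical helpers for illustration only; nothing here is part of Rmonize.

def choose_bin_days(dates, max_bins=30):
    """Return the smallest candidate bin width (in days) giving at most max_bins bins."""
    span_days = (max(dates) - min(dates)).days + 1
    for width in (1, 7, 30, 365):  # day, week, roughly a month, roughly a year
        n_bins = -(-span_days // width)  # ceiling division
        if n_bins <= max_bins:
            return width
    # Very long spans fall back to yearly bins, even if that exceeds max_bins.
    return 365

def date_histogram(dates, max_bins=30):
    """Return (bin_start, count) pairs, with the first bin starting at the earliest date."""
    start = min(dates)
    width = choose_bin_days(dates, max_bins)
    # Assign each date to a bin by its offset in days from the earliest date.
    counts = Counter((d - start).days // width for d in dates)
    n_bins = (max(dates) - start).days // width + 1
    return [(start + timedelta(days=i * width), counts.get(i, 0))
            for i in range(n_bins)]

# Example with made-up dates: the span is under two months, so weekly bins are chosen.
sample = [date(2020, 1, 6), date(2020, 1, 7), date(2020, 1, 20), date(2020, 3, 1)]
for bin_start, count in date_histogram(sample):
    print(bin_start.isoformat(), count)
```

Anchoring the first bin at the observed minimum, rather than rounding down to the start of a calendar year, is the design choice that keeps the axis consistent with the span plot.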