Show datatypes of imported data.

john-harrold commented 1 year ago

On import, is there a way to confirm that a column was imported with the expected class (numeric, character, date, etc.)? I don't see one, and the lack of it leads to many small issue reports over time. As long as it is handled at some point (either in the visualize or the NCA module), it should be okay, but it can be a headache to go back and fix it. @billdenney

john-harrold commented 1 year ago

Tentative solutions: add hover text to the column headers that shows information such as data type, factor, etc. Types to include:

Text
Numeric
Date
Other

Another, or complementary, solution is to change the background color of the headers or data columns to indicate the data type to the user.

john-harrold commented 1 year ago

Formatting headers:

https://stackoverflow.com/questions/58250873/how-to-display-specific-column-header-in-rhandsontable-in-a-particular-color-usi

https://stackoverflow.com/questions/43874347/how-to-make-column-heading-letters-bold-in-rhandsontable-in-shiny

Tooltips with headers: https://gist.github.com/timelyportfolio/b8001318ce3e25b6920a0f20e9db374e

john-harrold commented 1 year ago

Hey @billdenney. How does this look:

It's configured in the yaml file below under labels. If df is your data frame, then you put whatever typeof(df$colname) returns under data_types below. Defaults can be created in the package and then customized by the user if needed.

labels: 
    data_types:
      character:   
        color: "green"
        label: "text"
      double:      
        color: "blue"
        label: "num"
      other:       
        color: "black"
        label: "other"
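
As a rough sketch of how that lookup could work (not the actual formods code): the list below is a hand-written stand-in for the parsed yaml section, and `lookup_type_format` is a hypothetical helper name. Any type without its own entry falls back to `other`.

```r
# Hypothetical in-memory copy of the labels -> data_types yaml section
data_types <- list(
  character = list(color = "green", label = "text"),
  double    = list(color = "blue",  label = "num"),
  other     = list(color = "black", label = "other")
)

# Return the format entry for a column, keyed on typeof(); fall back to "other"
lookup_type_format <- function(df, colname, data_types) {
  col_type <- typeof(df[[colname]])
  if (col_type %in% names(data_types)) {
    data_types[[col_type]]
  } else {
    data_types[["other"]]
  }
}

# Small example data frame (made up for illustration)
df <- data.frame(
  id   = c("a", "b"),
  conc = c(1.5, 2.5),
  n    = c(1L, 2L),
  stringsAsFactors = FALSE
)

lookup_type_format(df, "conc", data_types)$label
lookup_type_format(df, "n", data_types)$label
```

Note that typeof() reports the storage type, so integer columns and factors both come back as "integer" and Date columns come back as "double". With only the three entries above, integers and factors would land in `other` and dates would be labeled `num` unless more entries or a class-based check are added.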

I can also decrease the font size of the datatype to make it less obtrusive.

billdenney commented 1 year ago

I like it with the decreased font size (as long as it remains readable for people who need larger fonts). Perhaps the relative font size could be a yaml option. 😉

john-harrold commented 1 year ago

Here I just pulled out the entire column format into the yaml file. I'll incorporate it into both the upload data module and the data wrangling module.

Do you have any thoughts on default colors? Blue and green are probably not good given color blindness and all that.

    # This controls the overall format of headers for data files with
    # the following placeholders surrounded by ===:
    # COLOR  - font color
    # NAME   - column name
    # LABEL  - type label
    data_header:  "<span style='color:===COLOR==='><b>===NAME===</b><br/><font size='-3'>===LABEL===</font></span>"
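
To illustrate how the placeholders might be filled in, here is a minimal sketch using base R string replacement. The helper name `format_header` is hypothetical and this is not necessarily how the package does it.

```r
# Template copied from the yaml data_header entry above
data_header <- "<span style='color:===COLOR==='><b>===NAME===</b><br/><font size='-3'>===LABEL===</font></span>"

# Replace each ===PLACEHOLDER=== with its value; fixed = TRUE treats the
# pattern as a literal string rather than a regular expression
format_header <- function(name, color, label, template = data_header) {
  out <- template
  out <- gsub("===COLOR===", color, out, fixed = TRUE)
  out <- gsub("===NAME===",  name,  out, fixed = TRUE)
  out <- gsub("===LABEL===", label, out, fixed = TRUE)
  out
}

format_header(name = "conc", color = "blue", label = "num")
```

The resulting HTML string could then be used as the column header in the preview table, with the color and label taken from the data_types entry for that column's type.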

john-harrold commented 1 year ago

Following along this path: do you think it would be helpful to have the same information in the plot aesthetic controls? For example, in the selection box below it's possible to change the colors and show the type information in a smaller font to the right.

Visually it would look something like this but with colors:

billdenney commented 1 year ago

For the palette, this page suggests blue/red as the pair to use. Maybe add black as the "other" group, too. (https://www.datylon.com/blog/data-visualization-for-colorblind-readers#color-blind-palette)

For the graphs, I would lean toward making color selection an advanced setting that applies to everything rather than a per-graph option.

john-harrold commented 1 year ago

What I was talking about was just the formatting of the form elements. Should the form elements that select columns from the dataset also reflect the data type information in the same way as the column headers in the data preview?

john-harrold commented 1 year ago

By the way this is what it looks like now:

I've overlaid the colorblind view using Sim Daltonism. I'm trying to reuse blues and reds from the buttons/pulldowns. I think it looks pretty good now.

billdenney commented 1 year ago

I don't think it's necessary to have the data type in the pulldown, but I don't hold that opinion strongly (i.e., maybe I'd change my mind with more thought).

I like the look.

john-harrold commented 1 year ago

I'm thinking something like this when mapping columns to aesthetics in the figure generation module. My thought is that folks won't remember type information from the preview in the data wrangling module. So I can show them when they are selecting the columns:

This is more to give you an idea of what can be included. I'm looking for something like this: it shows the user the type of data and, for certain types (numeric here), some information about the values (min/max).

I'm not sure if I should do anything with text data.

billdenney commented 1 year ago

I see use for that.

For text, I would only show unique values if there were just a couple of them. Otherwise, just saying that it's text should be good enough, to me.

john-harrold commented 1 year ago

Ok this is looking good. In the config file you can do something like this:

subtext:      "===LABEL===: ===RANGE==="

The LABEL is numeric, text, or whatever you want to use. For the range I take the sorted unique values of the column; if there are more than 3, I replace them with LOWER, .... HIGHER. If there are <= 3, I just join them with ", ".
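
The range rule described above could be sketched roughly as follows. The function names are hypothetical, and the exact separator used for the shortened form (", ..., " here) is an assumption rather than the package's actual output.

```r
# Summarize a column: sorted unique values, shortened to "lowest, ..., highest"
# when there are more than max_values of them
summarize_range <- function(x, max_values = 3) {
  vals <- sort(unique(x))  # sort() drops NA values by default
  if (length(vals) > max_values) {
    paste0(vals[1], ", ..., ", vals[length(vals)])
  } else {
    paste(vals, collapse = ", ")
  }
}

# Fill the subtext template from the config file
build_subtext <- function(x, label, template = "===LABEL===: ===RANGE===") {
  out <- gsub("===LABEL===", label, template, fixed = TRUE)
  gsub("===RANGE===", summarize_range(x), out, fixed = TRUE)
}

build_subtext(c(0, 10, 5, 2.5), label = "num")
build_subtext(c("PO", "IV", "PO"), label = "text")
```

Because the same rule applies to text columns, a column with only a couple of distinct values lists them all, while one with many values shows just the first and last in sort order.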

I think I'm happy with this.

billdenney commented 1 year ago

That looks good to me!

john-harrold commented 1 year ago

This is done and applied to the headers of the preview tables and the subtext in the column selects in both formods (https://github.com/john-harrold/formods/commit/4220940faf07548cb75078dbe1fa4ae9ed370db0) and ruminate (https://github.com/john-harrold/ruminate/commit/f7cb3a6f7d4aad4b1c1a3cfa214e357289e67a59).

john-harrold / formods

Show datatypes of imported data. #17