trelliscope / trelliscopejs-lib

JavaScript viewer for Trelliscope displays
BSD 3-Clause "New" or "Revised" License

Robust handling of missing values #768

Closed: hafen closed this issue 1 year ago.

hafen commented 1 year ago

There are a few issues with data that has missing values. I have created an example here: https://github.com/hafen/trelliscope-examples3/tree/main/gapminder_na

This is the gapminder example, but I have added some random missing values to the country and continent variables.

The major bug is that when you load the display, open the filters sidebar, and click "continent", the app crashes.

If you want to see it right away before loading into your dev environment, you can go here.

Currently, when a dataset with missing values is serialized to JSON, a variable that is missing for a given row does not show up in that row at all. For example, here is a row with complete data:

  {
    "country": "Afghanistan",
    "continent": "Asia",
    "mean_lifeexp": 37.3035555555556,
    "min_lifeexp": 28.801,
    "mean_gdp": 762.072263811111,
    "test": 2.88199615544245,
    "start_dt": "1951-12-29",
    "end_dt": "2001-12-23",
    "start_dttm": "1951-12-28T16:00:00",
    "end_dttm": "2001-12-22T16:00:00",
    "wiki_link": "https://en.wikipedia.org/wiki/Afghanistan",
    "latitude": 33.93911,
    "longitude": 67.709953,
    "__PANEL_KEY__": "Asia_Afghanistan"
  }

and a row where country is missing:

  {
    "continent": "Asia",
    "mean_lifeexp": 30.332,
    "min_lifeexp": 30.332,
    "mean_gdp": 820.8530296,
    "test": 2.91426540541726,
    "start_dt": "1957-01-06",
    "end_dt": "1956-12-26",
    "start_dttm": "1957-01-05T16:00:00",
    "end_dttm": "1956-12-25T16:00:00",
    "wiki_link": "https://en.wikipedia.org/wiki/NA",
    "__PANEL_KEY__": "Asia_NA"
  }
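Because a missing variable is simply absent from the row, reading it in JavaScript yields `undefined`, not `null` or an "NA" string. As a minimal TypeScript sketch (the helper `missingVariables` is hypothetical, not part of trelliscopejs-lib), this lists which variables a row omits relative to a reference list of variable names. Applied to trimmed copies of the two rows above, it shows that the second row is missing not only `country` but also `latitude` and `longitude`.

```typescript
// A serialized row as in the JSON above: variables that are missing for
// this row are simply not present as keys.
type Row = Record<string, string | number>;

// Hypothetical helper: return the names in `allVars` that `row` omits.
// hasOwnProperty checks for the key itself, so a key that is present
// (even holding an empty string) is not reported as missing.
function missingVariables(row: Row, allVars: string[]): string[] {
  return allVars.filter((name) => !Object.prototype.hasOwnProperty.call(row, name));
}

// Trimmed copies of the two example rows.
const completeRow: Row = {
  country: "Afghanistan",
  continent: "Asia",
  mean_lifeexp: 37.3035555555556,
  latitude: 33.93911,
  longitude: 67.709953,
  __PANEL_KEY__: "Asia_Afghanistan",
};
const rowMissingCountry: Row = {
  continent: "Asia",
  mean_lifeexp: 30.332,
  __PANEL_KEY__: "Asia_NA",
};

// Returns ["country", "latitude", "longitude"].
console.log(missingVariables(rowMissingCountry, Object.keys(completeRow)));
// Prints undefined: the key was never written to this row.
console.log(rowMissingCountry["country"]);
```

Any code downstream of the JSON therefore has to treat `undefined` as "missing" explicitly; nothing in the serialized data marks it.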

This format appears to work fine in other examples I've made that have missing values. Here is another example of a trelliscope display with a lot of missing values: https://github.com/hafen/trelliscope-examples3/tree/main/magic_trelliscopes2

If you want to see it right away before loading into your dev environment, you can go here.

If you run that one and open the filter for "toughness", for example, which has a lot of missing values, it works fine.

I think the difference is that all of the missing values in the second example are either numeric or string type. The ones in the first example are factor type.
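The issue doesn't pin down where the factor path fails, so the following is only an assumption about the kind of handling involved: a factor filter that counts panels per declared level has to decide what to do with a row whose value is absent. A minimal sketch (the function `countFactorLevels` and the "NA" bucket label are hypothetical, not trelliscopejs-lib code) that counts each declared level and routes `undefined` or `null` values into an explicit "NA" bucket instead of treating them as a level:

```typescript
// Label for the bucket that collects missing values (assumption).
const NA_LABEL = "NA";

// Hypothetical count routine for a factor filter. Every declared level
// starts at 0 so levels with no panels still appear; missing values
// (undefined from an omitted key, or null) are counted under NA_LABEL.
function countFactorLevels(
  values: Array<string | null | undefined>,
  levels: string[],
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const level of levels) counts.set(level, 0);
  for (const value of values) {
    const key = value == null ? NA_LABEL : value;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const continents = ["Asia", undefined, "Europe", "Asia", null];
// Counts: Africa 0, Asia 2, Europe 1, NA 2.
console.log(countFactorLevels(continents, ["Africa", "Asia", "Europe"]));
```

The NA bucket only appears when at least one value is missing, so complete factors render exactly as before.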

A few other issues related to missing data:

The panel labels show up as blank. It would be nice for them to show up as "NA".
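A minimal sketch of the requested label behavior, assuming labels are built from the raw row values (the helper `formatPanelLabel` is hypothetical): an absent key, an explicit `null`, or a numeric `NaN` renders as "NA", while any other value, including an empty string, is converted with `String()` unchanged.

```typescript
type LabelValue = string | number | null | undefined;

// Hypothetical label formatter: missing values render as "NA" instead of
// an empty label. An empty string is treated as a real (blank) value.
function formatPanelLabel(value: LabelValue): string {
  if (value === undefined || value === null) return "NA";
  if (typeof value === "number" && Number.isNaN(value)) return "NA";
  return String(value);
}

// Like the second example row, this row has no "country" key.
const row: Record<string, LabelValue> = { continent: "Asia", mean_lifeexp: 30.332 };
console.log(formatPanelLabel(row["country"])); // NA
console.log(formatPanelLabel(row["continent"])); // Asia
```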

Also, in the magic example, if you open the filter for "toughness" (which has missing values) and hover over NA, the hover text reads "131/0", which isn't right.
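The issue doesn't say how the two numbers in the hover text are computed. One way to end up with a count over zero, assumed here purely for illustration, is for the numerator and denominator to come from two count maps that key missing values differently (for example `undefined` in one and "NA" in the other), so the lookup in one map misses and falls back to 0. A minimal sketch (all names hypothetical, including the "selected/total" reading of the text) that routes both counts through a single key function so they always agree:

```typescript
type FilterValue = string | number | null | undefined;

// Single place that decides the map key for a value; all missing values
// (undefined or null) share the key "NA".
function valueKey(v: FilterValue): string {
  return v == null ? "NA" : String(v);
}

// Count values by valueKey, so every map built this way keys NA the same.
function countBy(values: FilterValue[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const v of values) {
    const k = valueKey(v);
    counts.set(k, (counts.get(k) ?? 0) + 1);
  }
  return counts;
}

// Hover text in an assumed "selected/total" form for one filter value.
function hoverText(selected: Map<string, number>, total: Map<string, number>, v: FilterValue): string {
  const k = valueKey(v);
  return `${selected.get(k) ?? 0}/${total.get(k) ?? 0}`;
}

const allToughness: FilterValue[] = [1, undefined, 3, null, undefined];
const selectedToughness: FilterValue[] = [undefined, 1];
console.log(hoverText(countBy(selectedToughness), countBy(allToughness), undefined)); // 1/3
```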

jefferymills commented 1 year ago

I think the older code may have replaced empty values with a placeholder like "NA". It might be worth looking at the old crossfilterMiddleware.
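The old crossfilterMiddleware isn't shown here, so the following only illustrates the idea of replacing missing values up front, under stated assumptions: a hypothetical `fillCategoricalNA` pass that runs once when the data is loaded and writes "NA" into missing factor/string variables, while leaving numeric variables missing so numeric filters never receive a string. The list of categorical variable names is assumed to be available from the display's metadata.

```typescript
type Value = string | number | null | undefined;
type Row = Record<string, Value>;

// Hypothetical load-time pass: fill missing categorical values with a
// placeholder. Rows are copied, so the original data is not mutated.
// Numeric variables are not listed, so they stay undefined.
function fillCategoricalNA(rows: Row[], categoricalVars: string[], placeholder = "NA"): Row[] {
  return rows.map((row) => {
    const out: Row = { ...row };
    for (const name of categoricalVars) {
      if (out[name] === undefined || out[name] === null) out[name] = placeholder;
    }
    return out;
  });
}

const rows: Row[] = [
  { country: "Afghanistan", continent: "Asia", latitude: 33.93911 },
  { continent: "Asia" },
];
const filled = fillCategoricalNA(rows, ["country", "continent"]);
console.log(filled[1]); // { continent: 'Asia', country: 'NA' }
```

The trade-off of this approach is that "NA" becomes an ordinary category everywhere (filters, labels, hover text), at the cost of no longer distinguishing a missing value from a genuine "NA" string in the data.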