tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/

unnesting list columns with NULL values segfaults #141

Closed momeara closed 8 years ago

momeara commented 8 years ago

I am trying to read JSON data into R. The data is an array of rows with a list column that has missing values [1]. For example,

{
    "table1" : [{
        "col1" : "val1a"

    }, {
        "col1" : "val1b",
        "col2" : ["val2b", "val2bb"]
    }]
}

I can use jsonlite to parse the data:

library(jsonlite)
library(dplyr)
library(tidyr)

raw_json_data <- "{\"table1\":[{\"col1\" : \"val1a\"},{\"col1\":\"val1b\",\"col2\":[\"val2b\", \"val2bb\"]}]}"
data <- fromJSON(raw_json_data)

However, col2 contains a NULL value rather than an NA, as I would have expected.

> data$table1
    col1          col2
 1 val1a          NULL
 2 val1b val2b, val2bb
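
One way to confirm that the first entry is a genuine NULL, and not the string "NULL", is to inspect the list column directly. This is a minimal sketch in base R, assuming a hand-built data frame `tbl` (a hypothetical name) that mirrors the structure printed above, so it does not depend on jsonlite:

```r
# Hand-built stand-in for data$table1 (assumption: same shape as the parsed JSON)
tbl <- data.frame(col1 = c("val1a", "val1b"), stringsAsFactors = FALSE)
tbl$col2 <- list(NULL, c("val2b", "val2bb"))

# TRUE for each row whose list-column entry is NULL
is_null <- vapply(tbl$col2, is.null, logical(1))

# Number of elements per row; a NULL entry has length 0
n_elems <- lengths(tbl$col2)
```

Here `is_null` flags the first row only, and `n_elems` shows that row contributes zero elements to the list column.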

The NULL value is a problem because, when I try to unpack col2 using tidyr,

data$table1 %>% unnest(col2)

it results in a segmentation fault. I think fromJSON should produce an NA value rather than a NULL value here, because then it is possible to use element-wise functions on the column. Nonetheless, I don't think tidyr should segfault when presented with NULLs.
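
Until the crash is fixed, a possible workaround is to replace the NULL entries with NA before expanding the list column. The sketch below is an assumption about how that could look, not something confirmed in this thread; `tbl` and `long` are hypothetical names, and the expansion is done in base R so the example does not call `unnest()` at all:

```r
# Hand-built stand-in for data$table1 (assumption: same shape as the parsed JSON)
tbl <- data.frame(col1 = c("val1a", "val1b"), stringsAsFactors = FALSE)
tbl$col2 <- list(NULL, c("val2b", "val2bb"))

# Replace NULL entries with a length-one NA so every row has at least one element
is_null <- vapply(tbl$col2, is.null, logical(1))
tbl$col2[is_null] <- list(NA_character_)

# Base R expansion: repeat each col1 value once per element of col2
long <- data.frame(
  col1 = rep(tbl$col1, lengths(tbl$col2)),
  col2 = unlist(tbl$col2),
  stringsAsFactors = FALSE
)
```

After the replacement, the list column contains no NULLs, so the row with the missing value is kept as a single NA row and is not dropped.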

I've cross-posted this bug to jsonlite: jeroenooms/jsonlite#120


[1] E.g., the uniprot_ids column from genenames.org has missing values for some genes.

jeroen commented 8 years ago

This is a bug in tidyr. List columns can contain NULL values and tidyr seems to choke on that:


data <- data.frame(x = 1:2)
data$y <- list(NULL, c("foo", "bar"))
tidyr::unnest(data, y)

hadley commented 8 years ago

It's actually a dplyr bug: https://github.com/hadley/dplyr/issues/1605