This issue was closed by ppapasani1-rms.
I have a Spark DataFrame in which each row holds lists of values. Is there a way to flatten the data with sparklyr, turning each list element into its own row? Here is the sample data:
   id   colA     colB   colC
1:  1 <list> 4b,8b,2b <list>
2:  2 <list> 7b,2b,2b <list>
My output should look like this:
   id colA colB  colC
1:  1    1   4b FALSE
2:  1    2   8b FALSE
3:  1    3   2b FALSE
4:  2    1   7b FALSE
5:  2    2   2b FALSE
6:  2    3   2b FALSE
My data is medium-sized (around 200M records), so I don't want to collect it into R memory to perform this operation.
Here is a reproducible dataset:
library(data.table)
# Two rows, so row.names must be c(NA, -2L) (the original c(NA, -1L) declared only one row)
data.table(structure(list(
  id = list(1, 2),
  colA = list(list(1, 2, 3), list(1, 2, 3)),
  colB = list(as.raw(c(0x4b, 0x8b, 0x2b)), as.raw(c(0x7b, 0x2b, 0x2b))),
  colC = list(list(FALSE, FALSE, FALSE), list(FALSE, FALSE, FALSE))),
  .Names = c("id", "colA", "colB", "colC"),
  row.names = c(NA, -2L),
  class = c("data.table", "data.frame")))
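One way to flatten this without collecting anything into R is to let Spark SQL do the work. The sketch below assumes that colA, colB and colC are Spark array columns with the same length in every row. It also stores colB as an array of strings instead of raw bytes, and it builds a hypothetical two-row stand-in table called `nested` directly in Spark rather than loading the real data. The Spark SQL generator `posexplode()` emits one row per element of colA together with that element's 0-based position, and the position is used to index the parallel arrays colB and colC. `sdf_sql()` and `sdf_register()` are sparklyr functions. None of this has been tested against the original 200M-row table.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark connection; point `master` at your cluster instead.
sc <- spark_connect(master = "local")

# Hypothetical stand-in for the real table, built in Spark SQL so nothing is
# copied from R. colB holds the hex bytes as strings (assumption).
nested <- sdf_sql(sc, "
  SELECT 1 AS id, array(1, 2, 3) AS colA,
         array('4b', '8b', '2b') AS colB, array(false, false, false) AS colC
  UNION ALL
  SELECT 2, array(1, 2, 3), array('7b', '2b', '2b'), array(false, false, false)
")
sdf_register(nested, "nested")  # expose it to Spark SQL as a temporary view

# posexplode(colA) yields (pos, a) per element; pos (0-based) indexes the
# other arrays, so the three lists are unrolled in lockstep, not cross-joined.
flat <- sdf_sql(sc, "
  SELECT id, a AS colA, colB[pos] AS colB, colC[pos] AS colC
  FROM nested
  LATERAL VIEW posexplode(colA) t AS pos, a
")

flat  # still a Spark DataFrame; printing only fetches a preview
```

Because `flat` is a lazy Spark DataFrame, it can be written out (for example with `spark_write_parquet()`) without ever being collected. Call `spark_disconnect(sc)` when finished.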