Open MichaelChirico opened 5 years ago
Are they stored as nested tables or more complex values? Also, can you provide some sample files please?
I'm not sure how to answer about their storage, but the Hive type is array and/or map. Though those types are potentially recursive (and hence highly complex), I've only used one-level complexity (e.g. array(int) or map(int, varchar)).
Will try and create something & pass along. Any preferred medium?
medium, e.g. wetransfer?
yes, or dropbox, i could try gist...
seems i can upload tar.gz here! i ran the following in SparkR and attached is the compressed output:
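# spark start boilerplate (assumed here: load SparkR and magrittr, start a session)
library(SparkR)
library(magrittr)  # provides %>%
sparkR.session()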
iris = iris
names(iris) = gsub('.', '_', names(iris), fixed = TRUE)
irisSDF = createDataFrame(iris)
irisSDF %>% createOrReplaceTempView('iris')
sql("
select 1 as int, 'a' as str, 1.1 as dbl,
timestamp('2019-09-20T12:34:56Z') as ts,
true as bool, date('2019-09-21') as dt,
map(Species, Sepal_Length) as mp,
array(Sepal_Width) as arr
from iris
") %>% write.parquet('/path/to/output')
thanks, will see what i can do
A lot of my common use cases store map & array data types. It would be great to have support for reading such parquet files with miniparquet.
Is this out of scope?