kszucs / pandahouse

Pandas interface for Clickhouse database
MIT License
228 stars 70 forks source link

Allow reading of arrays - suggestion #14

Closed IanCal closed 5 years ago

IanCal commented 5 years ago

If this approach seems reasonable, I can clean it up and add testing.

What I've done is start by just telling pandas that array responses are objects, I couldn't see a better way of telling pandas what type of things to expect within the array.

Then, what clickhouse seems to have is basically JSON but with single and double quotes swapped around. By swapping them back (and escaping/unescaping the right things), and then using json.loads to read the response we get arrays back.

CREATE TABLE test_array
(
    str Array(String),
    dt Array(Date),
    int Array(UInt32),
    fl Array(Float64)
)
ENGINE = Memory
> insert into test_array values(['single \''], [CAST(now() as Date)], [1,2,3], [4.5,5.6])

Quick python testing locally

import pandahouse

df = pandahouse.read_clickhouse("select * from test_array")
print(df)

print(df["int"][0][0])
print(type(df["int"][0][0]))

Output

          str            dt        int          fl
0  [single ']  [2018-11-14]  [1, 2, 3]  [4.5, 5.6]
1
<class 'int'>

This is likely not very performant, but perhaps better than not being able to return anything at all. Another simpler option is to just mark any Array types as object and let the end user deal with the strings that come back.

kszucs commented 5 years ago

Hey @IanCal ! Sorry for the late response, I've just updated the project to properly build on CircleCI.

Yes, Your approach seems reasonable, could You please add some tests?

IanCal commented 5 years ago

Hi @kszucs no problem, thanks for the project it's been extremely useful for some apps I'm building at the moment.

I'll add some tests in the coming days hopefully and ping you when things look like they're more done.

Cheers! Ian

kszucs commented 5 years ago

@IanCal Would You be interested to develop new features? I have more ideas than time :)