uwdata / arquero

Query processing and transformation of array-backed data tables.
https://idl.uw.edu/arquero
BSD 3-Clause "New" or "Revised" License

Include an option to treat Arrow binary columns as String #283

Open lfkpoa opened 2 years ago

lfkpoa commented 2 years ago

Hi, I'm really impressed by Arquero and its use of the Arrow format. I'm trying to use the ClickHouse database (which is really fast) with Arquero. ClickHouse can return query output in several formats, including Arrow, so I can send the SQL query to its HTTP endpoint using fetch, like this:

data_promise = aq.loadArrow('http://myserver:8123/?add_http_cors_header=1&user=default&password=&default_format=Arrow', 
    { fetch: { method: 'POST', body: 'SELECT col_A, count(*) as cnt from mytable group by col_A' }});

The problem is that strings are returned as the Arrow binary type, and Arquero does not recognize them as strings, so I need to use "derive" on every string column to decode them. I know the underlying problem is that ClickHouse is not returning the type correctly, but Arquero and ClickHouse are such a great match that I think it would be worth adding an option to convert binary columns to strings automatically when loading Arrow tables. Thank you.
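
For illustration, here is a minimal sketch of the kind of automatic decoding requested above. It is not Arquero's API: `decodeBinaryColumns` is a hypothetical helper, and it assumes the binary values arrive as `Uint8Array` instances holding UTF-8 text and that the columns have already been pulled out into a plain object of arrays.

```javascript
// Hypothetical helper (not part of Arquero): given a plain object that maps
// column names to arrays of values, decode every binary column to strings.
// Assumption: binary values are Uint8Array instances containing UTF-8 text.
const utf8 = new TextDecoder('utf-8');

function decodeBinaryColumns(columns) {
  const out = {};
  for (const [name, values] of Object.entries(columns)) {
    const arr = Array.from(values);
    // Treat a column as binary if its first non-null value is a Uint8Array.
    const sample = arr.find(v => v != null);
    out[name] = sample instanceof Uint8Array
      ? arr.map(v => (v == null ? null : utf8.decode(v)))
      : arr; // non-binary columns pass through unchanged
  }
  return out;
}

// Hand-built data standing in for a ClickHouse result with a binary column.
const enc = new TextEncoder();
const decoded = decodeBinaryColumns({
  col_A: [enc.encode('foo'), null, enc.encode('bär')],
  cnt: [3, 1, 2]
});
console.log(decoded);
```

The resulting object of arrays has the shape that `aq.table(columns)` accepts, so the decoded columns could be turned back into an Arquero table in one step instead of calling "derive" once per string column.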