Open Lytmbot opened 4 years ago
Not sure if this is acceptable, but a solution could be as simple as:
```python
def to_dataframe(lines, **kwargs):
    names = lines.readline().decode('utf-8').strip().split('\t')
    types = lines.readline().decode('utf-8').strip().split('\t')
    dtypes, parse_dates, converters = {}, [], {}
    for name, chtype in zip(names, types):
        if chtype.startswith('DateTime64'):
            # Drop the precision, e.g. DateTime64(6) -> DateTime;
            # pandas' date parser handles sub-second values itself.
            chtype = 'DateTime'
        dtype = CH2PD[chtype]
        if dtype == 'object':
            converters[name] = decode_escapes
        elif dtype.startswith('datetime'):
            parse_dates.append(name)
        else:
            dtypes[name] = dtype
    return pd.read_table(lines, header=None, names=names, dtype=dtypes,
                         parse_dates=parse_dates, converters=converters,
                         na_values=set(), keep_default_na=False, **kwargs)
```
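A self-contained sketch of how the patched `to_dataframe` behaves on a `DateTime64(6)` column. The `CH2PD` mapping and `decode_escapes` here are simplified stand-ins for the real names in pandahouse's `convert.py`, and the sample payload is hypothetical, so this can run outside the package:

```python
# Hypothetical, self-contained demo of the proposed to_dataframe fix.
# CH2PD and decode_escapes are simplified stand-ins for pandahouse/convert.py.
import codecs
import io

import pandas as pd

CH2PD = {
    'Int64': 'int64',
    'String': 'object',
    'DateTime': 'datetime64[ns]',
}

def decode_escapes(value):
    # Simplified stand-in for pandahouse's escape decoder.
    return codecs.decode(value, 'unicode_escape')

def to_dataframe(lines, **kwargs):
    names = lines.readline().decode('utf-8').strip().split('\t')
    types = lines.readline().decode('utf-8').strip().split('\t')
    dtypes, parse_dates, converters = {}, [], {}
    for name, chtype in zip(names, types):
        if chtype.startswith('DateTime64'):
            # Drop the precision, e.g. DateTime64(6) -> DateTime.
            chtype = 'DateTime'
        dtype = CH2PD[chtype]
        if dtype == 'object':
            converters[name] = decode_escapes
        elif dtype.startswith('datetime'):
            parse_dates.append(name)
        else:
            dtypes[name] = dtype
    return pd.read_table(lines, header=None, names=names, dtype=dtypes,
                         parse_dates=parse_dates, converters=converters,
                         na_values=set(), keep_default_na=False, **kwargs)

# TSVWithNamesAndTypes-style payload: names row, types row, then data.
payload = (b'id\tts\n'
           b'Int64\tDateTime64(6)\n'
           b'1\t2020-05-01 12:34:56.123456\n')
df = to_dataframe(io.BytesIO(payload))
print(df.dtypes['ts'])  # datetime64[ns]
```

With the current code the `CH2PD['DateTime64(6)']` lookup would fail instead; with the patch the column parses as a regular pandas datetime, sub-second precision included.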
Thanks for the bug report and the patch.
I'm trying to allocate some time to create a new release in the upcoming week.
Hi @kszucs, have you had a chance to evaluate this? This issue is unfortunately preventing us from meaningfully using pandahouse. If @Lytmbot's proposed solution is agreeable, I'd be happy to submit a PR, provided you can cut another release.
Could you please submit a PR including unit tests?
My mid-term plan is to use the newly added Arrow and Parquet ClickHouse output formats, but their type support is incomplete so far.
Hey, thanks for your great work, this module has been really helpful!
Using clickhouse-server version 20.3.8.53.
I have a small problem with DateTime64 columns whose precision has been explicitly set in the table definition, e.g. DateTime64(6).
With a clickhouse table as follows:
A query:
Results in:
If I understand correctly, the type mapping defined in convert.py
does not cover the DateTime64(6) case, or by extension any other DateTime64(precision) case?
I would be happy to contribute a solution with a little guidance.
Thanks