Closed JulianMBr closed 3 years ago
Thanks for the bug report. Pandahouse is indeed not aware of Nullable(String). Until I fix it, you might try https://github.com/xzkostyan/clickhouse-sqlalchemy with pandas.read_sql.
Thanks for the update. pandas.read_sql, however, does not give me the column names ;-) I'll try to update the table with fixed strings and wait for an update ;)
;-)
~/.pyenv/versions/3.6.2/envs/general_362/lib/python3.6/site-packages/pandahouse/convert.py in to_dataframe(lines, **kwargs)
60 dtypes, parse_dates, converters = {}, [], {}
61 for name, chtype in zip(names, types):
---> 62 dtype = CH2PD[chtype]
63 if dtype == 'object':
64 converters[name] = decode_escapes
KeyError: 'FixedString(4)'
@s1x Would you please create a pull request with the FixedString modifications?
Why are you talking about FixedString? I thought this topic was about the missing Nullable support... I would like to use Nullable(String) types too. @kszucs is there any progress on this topic, or do we still have to switch to sqlalchemy or the plain clickhouse-driver?
Has any progress been made towards a solution for KeyError: 'Nullable(String)'? @kszucs
@inkrement I'd suggest using clickhouse-driver or https://github.com/ibis-project/ibis. @haakonvt Sadly, I don't have time to implement it; however, Ibis has support for it. I highly recommend trying Ibis until we have a native Apache Arrow database interface for ClickHouse.
code change needed to fix this issue:
def to_dataframe(lines, **kwargs):
    names = lines.readline().decode('utf-8').strip().split('\t')
    l = []
    for row in lines:
        row = row.decode('utf-8').strip().split('\t')
        l.append(row)
    df = pd.DataFrame(l, columns=names)
    df = df[~df['line'].str.contains('Nullable')]
    return df
Changing convert.py can fix this issue:
~/.pyenv/versions/3.6.2/envs/general_362/lib/python3.6/site-packages/pandahouse/convert.py in to_dataframe(lines, **kwargs)
60 dtypes, parse_dates, converters = {}, [], {}
61 for name, chtype in zip(names, types):
---> 62 dtype = CH2PD[chtype]
63 if dtype == 'object':
64 converters[name] = decode_escapes
KeyError: 'FixedString(4)'
Line 62 should be replaced with a check:
if chtype in CH2PD:
    dtype = CH2PD[chtype]
else:
    dtype = 'object'
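A slightly broader variant of that check, as a rough sketch only: it also unwraps Nullable(...) before the lookup. The CH2PD dictionary below is a small stand-in rather than pandahouse's real mapping, and resolve_dtype is a hypothetical helper name, not something the library provides.

```python
import re

# Stand-in for pandahouse's CH2PD mapping (assumption for illustration);
# the real table has more entries and may map these types differently.
CH2PD = {
    "String": "object",
    "UInt8": "uint8",
    "Int32": "int32",
    "Float64": "float64",
}

_NULLABLE = re.compile(r"^Nullable\((.+)\)$")


def resolve_dtype(chtype):
    """Hypothetical helper: map a ClickHouse type name to a pandas dtype."""
    match = _NULLABLE.match(chtype)
    if match:
        inner = resolve_dtype(match.group(1))
        # NumPy integer dtypes cannot hold missing values, so widen them.
        if inner.startswith(("int", "uint")):
            return "float64"
        return inner
    # Unknown types such as FixedString(4) fall back to 'object'.
    return CH2PD.get(chtype, "object")


for chtype in ["Nullable(String)", "FixedString(4)", "Nullable(Int32)"]:
    print(chtype, "->", resolve_dtype(chtype))
```

This only picks a dtype. ClickHouse writes NULL as \N in tab-separated output, so whatever parses the values would still need to treat that marker as missing.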
I'm working on a more invasive refactor which will eliminate most of the type conversion issues. It depends on a couple of things but hopefully this issue can be closed soon.
Has any progress been made towards a solution for KeyError: 'Nullable(String)'?
I saw that this issue should be fixed in the master branch. Could you please publish this fix to PyPI? @kszucs
I had a rather similar problem:
File "/opt/airflow/venv/lib64/python3.6/site-packages/pandahouse/core.py", line 58, in read_clickhouse
return to_dataframe(lines, **kwargs)
File "/opt/airflow/venv/lib64/python3.6/site-packages/pandahouse/convert.py", line 67, in to_dataframe
dtype = CH2PD[chtype]
It was solved when I deleted all comments from the query. Comments like "-- Убрать это" ("Remove this") don't work correctly in pandahouse.
It's 2023 and the error is still not fixed:
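For anyone who wants to automate that workaround, here is a minimal sketch that strips `--` line comments from a query before handing it to pandahouse. The function name strip_line_comments is made up for this example, and it deliberately ignores /* ... */ block comments; it only tries to avoid touching `--` inside single-quoted string literals.

```python
def strip_line_comments(query):
    """Remove `-- ...` comments, leaving single-quoted literals intact."""
    out = []
    in_string = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(query):
                # Keep escaped characters such as \' inside the literal.
                out.append(query[i + 1])
                i += 1
            elif ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
            out.append(ch)
        elif query.startswith("--", i):
            # Skip to the end of the line; the newline itself is kept.
            end = query.find("\n", i)
            i = len(query) if end == -1 else end
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


query = "SELECT a, -- Убрать это\n  '--x' AS b\nFROM t -- tail"
print(strip_line_comments(query))
```

The cleaned string can then be passed to read_clickhouse in place of the original query.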
File "C:\gmexpert\venv\lib\site-packages\pandahouse\core.py", line 58, in read_clickhouse
return to_dataframe(lines, **kwargs)
File "C:\gmexpert\venv\lib\site-packages\pandahouse\convert.py", line 67, in to_dataframe
dtype = CH2PD[chtype]
KeyError: 'Nullable(String)'
I get the following error with one column, even though I do have other empty columns which do not cause any errors (for example, aircraftCategory is a string and is NaN)