kszucs / pandahouse

Pandas interface for Clickhouse database
MIT License
228 stars 70 forks source link

KeyError: 'Nullable(String)' #4

Closed JulianMBr closed 3 years ago

JulianMBr commented 6 years ago

I get the following error with one column, even thoI do have other empty columns which do notcause any errors (such as aircraftCategory is a string and is NaN)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-19-d531a02e1a3b> in <module>()
      2 
      3 connection = {'host': 'http://localhost:8123'}
----> 4 df = ph.read_clickhouse('SELECT hex, flight, recorded_date, recorded_datetime,recorded_longitude, recorded_latitude, recorded_squawk, recorded_altitude, recorded_verticalRate,recorded_track,recorded_speed,aircraftCategory FROM flightLog_MT LIMIT 1000000', connection=connection)

~/.pyenv/versions/3.6.2/envs/general_362/lib/python3.6/site-packages/pandahouse/core.py in read_clickhouse(query, tables, index, connection, **kwargs)
     56     lines = execute(query, external=external, stream=True,
     57                     connection=connection)
---> 58     return to_dataframe(lines, **kwargs)
     59 
     60 

~/.pyenv/versions/3.6.2/envs/general_362/lib/python3.6/site-packages/pandahouse/convert.py in to_dataframe(lines, **kwargs)
     60     dtypes, parse_dates, converters = {}, [], {}
     61     for name, chtype in zip(names, types):
---> 62         dtype = CH2PD[chtype]
     63         if dtype == 'object':
     64             converters[name] = decode_escapes

KeyError: 'Nullable(String)'
kszucs commented 6 years ago

Thanks for the bug report. Pandahouse is not aware of Nullable(String) indeed. Until I fix, You might try https://github.com/xzkostyan/clickhouse-sqlalchemy with pandas.read_sql.

JulianMBr commented 6 years ago

Thanks for the update. pandas.readsql however does not give me the column names ;-) Ill try to update the table with fixed strings and wait for an update ;)

JulianMBr commented 6 years ago

;-)

~/.pyenv/versions/3.6.2/envs/general_362/lib/python3.6/site-packages/pandahouse/convert.py in to_dataframe(lines, **kwargs)
     60     dtypes, parse_dates, converters = {}, [], {}
     61     for name, chtype in zip(names, types):
---> 62         dtype = CH2PD[chtype]
     63         if dtype == 'object':
     64             converters[name] = decode_escapes

KeyError: 'FixedString(4)'
kszucs commented 6 years ago

@s1x Would You please create a pull request with FixedString modifications?

inkrement commented 6 years ago

Why are you taking about FixedString? I thought this topic is about the missing Nullable support... I would like to use Nullable(String) types too. @kszucs is there any progress on this topic or do we still have to switch to sqlalchemy or the plain clickhouse-driver?

haakonvt commented 5 years ago

Is any progress made towards a solution of KeyError: 'Nullable(String)'? @kszucs

kszucs commented 5 years ago

@inkrement I'd suggest to use clickhouse-driver or https://github.com/ibis-project/ibis @haakonvt Sadly I don't have time to implement it, however Ibis has support for it. I highly recommend to try ibis until We have a native Apache Arrow database interface for clickhouse.

msfouad89 commented 4 years ago

code change needed to fix this issue:

def to_dataframe(lines, **kwargs): names = lines.readline().decode('utf-8').strip().split('\t') l=[] for row in lines: row=row.decode('utf-8').strip().split('\t') l.append(row) df=pd.DataFrame(l,columns=names) df=df[~df['line'].str.contains('Nullable')] return df

Agan0525 commented 3 years ago

change convert.py can fix this issue:

  1. vim /usr/local/lib/python3.7/dist-packages/pandahouse/convert.py
  2. insert " CH2PD['Nullable(String)'] = 'object' " after " CH2PD['Nothing'] = 'object' "
  3. save and quit ps : other keyerror can fix like this
shwangdev commented 3 years ago

~/.pyenv/versions/3.6.2/envs/general_362/lib/python3.6/site-packages/pandahouse/convert.py in to_dataframe(lines, **kwargs) 60 dtypes, parse_dates, converters = {}, [], {} 61 for name, chtype in zip(names, types): ---> 62 dtype = CH2PD[chtype] 63 if dtype == 'object': 64 converters[name] = decode_escapes

KeyError: 'FixedString(4)'

before line 62, there should be a check

if chtype in CH2PD:
    dtype = CH2PD[chtype]
else:
    dtype = 'object'
kszucs commented 3 years ago

I'm working on a more invasive refactor which will eliminate most of the type conversion issues. It depends on a couple of things but hopefully this issue can be closed soon.

shwangdev commented 3 years ago

Is any progress made towards a solution of KeyError: 'Nullable(String)'?

shwangdev commented 3 years ago

I saw that this issue should be fixed in master branch, could u please push this fix to pip repo? @kszucs

viraletta commented 3 years ago

I had rather similar problem

File "/opt/airflow/venv/lib64/python3.6/site-packages/pandahouse/core.py", line 58, in read_clickhouse
return to_dataframe(lines, **kwargs)
File "/opt/airflow/venv/lib64/python3.6/site-packages/pandahouse/convert.py", line 67, in to_dataframe
dtype = CH2PD[chtype]

It solved, when I deleted all comments from query, comments like "-- Убрать это" don't work correctly in pandahouse

Sank-WoT commented 12 months ago

2023 year! Error not fix

File "C:\gmexpert\venv\lib\site-packages\pandahouse\core.py", line 58, in read_clickhouse return to_dataframe(lines, **kwargs) File "C:\gmexpert\venv\lib\site-packages\pandahouse\convert.py", line 67, in to_dataframe dtype = CH2PD[chtype] KeyError: 'Nullable(String)'