questdb / py-questdb-client

Python client for QuestDB InfluxDB Line Protocol
https://py-questdb-client.readthedocs.io
Apache License 2.0

Inserting pandas dataframe silently failed if Symbol columns are not categorical #36

Closed 0liu closed 1 year ago

0liu commented 1 year ago

Describe the bug

The new Python client v1.1.0 supports inserting a pandas dataframe when all Symbol columns are converted to the pandas categorical data type with pd.Categorical, as shown in the docs. However, if the dataframe columns corresponding to Symbol columns have string types, the client silently fails to insert the data and returns None. The API should raise an exception or return an error instead.

To reproduce

  1. Create table
    CREATE TABLE IF NOT EXISTS test_table
    (
    ts TIMESTAMP,
    code SYMBOL CAPACITY 256 INDEX CAPACITY 256,
    loc SYMBOL CAPACITY 256 INDEX CAPACITY 256,
    temperature double
    ) TIMESTAMP(ts);
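  2. Insert data

    import datetime as dt
    import pandas as pd
    from questdb.ingress import Sender

    # 'code' and 'loc' are plain string (object) columns, not categoricals
    df = pd.DataFrame({
        'ts': [dt.datetime(2023, 1, 18, 1), dt.datetime(2023, 1, 18, 2)],
        'code': ['sensorA', 'sensorB'],
        'loc': ['east', 'west'],
        'temperature': [45.5, 46.2]
    })

    with Sender('localhost', 9009) as sender:
        sender.dataframe(df, table_name='test_table')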

Selecting from `test_table` afterwards returns no rows.

Then convert the symbol columns to pandas categoricals:
```python
for c in ['code', 'loc']:
    df[c] = pd.Categorical(df[c])
```

and repeat the insertion. This time the data is inserted successfully.

Expected Behavior

The client should raise IngressError when data insertion fails.

Environment

- **QuestDB version**: latest docker image. Python client v1.1.0
- **OS**: linux
- **Browser**: Firefox

Additional context

No response

amunra commented 1 year ago

Closing as this is expected behaviour and not a bug. Read on for an explanation and what to do in your code.

Errors from the server are present in the server logs. The ILP protocol that .dataframe() sits on does not currently send errors back to the client, although this is in the works.
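Since ILP gives the client no feedback, one way to check that rows actually landed is to query the table over a separate channel. The snippet below is a minimal sketch, not part of the client: it assumes QuestDB's REST `/exec` endpoint is reachable on port 9000 and that the JSON response carries result rows under a `dataset` key. The helper names `exec_url` and `row_count` are hypothetical. ILP writes may also take a moment to become visible, so a count taken immediately after sending can lag behind.

```python
import json
import urllib.parse
import urllib.request


def exec_url(query, host='localhost', port=9000):
    """Build a URL for the REST /exec endpoint (host and port are assumptions)."""
    return f'http://{host}:{port}/exec?' + urllib.parse.urlencode({'query': query})


def row_count(table, host='localhost', port=9000):
    """Return the number of rows currently visible in `table`."""
    url = exec_url(f'SELECT count() FROM {table}', host, port)
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    # Assumes result rows are returned under the "dataset" key.
    return body['dataset'][0][0]
```

Calling `row_count('test_table')` before and after `sender.dataframe(...)` would show whether the count changed. If it did not, the server logs are the place to look for the rejection reason.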

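The dataframe API has no way of knowing whether a string column should be sent as the ILP symbol type or the ILP string type, unless you tell it via the symbols named argument. By default only categorical columns are sent as symbols, so the plain string columns in your example were sent as strings and did not match the table's SYMBOL columns.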
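As a rough sketch of that, the dataframe from the report can be sent without converting anything to categoricals by listing the symbol columns explicitly. The `send` wrapper and the `SYMBOL_COLUMNS` name are hypothetical, and passing `at='ts'` assumes you want the `ts` column used as the designated timestamp. Check the API docs for the client version you have installed.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    'ts': [dt.datetime(2023, 1, 18, 1), dt.datetime(2023, 1, 18, 2)],
    'code': ['sensorA', 'sensorB'],
    'loc': ['east', 'west'],
    'temperature': [45.5, 46.2],
})

# Columns that map to SYMBOL columns in `test_table`.
SYMBOL_COLUMNS = ['code', 'loc']


def send(df, host='localhost', port=9009):
    # Imported lazily so the frame can be built without the client installed.
    from questdb.ingress import Sender

    with Sender(host, port) as sender:
        sender.dataframe(
            df,
            table_name='test_table',
            symbols=SYMBOL_COLUMNS,  # send these plain string columns as symbols
            at='ts',                 # assumption: `ts` is the designated timestamp
        )
```

Calling `send(df)` against a running server should then write `code` and `loc` as symbols, matching the table definition above.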

For an example of this, see the basic pandas example: https://py-questdb-client.readthedocs.io/en/latest/examples.html#pandas-basics

The dataframe API params are fully documented here: https://py-questdb-client.readthedocs.io/en/latest/api.html#questdb.ingress.Buffer.dataframe

I hope this helps and feel free to reopen this ticket if there's anything else you need.