allisonwang closed this issue 5 years ago.
Yes, this is the expected behavior; it is tested here: https://github.com/gusutabopb/aioinflux/blob/v0.4.0/tests/test_dataframe.py#L48
When you group by a tag, or by a tag and time (instead of just time), the JSON returned by InfluxDB contains two series instead of one, as shown below:
```python
# Top-level `await` needs an asyncio-aware REPL (e.g. IPython or `python -m asyncio`)
from aioinflux import InfluxDBClient

c = InfluxDBClient(db='testdb')
await c.create_database()
# Two points tagged A and three points tagged B in measurement "foo"
await c.write('foo,tag=A value=1')
await c.write('foo,tag=A value=2')
await c.write('foo,tag=B value=3')
await c.write('foo,tag=B value=4')
await c.write('foo,tag=B value=5')
r = await c.query('SELECT COUNT(*) FROM foo GROUP BY "tag"')
print(r)
```

```
{'results': [{'statement_id': 0,
              'series': [{'name': 'foo',
                          'tags': {'tag': 'A'},
                          'columns': ['time', 'count_value'],
                          'values': [[0, 2]]},
                         {'name': 'foo',
                          'tags': {'tag': 'B'},
                          'columns': ['time', 'count_value'],
                          'values': [[0, 3]]}]}]}
```
In aioinflux, I decided that dataframes should be generated on a per-series basis (here "series" refers to InfluxDB series, not to be confused with pandas Series). It is not self-evident whether or how multiple series should be concatenated or merged. I believe that depends on the use case, hence the decision to generate one dataframe per series, preserving the structure of the original JSON blob.
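To make the per-series idea concrete, here is a minimal sketch of how a parsed JSON response could be turned into one DataFrame per InfluxDB series. It is not aioinflux's actual implementation: the helper name `series_to_dataframes`, the `measurement,tag=value` key format, and storing tag values as constant columns are assumptions chosen to mirror the behavior described in this thread. Unlike aioinflux, it always returns a dictionary, even for a single series.

```python
import pandas as pd


def series_to_dataframes(resp):
    """Build one DataFrame per InfluxDB series from a parsed JSON response.

    Hypothetical helper for illustration; not aioinflux's implementation.
    All statements are flattened into one dict, so identical series from
    different statements would overwrite each other.
    """
    dfs = {}
    for result in resp['results']:
        for series in result.get('series', []):
            tags = series.get('tags', {})
            # Assumed key format: "measurement,tag1=v1,tag2=v2"
            key = ','.join([series['name']] + [f'{k}={v}' for k, v in tags.items()])
            df = pd.DataFrame(series['values'], columns=series['columns'])
            df = df.set_index('time')
            # Tag values are constant within a series, so store them as columns
            for k, v in tags.items():
                df[k] = v
            dfs[key] = df
    return dfs


# Parsed JSON from the GROUP BY "tag" query above
resp = {'results': [{'statement_id': 0,
                     'series': [{'name': 'foo', 'tags': {'tag': 'A'},
                                 'columns': ['time', 'count_value'],
                                 'values': [[0, 2]]},
                                {'name': 'foo', 'tags': {'tag': 'B'},
                                 'columns': ['time', 'count_value'],
                                 'values': [[0, 3]]}]}]}

dfs = series_to_dataframes(resp)
print(list(dfs))  # ['foo,tag=A', 'foo,tag=B']
```

Keeping the tag as a column means a plain row-wise concatenation of the values still tells you which series each row came from.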
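In case you want to do a simple concatenation of the dataframes, you can try something like:

```python
import pandas as pd

c.output = 'dataframe'  # return pandas DataFrames instead of JSON
r = await c.query('SELECT COUNT(*) FROM foo GROUP BY "tag"')
# r is a dict with one DataFrame per series; stack them row-wise
print(pd.concat(r.values()))
```

```
   count_value tag
0            2   A
0            3   B
```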
Depending on your data, you may want to concatenate along the column axis (`axis=1`) or perhaps do some fancier merging using `pd.merge` or `pd.DataFrame.join`.
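A rough sketch of those alternatives, using hand-built DataFrames that stand in for a `dataframe`-mode result. The keys, values, and epoch-0 UTC time index are assumptions for illustration, not real query output.

```python
import pandas as pd

# Stand-in for a dataframe-mode result: one DataFrame per series
idx = pd.to_datetime([0], utc=True)
dfs = {
    'foo,tag=A': pd.DataFrame({'count_value': [2], 'tag': ['A']}, index=idx),
    'foo,tag=B': pd.DataFrame({'count_value': [3], 'tag': ['B']}, index=idx),
}

# Row-wise (axis=0): stack the series; duplicate timestamps are kept
rows = pd.concat(dfs.values())

# Column-wise (axis=1): align on time; dict keys become the outer column level
cols = pd.concat(dfs, axis=1)

# pd.DataFrame.join: align on the index; suffixes disambiguate shared column names
joined = dfs['foo,tag=A'].join(dfs['foo,tag=B'], lsuffix='_A', rsuffix='_B')

# pd.merge: the same index alignment, spelled out explicitly
merged = pd.merge(dfs['foo,tag=A'], dfs['foo,tag=B'],
                  left_index=True, right_index=True, suffixes=('_A', '_B'))

print(cols.columns.tolist())
print(joined)
```

Row-wise concatenation suits series that share the same columns; column-wise concatenation, `join`, or `merge` suits lining series up by timestamp. Which one is right depends on the data.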
This behavior should be properly documented.
Just for completeness' sake: in the above example, if you add the following data:
```python
await c.write('foo2,tag=A value=1')
await c.write('foo2,tag=A value=2')
await c.write('foo2,tag=B value=3')
await c.write('foo2,tag=B value=4')
await c.write('foo2,tag=B value=5')
```

And change your query to:

```python
r = await c.query('SELECT COUNT(*) FROM /foo.*/ GROUP BY "tag"')
```
You get four series (`foo,tag=A`, `foo,tag=B`, `foo2,tag=A`, `foo2,tag=B`):
```
{'results': [{'statement_id': 0,
              'series': [{'name': 'foo',
                          'tags': {'tag': 'A'},
                          'columns': ['time', 'count_value'],
                          'values': [[0, 2]]},
                         {'name': 'foo',
                          'tags': {'tag': 'B'},
                          'columns': ['time', 'count_value'],
                          'values': [[0, 3]]},
                         {'name': 'foo2',
                          'tags': {'tag': 'A'},
                          'columns': ['time', 'count_value'],
                          'values': [[0, 2]]},
                         {'name': 'foo2',
                          'tags': {'tag': 'B'},
                          'columns': ['time', 'count_value'],
                          'values': [[0, 3]]}]}]}
```
Again, in `dataframe` mode you will get a dictionary containing four dataframes, and how to merge or concatenate them is left to the user.
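One possible way to handle the four-dataframe case is to concatenate the series measurement by measurement. The helper `concat_by_measurement` below is hypothetical, not part of aioinflux; it assumes keys of the form `measurement,tag=value` and does not handle measurement names containing escaped commas.

```python
from collections import defaultdict

import pandas as pd


def concat_by_measurement(dfs):
    """Concatenate per-series DataFrames separately for each measurement.

    Hypothetical helper; assumes keys look like "measurement,tag=value".
    """
    groups = defaultdict(list)
    for key, df in dfs.items():
        # Everything before the first comma is taken as the measurement name
        measurement = key.split(',', 1)[0]
        groups[measurement].append(df)
    return {m: pd.concat(frames) for m, frames in groups.items()}


# Stand-in for the four dataframes returned by the /foo.*/ query above
counts = {'foo,tag=A': 2, 'foo,tag=B': 3, 'foo2,tag=A': 2, 'foo2,tag=B': 3}
dfs = {
    key: pd.DataFrame({'count_value': [n], 'tag': [key.rsplit('=', 1)[1]]})
    for key, n in counts.items()
}

by_measurement = concat_by_measurement(dfs)
for name, df in by_measurement.items():
    print(name)
    print(df)
```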
Actually, a single-series result is handled as a special case (a single dataframe is returned instead of a dictionary), as shown here: https://github.com/gusutabopb/aioinflux/blob/v0.4.0/aioinflux/serialization/dataframe.py#L56
Always returning a dictionary would probably be the most generic approach, but since, at least in my own usage, most queries are single-statement/single-series, I made that a special case.
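If downstream code should not have to care about the single-series special case, a small normalization step can make every result look like a dictionary. The helper `as_series_dict` and its placeholder key `'series'` are hypothetical and purely illustrative; they are not part of aioinflux.

```python
import pandas as pd


def as_series_dict(result, single_key='series'):
    """Normalize a dataframe-mode result to a dict of DataFrames.

    Hypothetical helper: a lone DataFrame (the single-series special case)
    is wrapped under `single_key`; a dict is returned as a shallow copy.
    """
    if isinstance(result, pd.DataFrame):
        return {single_key: result}
    return dict(result)


single = pd.DataFrame({'count_value': [5]})
multi = {'foo,tag=A': pd.DataFrame({'count_value': [2]}),
         'foo,tag=B': pd.DataFrame({'count_value': [3]})}

# Downstream code iterates the same way whichever shape came back
for result in (single, multi):
    for key, df in as_series_dict(result).items():
        print(key, int(df['count_value'].sum()))
```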
Upon further investigation, I found a minor bug (an inconsistency) in multi-statement queries. It was fixed in v0.4.1 (just released). A note about this perhaps unexpected behavior of returning dictionaries was also added to the user guide.
When the query has a `GROUP BY` clause other than time, for example:
The `dataframe` output mode returns a dictionary instead of a dataframe. The keys seem to be strings such as `"measurement_name, category=A"`, `"measurement_name, category=B"`, ... and the values of the dictionary are dataframes. Is this expected?