influxdata / influxdb-python

Python client for InfluxDB
MIT License

InfluxDB-python version 5.3.0 chunk=True #820

Open xiandong79 opened 4 years ago

xiandong79 commented 4 years ago

msgpack.exceptions.ExtraData: unpack(b) received extra data

Traceback (most recent call last):
  File "/Users/dong/Desktop/mosaic-research/analysis/analysis.py", line 17, in <module>
    public_book = mosaic_client.public_book(exchange=exchange, instrument=instrument, ts_start=ts, ts_end=ts+save_interval, depth=1)
  File "/Users/dong/Desktop/mosaic-research/py_mosaic_client/py_mosaic_client/mosaic_client.py", line 74, in public_book
    result = self.client.query(f'SELECT * FROM "l2_book-{exchange}" WHERE time > {ts_start} AND time <= {ts_end}', chunked=True, chunk_size=10000)
  File "/Users/dong/opt/anaconda3/lib/python3.7/site-packages/influxdb/client.py", line 518, in query
    expected_response_code=expected_response_code
  File "/Users/dong/opt/anaconda3/lib/python3.7/site-packages/influxdb/client.py", line 352, in request
    raw=False)
  File "msgpack/_unpacker.pyx", line 209, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

xiandong79 commented 4 years ago

https://github.com/influxdata/influxdb-python/commit/c903d73efcf49b4e340490072d777d8f34ac8e1c

I think it may be related to this PR

sebito91 commented 4 years ago

Thanks for reporting this @xiandong79, I'll investigate ASAP. I should have added a test to the dataframe_client for this.

hrbonz commented 4 years ago

I can take a look too if that helps, though I haven't come across that issue myself.

xiandong79 commented 4 years ago

Version 5.2.3 works well.

hiksuman commented 4 years ago

I'm having the same issue querying from both Influx 1.7.10 and 1.7.7. Interestingly, with Influx 1.0.2 the bug is not present.

sebito91 commented 4 years ago

There are a lot of differences between 5.2.3 and 5.3.0, which is why we bumped the minor version instead of cutting a point release.

@hrbonz if you want to take a look that would be AWESOME!

laurikoobas commented 4 years ago

I am getting a different error, but seemingly from a similar place.

InfluxDB-python version: 5.3.0
Python version: 3.7.4
Operating system version: Ubuntu 16.04


influxdb/client.py in request(self, url, method, params, data, stream, expected_response_code, headers)
    350                 packed=response.content,
    351                 ext_hook=_msgpack_parse_hook,
--> 352                 raw=False)
    353         else:
    354             response._msgpack = None

msgpack/_unpacker.pyx in msgpack._cmsgpack.unpackb()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 8: invalid continuation byte

chaconpiza commented 4 years ago

I get a similar failure with a SHOW DIAGNOSTICS query:

python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from influxdb import client
>>> influxdb_client = client.InfluxDBClient("192.168.10.6", "8086")
>>> influxdb_client.ping()
'1.7.6'
>>> influxdb_client.query('SHOW DIAGNOSTICS')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vagrant/.local/lib/python3.6/site-packages/influxdb/client.py", line 518, in query
    expected_response_code=expected_response_code
  File "/home/vagrant/.local/lib/python3.6/site-packages/influxdb/client.py", line 352, in request
    raw=False)
  File "msgpack/_unpacker.pyx", line 213, in msgpack._cmsgpack.unpackb
ValueError: Unpack failed: incomplete input

yozik04 commented 4 years ago

I can confirm that queries with chunked=True do not work on 5.3.0.

xiandong79 commented 4 years ago

> I can confirm that queries with chunked=True do not work on 5.3.0.

nikparmar commented 4 years ago

Hello Team, Any workaround for this issue?

yozik04 commented 4 years ago

> Hello Team, Any workaround for this issue?

Sure, use a version <5.3.0.

marko-asplund commented 4 years ago

Having the same issue. Any progress?

AnkitSinghvi99 commented 3 years ago

Hi,

I'm having the same issue. Is there any solution?

Debian GNU/Linux 9.4 (stretch)
python 2.7.13
Influx 1.8.3
Influxdb 5.3.1
msgpack 1.0.2

msgpack.exceptions.ExtraData: unpack(b) received extra data.

Traceback (most recent call last):
  File "/code/apps/FuelChangeoverPlot.py", line 179, in exportFromDb
    data = data_fetcher.fetch_fuel_change_over_plot(start_time=rangeStart, end_time=rangeEnd)
  File "/code/db_interface/data_fetcher.py", line 35, in fetch_fuel_change_over_plot
    df_dict = db_connector.query_for_single_measurement_range(
  File "/code/db_interface/db_connector.py", line 80, in query_for_single_measurement_range
    df_dict = client.query(
  File "/usr/local/lib/python3.9/site-packages/influxdb/_dataframe_client.py", line 199, in query
    results = super(DataFrameClient, self).query(query, **query_args)
  File "/usr/local/lib/python3.9/site-packages/influxdb/client.py", line 521, in query
    response = self.request(
  File "/usr/local/lib/python3.9/site-packages/influxdb/client.py", line 358, in request
    response._msgpack = msgpack.unpackb(
  File "msgpack/_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb

srijan commented 3 years ago

There are actually two issues here:

  1. An unpack issue when using msgpack. I did not debug this further, but here's a workaround that works for me: use JSON instead of msgpack. This can be forced using:

     client = InfluxDBClient(host, port, u, p, db, headers={'Accept': 'application/json'}, gzip=True)

  2. Even with the above, DataFrameClient does not work, because DataFrameClient was not updated along with commit c903d73.
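
The first workaround can be sketched as a pair of small helpers. The names `json_client_kwargs` and `make_json_client` are hypothetical; the `headers` and `gzip` keyword arguments are the ones used in the workaround above. The `influxdb` import is deferred so that the keyword-building part can be checked without the package or a server.

```python
def json_client_kwargs(host, port, username, password, database):
    """Build InfluxDBClient keyword arguments that request JSON instead of msgpack."""
    return {
        "host": host,
        "port": port,
        "username": username,
        "password": password,
        "database": database,
        # Ask the server for JSON so the response should not need msgpack unpacking.
        "headers": {"Accept": "application/json"},
        "gzip": True,
    }


def make_json_client(host, port, username, password, database):
    """Create a client that asks for JSON responses (requires the influxdb package)."""
    from influxdb import InfluxDBClient  # third-party; imported lazily

    return InfluxDBClient(**json_client_kwargs(host, port, username, password, database))
```

This only addresses the unpack error; it does not change the DataFrameClient behavior described in the second point.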

hrbonz commented 3 years ago

Debian GNU/Linux bullseye/sid
Python 3.9.2
influxdb-python master branch
Influxdb 1.8.5 and 1.7.3
msgpack 1.0.2

I ran my test scripts with export MSGPACK_PUREPYTHON=1 to use the pure-Python implementation of msgpack rather than the C one, which is easier to debug.

Analysis

I looked into this issue today, and it looks to me like a combination of two problems.

When running without any headers, we get msgpack back with the following:

b'\x81\xa7results\x91\x82\xacstatement_id\x00\xa6series\x9e\x83\xa4name\xa5build\xa7columns\x94\xa6Branch\xaaBuild Time\xa6Commit\xa7Version\xa6values\x91\x94\xa31.7\xa0\xd9(ff383cdc0420217e3460dabe17db54f8557d95b6\xa51.7.8\x83\xa4name\xa6config\xa7columns\x92\xacbind-address\xb2reporting-disabled\xa6values\x91\x92\xae127.0.0.1:8098\xc3\x83\xa4name\xb2config-coordinator\xa7columns\x97\xb1log-queries-after\xb6max-concurrent-queries\xb2max-select-buckets\xb0max-select-point\xb1max-select-series\xadquery-timeout\xadwrite-timeout\xa6values\x91\x97\x00\x00\x00\x00\x83\xa4name\xaaconfig-cqs\xa7columns\x93\xa7enabled\xb3query-stats-enabled\xacrun-interval\xa6values\x91\x93\xc3\xc2\x83\xa4name\xabconfig-data\xa7columns\x9c\xb5cache-max-memory-size\xbacache-snapshot-memory-size\xd9"cache-snapshot-write-cold-duration\xd9 compact-full-write-cold-duration\xa3dir\xbamax-concurrent-compactions\xb7max-index-log-file-size\xb7max-series-per-database\xb2max-values-per-tag\xb8series-id-set-cache-size\xa7wal-dir\xafwal-fsync-delay\xa6values\x91\x9c\xb6/var/lib/influxdb/data\x00\xd2\x00\x0fB@\xd2\x00\x01\x86\xa0d\xb5/var/lib/influxdb/wal\x83\xa4name\xacconfig-httpd\xa7columns\x96\xafaccess-log-path\xacbind-address\xa7enabled\xadhttps-enabled\xb4max-connection-limit\xadmax-row-limit\xa6values\x91\x96\xa0\xa5:8096\xc3\xc2\x00\x00\x83\xa4name\xabconfig-meta\xa7columns\x91\xa3dir\xa6values\x91\x91\xb6/var/lib/influxdb/meta\x83\xa4name\xaeconfig-monitor\xa7columns\x93\xaestore-database\xadstore-enabled\xaestore-interval\xa6values\x91\x93\xa9_internal\xc3\x83\xa4name\xb1config-precreator\xa7columns\x93\xaeadvance-period\xaecheck-interval\xa7enabled\xa6values\x91\x93\xc3\x83\xa4name\xb0config-retention\xa7columns\x92\xaecheck-interval\xa7enabled\xa6values\x91\x92\xc3\x83\xa4name\xb1config-subscriber\xa7columns\x94\xa7enabled\xachttp-timeout\xb1write-buffer-size\xb1write-concurrency\xa6values\x91\x94\xc3\xd1\x03\xe8(\x83\xa4name\xa7network\xa7columns\x91\xa8hostname\xa6values\x91\x91\xa4db01\x83\xa4name\xa7runtime\xa7columns\x94\xa6GOARCH\xaaGOMAXPROCS\xa4GOOS\xa7version\xa6values\x91\x94\xa5amd64\x02\xa5linux\xa6go1.11\x83\xa4name\xa6system\xa7columns\x94\xa3PID\xabcurrentTime\xa7started\xa6uptime\xa6values\x91\x94\xd1*\x84\xc7\x0c\x05\x00\x00\x00\x00`\x86\x8e\xe5\x12J\xde\xab\xc7\x0c\x05\x00\x00\x00\x00`\x86u\x7f\x0c\xca\x93\xb4\xb21h48m22.092293879s'

The JSON and msgpack responses should represent the same data, but in the msgpack response the config-coordinator structure doesn't include all the values:

\x83\xa4name\xb2config-coordinator\xa7columns\x97\xb1log-queries-after\xb6max-concurrent-queries\xb2max-select-buckets\xb0max-select-point\xb1max-select-series\xadquery-timeout\xadwrite-timeout\xa6values\x91\x97\x00\x00\x00\x00\x83\xa4name\xaaconfig-cqs

Near the end of this string, \x97 declares a 7-entry 'fixarray', but only four zero bytes (\x00) follow before an \x83 that should start the next data structure ('config-cqs'). For this reason, I believe the bug actually exists server side. A similar problem might occur with a regular query, but I couldn't confirm it. I'm also not very comfortable with Go, so I couldn't find where this is implemented in the server.

This behavior appeared soon after my commit because 7fb5e946062dd36a84801e4a03012a3c032a70db changed the default headers to request msgpack instead of the default JSON.
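
The fixarray reasoning can be checked by hand with a few lines of Python. This is a minimal sketch that only understands the single-byte "fix" headers of the MessagePack format and works on the tail of the fragment quoted above. The function names are made up for illustration, and this is not how the msgpack library itself parses data.

```python
def describe_header(byte):
    """Classify one MessagePack header byte (single-byte 'fix' formats only)."""
    if byte <= 0x7F:
        return ("positive fixint", byte)   # the value is the byte itself
    if 0x80 <= byte <= 0x8F:
        return ("fixmap", byte & 0x0F)     # low 4 bits = number of key/value pairs
    if 0x90 <= byte <= 0x9F:
        return ("fixarray", byte & 0x0F)   # low 4 bits = number of elements
    if 0xA0 <= byte <= 0xBF:
        return ("fixstr", byte & 0x1F)     # low 5 bits = string length in bytes
    return ("other", None)


def count_fixints_after(data, header_index):
    """Count consecutive positive fixints that follow the header at header_index."""
    count = 0
    for byte in data[header_index + 1:]:
        if describe_header(byte)[0] != "positive fixint":
            break
        count += 1
    return count


# Tail of the config-coordinator series, copied from the response in this comment.
fragment = b"\xa6values\x91\x97\x00\x00\x00\x00\x83\xa4name\xaaconfig-cqs"

header_index = fragment.index(b"\x97")
kind, declared = describe_header(fragment[header_index])
present = count_fixints_after(fragment, header_index)
next_kind, next_size = describe_header(fragment[header_index + 1 + present])

print(f"{kind} declares {declared} elements, {present} single-byte integers follow")
print(f"the next header is a {next_kind} of size {next_size}")
```

An array that declares more elements than actually follow would leave a decoder out of step with the rest of the stream, which would be consistent with the unpack errors reported in this thread.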

Summary

  1. I should push a PR to implement the fixed chunked behavior in DataFrameClient.
  2. I suspect there is a bug in the server-side msgpack implementation but can't help with this. I think someone with better Go knowledge should dig into that one.

hrbonz commented 3 years ago

@sebito91

hrbonz commented 3 years ago

I tried making the request directly from the command line with curl and still got a malformed msgpack response with the same issue:

$ curl -G 'http://localhost:8096/query' --data-urlencode q='SHOW DIAGNOSTICS'  --header "Accept: application/x-msgpack" --header "Content-Type: application/json" -u root --output response.txt
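
For comparing the two encodings side by side, the same request can also be built in Python. This is a sketch using only the standard library; `build_query_request` is a hypothetical helper, the URL and port mirror the curl command above, and authentication is left out (add it if your server requires it).

```python
import urllib.parse
import urllib.request


def build_query_request(base_url, query, accept):
    """Build a GET request for InfluxDB's /query endpoint with a chosen Accept header."""
    params = urllib.parse.urlencode({"q": query})
    url = f"{base_url}/query?{params}"
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


msgpack_request = build_query_request(
    "http://localhost:8096", "SHOW DIAGNOSTICS", "application/x-msgpack"
)
json_request = build_query_request(
    "http://localhost:8096", "SHOW DIAGNOSTICS", "application/json"
)

# Sending the requests needs a reachable server, so it is left commented out:
# with urllib.request.urlopen(msgpack_request) as response:
#     raw = response.read()
```

Fetching both variants from the same server makes it easier to see which values are missing from the msgpack body.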

AnkitSinghvi99 commented 3 years ago

@hrbonz @sebito91 Maybe I am asking a silly question here: is the above fix part of the currently released library, or will it come in a future release? If it is a future release, when is it expected?

I tested today and still get the error below.

msgpack.exceptions.ExtraData: unpack(b) received extra data.

MichielBbal commented 3 years ago

Same issue here: msgpack/_unpacker.pyx in msgpack._cmsgpack.unpackb()

ExtraData: unpack(b) received extra data.

KirannBhavaraju commented 3 years ago

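For any future readers, this works for me:

import threading
import time

import pandas as pd
from influxdb import InfluxDBClient

outlock = threading.Lock()  # assumed: a lock that serializes print() across threads

# host, port, user, password and dbname are your own connection settings
client = InfluxDBClient(host=host, port=port, username=user, password=password, database=dbname)
start_time = time.monotonic()
res = pd.DataFrame(client.query("select * from X where time > now() - 30m", chunked=True).get_points())
end_time = time.monotonic()
with outlock:
    print("Result from {} took {}".format(host, end_time - start_time))
    print(res)

Versions used

python --version = Python 3.7.8
influxdb.__version__ = 5.2.3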

ErlendFax commented 3 years ago

I was still getting ExtraData: unpack(b) received extra data, but after trying @KirannBhavaraju's suggestion it worked!

The only thing I did was remove the chunk_size=xxxx argument.

client = InfluxDBClient(blah blah)
result = client.query(q, chunked=True)

python = "^3.8"
influxdb = "5.3.1"

Kylmakalle commented 2 years ago

> The only thing I did was remove the chunk_size=xxxx argument.

Responses will be chunked by series or by every 10,000 points, whichever occurs first. https://docs.influxdata.com/influxdb/v1.7/guides/querying_data/#chunking
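
As an illustration of that rule, here is a small pure-Python model of how points would be grouped into chunks. It is only a sketch of the documented behavior, not the server's code; `chunk_series` and the sample data are invented for the example.

```python
def chunk_series(series, chunk_size=10000):
    """Yield (series_name, points) chunks.

    A chunk never spans two series and never holds more than chunk_size points.
    """
    for name, points in series.items():
        for start in range(0, len(points), chunk_size):
            yield name, points[start:start + chunk_size]


# Two made-up series: one larger than the chunk size, one smaller.
series = {
    "cpu": list(range(25000)),
    "mem": list(range(300)),
}

sizes = [(name, len(points)) for name, points in chunk_series(series)]
print(sizes)  # [('cpu', 10000), ('cpu', 10000), ('cpu', 5000), ('mem', 300)]
```

In this model the large series is split every 10,000 points, while the small one is returned as a single chunk because the series boundary comes first.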