Closed andreish closed 11 months ago
I will try to use your snippet to see if I can repro.
In order to simply the debugging, what error message did you see when put fails? Any errors in /log/ticktock.log?
Retried today on dev branch , seeing two strange behaviours in log/ticktok.log: one is with multiple datapoints:
`2023-07-20 15:43:14.381 [INFO] [tcp_listener_1] entered epoll_wait() loop, fd=167 2023-07-20 15:43:20.422 [ERROR] [http_10_task_0] Unhandled request: [imestamp": 1689867800, (null) "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 68, "tags": {"test": "True"}}], close:false, len:1802, body:[{"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 69, "tags": {"test": "True"}}] POST /api/put HTTP/1.1 Content-Length: 1802 Content-MD5: XU9E74eB8pMz+k5gbitCqw== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
[{"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "value": 92, "tags": {"test": "True"}}, {"metric": "powerLevel", "timestamp": 1689867800, "v 2023-07-20 15:43:20.424 [INFO] [http_10_task_0] Created tsdb[1689811200,1689897600), tstamp = 1689867800 2023-07-20 15:43:20.424 [INFO] [http_10_task_0] opening /home/asuhan/Projects/eon/ticktock/data/2023/7/1689811200.1689897600/index for write 2023-07-20 15:43:20.424 [INFO] [http_10_task_0] opening /home/asuhan/Projects/eon/ticktock/data/2023/7/1689811200.1689897600/header.0 for write 2023-07-20 15:43:20.424 [INFO] [http_10_task_0] opening /home/asuhan/Projects/eon/ticktock/data/2023/7/1689811200.1689897600/data.0 for write `
and with only one datapoint:
`2023-07-20 15:47:23.797 [INFO] [http_listener_0] entered epoll_wait() loop, fd=62 2023-07-20 15:47:23.797 [INFO] [main] Original SO_RCVBUF = 131072 2023-07-20 15:47:23.797 [INFO] [main] SO_RCVBUF set to 106496 2023-07-20 15:47:23.797 [INFO] [main] Original SO_SNDBUF = 16384 2023-07-20 15:47:23.797 [INFO] [main] SO_BINDTODEVICE = 2023-07-20 15:47:23.797 [INFO] [main] SO_DEBUG = 0 2023-07-20 15:47:23.797 [INFO] [main] SO_DONTROUTE = 0 2023-07-20 15:47:23.797 [INFO] [main] SO_KEEPALIVE = 0 2023-07-20 15:47:23.797 [INFO] [main] SO_PRIORITY = 0 2023-07-20 15:47:23.797 [INFO] [main] SO_RCVBUF = 212992 2023-07-20 15:47:23.797 [INFO] [main] SO_RCVBUFFORCE: errno = 92 2023-07-20 15:47:23.797 [INFO] [main] SO_RCVLOWAT = 1 2023-07-20 15:47:23.797 [INFO] [main] SO_SNDBUF = 16384 2023-07-20 15:47:23.797 [INFO] [main] SO_SNDBUFFORCE: errno = 92 2023-07-20 15:47:23.797 [INFO] [main] SO_SNDLOWAT = 1 2023-07-20 15:47:23.797 [INFO] [main] Original SO_RCVBUF = 131072 2023-07-20 15:47:23.797 [INFO] [main] SO_RCVBUF set to 106496 2023-07-20 15:47:23.797 [INFO] [main] Original SO_SNDBUF = 16384 2023-07-20 15:47:23.797 [INFO] [main] SO_BINDTODEVICE = 2023-07-20 15:47:23.797 [INFO] [main] SO_DEBUG = 0 2023-07-20 15:47:23.797 [INFO] [main] SO_DONTROUTE = 0 2023-07-20 15:47:23.797 [INFO] [main] SO_KEEPALIVE = 0 2023-07-20 15:47:23.797 [INFO] [main] SO_PRIORITY = 0 2023-07-20 15:47:23.797 [INFO] [main] SO_RCVBUF = 212992 2023-07-20 15:47:23.797 [INFO] [main] SO_RCVBUFFORCE: errno = 92 2023-07-20 15:47:23.797 [INFO] [main] SO_RCVLOWAT = 1 2023-07-20 15:47:23.797 [INFO] [main] SO_SNDBUF = 16384 2023-07-20 15:47:23.797 [INFO] [main] SO_SNDBUFFORCE: errno = 92 2023-07-20 15:47:23.797 [INFO] [main] SO_SNDLOWAT = 1 2023-07-20 15:47:23.797 [INFO] [tcp_listener_10] entered epoll_wait() loop, fd=67 2023-07-20 15:47:23.797 [INFO] [tcp_listener_20] entered epoll_wait() loop, fd=70 2023-07-20 15:47:23.798 [INFO] [tcp_listener_30] entered epoll_wait() loop, fd=73 2023-07-20 15:47:23.798 [INFO] [tcp_listener_40] entered epoll_wait() loop, fd=76 2023-07-20 15:47:23.798 [INFO] [tcp_listener_50] entered epoll_wait() loop, fd=79 2023-07-20 15:47:23.798 [INFO] [tcp_listener_60] entered epoll_wait() loop, fd=82 2023-07-20 15:47:23.798 [INFO] [tcp_listener_70] entered epoll_wait() loop, fd=85 2023-07-20 15:47:23.798 [INFO] [tcp_listener_80] entered epoll_wait() loop, fd=88 2023-07-20 15:47:23.798 [INFO] [tcp_listener_90] entered epoll_wait() loop, fd=91 2023-07-20 15:47:23.798 [INFO] [tcp_listener_100] entered epoll_wait() loop, fd=94 2023-07-20 15:47:23.798 [INFO] [tcp_listener_110] entered epoll_wait() loop, fd=97 2023-07-20 15:47:23.798 [INFO] [tcp_listener_120] entered epoll_wait() loop, fd=100 2023-07-20 15:47:23.798 [INFO] [tcp_listener_130] entered epoll_wait() loop, fd=104 2023-07-20 15:47:23.798 [INFO] [tcp_listener_140] entered epoll_wait() loop, fd=107 2023-07-20 15:47:23.798 [INFO] [tcp_listener_150] entered epoll_wait() loop, fd=110 2023-07-20 15:47:23.799 [INFO] [tcp_listener_160] entered epoll_wait() loop, fd=113 2023-07-20 15:47:23.799 [INFO] [tcp_listener_11] entered epoll_wait() loop, fd=116 2023-07-20 15:47:23.799 [INFO] [tcp_listener_21] entered epoll_wait() loop, fd=119 2023-07-20 15:47:23.799 [INFO] [tcp_listener_31] entered epoll_wait() loop, fd=122 2023-07-20 15:47:23.799 [INFO] [tcp_listener_41] entered epoll_wait() loop, fd=125 2023-07-20 15:47:23.799 [INFO] [tcp_listener_51] entered epoll_wait() loop, fd=128 2023-07-20 15:47:23.799 [INFO] [tcp_listener_61] entered epoll_wait() loop, fd=131 2023-07-20 15:47:23.799 [INFO] [tcp_listener_71] entered epoll_wait() loop, fd=134 2023-07-20 15:47:23.799 [INFO] [tcp_listener_81] entered epoll_wait() loop, fd=137 2023-07-20 15:47:23.799 [INFO] [tcp_listener_91] entered epoll_wait() loop, fd=140 2023-07-20 15:47:23.799 [INFO] [tcp_listener_101] entered epoll_wait() loop, fd=143 2023-07-20 15:47:23.799 [INFO] [tcp_listener_111] entered epoll_wait() loop, fd=146 2023-07-20 15:47:23.799 [INFO] [tcp_listener_121] entered epoll_wait() loop, fd=149 2023-07-20 15:47:23.800 [INFO] [tcp_listener_131] entered epoll_wait() loop, fd=152 2023-07-20 15:47:23.800 [INFO] [tcp_listener_141] entered epoll_wait() loop, fd=155 2023-07-20 15:47:23.800 [INFO] [tcp_listener_151] entered epoll_wait() loop, fd=158 2023-07-20 15:47:23.800 [INFO] [tcp_listener_161] entered epoll_wait() loop, fd=161 2023-07-20 15:47:23.800 [INFO] [tcp_listener_0] entered epoll_wait() loop, fd=164 2023-07-20 15:47:23.800 [INFO] [tcp_listener_1] entered epoll_wait() loop, fd=167 2023-07-20 15:47:33.115 [ERROR] [http_10_task_0] Unhandled request: [68053, "value": (null) 23, "tags": {"test": "True"}}, close:false, len:90, body:{"metric": "powerLevel", "timestamp": 1689868053, "value": 17, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: 4x8lUZ7oYq319DELn+S47A== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 11, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: //jX2LpOYyxZ9+OQVpSSvg== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 78, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: Z3cTahRtuR8Io8+hmVmvKg== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 67, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: xZakupFQz8HxA7kRlNNzNQ== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 60, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: zlftUufo92UzmNO0e7BdBw== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 27, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: X5LROYkgFvZlC9xl0Khx8g== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 87, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: vDjmjJQrgl99l59oncBnLQ== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 28, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: BNgqIXrv9ZT8YUiWRsS99w== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 52, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: qS3FoOATNAxblnwE6+CH6Q== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 51, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: AFCDPXbUCwg0fAgJEOHAsQ== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 36, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: +uiWezkTzNzy+LynL6bxGw== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 68, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: mvnQq4l2z/rJKhhjnFIPpg== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 53, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: RkhJTrNnNJOVeFoKCQCMMw== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 54, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: UjM3MVE6s3yji2QGmoheBw== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 90, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: STEGf8MBgFJETWO01TEHJQ== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 1689868053, "value": 37, "tags": {"test": "True"}} POST /api/put HTTP/1.1 Content-Length: 90 Content-MD5: uoi6HfHSlzwJ4VsnrvmIkg== Host: localhost:6182 Accept-Encoding: identity Connection: keep-alive
{"metric": "powerLevel", "timestamp": 16898680 2023-07-20 15:47:33.116 [INFO] [http_10_task_0] Created tsdb[1689811200,1689897600), tstamp = 1689868053`
I will add python sample code to help with repro. change BATCH_SIZE to send multiple datapoints.
`import asyncio import atexit from dugong import HTTPConnection, AioFuture from datetime import datetime import calendar import random import json import sys
OPENTSDB_HOST = "localhost" PORT = 4242 PORT = 6182
TARGET_WRITES_PER_MIN = 12497 BATCH_SIZE = 1 #number of datapoinits
TARGET_READS_PER_MIN = 5420
loop = asyncio.get_event_loop() atexit.register(loop.close)
def make_metrics(): tstamp = calendar.timegm(datetime.now().timetuple()) item = {"metric": "powerLevel", "timestamp": tstamp, "value": random.randint(0, 100), "tags": {"test": "True"}}
if BATCH_SIZE == 1:
return item
return [item for i in range(BATCH_SIZE)]
def get_requests(): for i in range(TARGET_WRITES_PER_MIN): yield json.dumps(make_metrics())+"\r\n"
with HTTPConnection(OPENTSDB_HOST, PORT) as conn:
# all the requests.
def send_requests():
for req in get_requests():
# print("send: {}".format(metric))
yield from conn.co_send_request(method='POST',
path="/api/put",
body=bytes(req, "utf-8"))
# This generator function returns a coroutine that reads
# all the responses
def read_responses():
bodies = []
for i in range(TARGET_WRITES_PER_MIN):
resp = yield from conn.co_read_response()
sys.stdout.write("status={}\n".format(resp.status))
assert resp.status == 200 or resp.status == 204
buf = yield from conn.co_readall()
bodies.append(buf)
print("recv so far {} responses".format(len(bodies)))
return bodies
# Create the coroutines
send_crt = send_requests()
recv_crt = read_responses()
# Register the coroutines with the event loop
send_future = AioFuture(send_crt, loop=loop)
recv_future = AioFuture(recv_crt, loop=loop)
print("running loop")
# Run the event loop until the receive coroutine is done (which
# implies that all the requests must have been sent as well):
loop.run_until_complete(recv_future)
# Get the result returned by the coroutine
bodies = recv_future.result()
`
and pip install -r requirements.txt:
requests dugong
Retried today on dev branch , seeing two strange behaviours in log/ticktok.log: one is with multiple datapoints:
`2023-07-20 15:43:14.381 [INFO] [tcp_listener_1] entered epoll_wait() loop, fd=167 2023-07-20 15:43:20.422 [ERROR] [http_10_task_0] Unhandled request: [imestamp": 1689867800, (null) "value": 68, "tags": {"test": "True"}},
The first data point is in bad format, no metric name. Also missing "T" in "imestamp".
Another thing I note is that the tag value "True". Json only supports bool in lower case, true | false. I guess you use string type in the tag value on purpose. Nevertheless, it should be acceptable.
and with only one datapoint:
15:47:23.800 [INFO] [tcp_listener_1] entered epoll_wait() loop, fd=167 2023-07-20 15:47:33.115 [ERROR] [http_10_task_0] Unhandled request: [68053, "value": (null) 23, "tags": {"test": "True"}},
This request also in bad format, "value": (null) 23.
Could you please find out what the two original requests are?
I will add python sample code to help with repro. change BATCH_SIZE to send multiple datapoints.
That would be very nice.
Retried today on dev branch , seeing two strange behaviours in log/ticktok.log: one is with multiple datapoints: `2023-07-20 15:43:14.381 [INFO] [tcp_listener_1] entered epoll_wait() loop, fd=167 2023-07-20 15:43:20.422 [ERROR] [http_10_task_0] Unhandled request: [imestamp": 1689867800, (null) "value": 68, "tags": {"test": "True"}},
The first data point is in bad format, no metric name. Also missing "T" in "imestamp".
We suspect that the http request was sent in two packets by network layer. TT can handle this case as long as the total request size is less than TT tcp.buffer.size.
Another thing I note is that the tag value "True". Json only supports bool in lower case, true | false. I guess you use string type in the tag value on purpose. Nevertheless, it should be acceptable.
and with only one datapoint: 15:47:23.800 [INFO] [tcp_listener_1] entered epoll_wait() loop, fd=167 2023-07-20 15:47:33.115 [ERROR] [http_10_task_0] Unhandled request: [68053, "value": (null) 23, "tags": {"test": "True"}},
This request also in bad format, "value": (null) 23.
We suspect there might be json parsing errors in requests.
Could you please find out what the two original requests are?
@andreish You can run ./admin/log_level http
to enable more logs. Once if you hit the problem again, you can capture original requests in the log.
I will add python sample code to help with repro. change BATCH_SIZE to send multiple datapoints.
That would be very nice.
attached testasync.zip testasync.zip
After enabling http log behavior is a bit different , I send 12497 Posts but receive only 432 responses.
Here is the log: ticktock.log
@andreish I think we know what's going on. Your python code uses http1.1 pipelining, in which a http request can be sent without waiting for response of previous http request. Unfortunately, TT doesn't support http pipeline. Each connection can send a request only after receiving response of a previous request (i.e., sync).
If you look at the logs attached, a network request contained lots of individual put requests with headers. TT only processed the first put since it doesn't support pipelining. That's why you sent 12497 requests but only got 432 responses.
There are threetwo ways to work around currently.
\<TT\>/api-examples/python/http_plain_writer.py
for an example.\<TT\>/api-examples/python/tcp_plain_writer.py
for an example.Please let us know if the above works for you.
For you reference, here is a link to http pipelining in wiki. https://en.wikipedia.org/wiki/HTTP_pipelining
Thank you for looking into it .
I will probably switch to TCP , and move/adapt the code handling http responses.
Btw initial code test against opentsdb handled http pipelining just fine.
I was testing http pipelining against docker run -d --name opentsdb --cpuset-cpus 1-2 -m 4g -h opentsdb -p 4242:4242 -v /opt/opentsdb:/etc/opentsdb petergrace/opentsdb-docker
.
Thank you for looking into it . I will probably switch to TCP , and move/adapt the code handling http responses.
Btw initial code test against opentsdb handled http pipelining just fine. I was testing http pipelining against
docker run -d --name opentsdb --cpuset-cpus 1-2 -m 4g -h opentsdb -p 4242:4242 -v /opt/opentsdb:/etc/opentsdb petergrace/opentsdb-docker
.
OK, thanks for the info. That makes sense for TT to support pipelining in order to be fully compatible with Opentsdb. Let us evaluate the work first.
@andreish We evaluated the work and it may take weeks to implement Http pipelining into TT. Currently we are busy with a major refactoring and http pipelining is not a priority. Please use workarounds suggested above if you could.
Thank you, I will use TCP for now.
at some point it fails to insert a datapoint and returns error
using following snippet to write 1000 POSTS to /api/put
`import asyncio import atexit from dugong import HTTPConnection, AioFuture ... with HTTPConnection(OPENTSDB_HOST, PORT) as conn:
This generator function returns a coroutine that sends
`