tarantool / tarantool-python

Python client library for Tarantool
https://www.tarantool.io
BSD 2-Clause "Simplified" License

Weird error when loading a huge table using .select() #72

Closed buriy closed 8 years ago

buriy commented 8 years ago

When loading a lot of data (probably more than 2 GB) using .select(), I get:

Traceback (most recent call last):
  File "batch.py", line 34, in <module>
    cleanup(demo)
  File "batch.py", line 22, in cleanup
    for x in space.select():
  File "/usr/local/lib/python2.7/dist-packages/tarantool/space.py", line 75, in select
    return self.connection.select(self.space_no, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tarantool/connection.py", line 659, in select
    response = self._send_request(request)
  File "/usr/local/lib/python2.7/dist-packages/tarantool/connection.py", line 283, in _send_request
    request)
  File "/usr/local/lib/python2.7/dist-packages/tarantool/connection.py", line 207, in _send_request_wo_reconnect
    response = Response(self, self._read_response())
  File "/usr/local/lib/python2.7/dist-packages/tarantool/connection.py", line 193, in _read_response
    return self._recv(length)
  File "/usr/local/lib/python2.7/dist-packages/tarantool/connection.py", line 171, in _recv
    tmp = self._socket.recv(to_read)
OverflowError: signed integer is greater than maximum

It's OK to limit large queries, but this is probably an error related to a 32-bit data field, and on 5 GB of data it would read only 1 GB.

bigbes commented 8 years ago

It looks like Python accepts only a 4-byte signed integer as the argument to socket.recv.

bigbes commented 8 years ago

https://github.com/python/cpython/blob/2.7/Modules/socketmodule.c#L2510-L2511 :

    if (!PyArg_ParseTuple(args, "i|i:recv", &recvlen, &flags))
        return NULL;

As I guessed.

In Python 3.* it's fixed. The "n" format parses the argument as a Py_ssize_t instead of a C int:

https://github.com/python/cpython/blob/3.5/Modules/socketmodule.c#L2912-L2913

    if (!PyArg_ParseTuple(args, "n|i:recv", &recvlen, &flags))
        return NULL;

Since the format in 2.7 is "i", the argument is parsed as a 4-byte signed integer, which limits a single recv call to 2^31 - 1 bytes.
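
For illustration, a minimal sketch of how a client could stay under that limit by never asking recv for more than a bounded chunk at a time. The names `recv_exact` and `max_chunk` are hypothetical and not part of tarantool-python; the whole response is still assembled in memory, so this avoids the OverflowError only and does not make huge selects cheap.

```python
import socket

# Largest value that fits in a C signed 32-bit int (the limit of the "i" format).
INT32_MAX = 2**31 - 1


def recv_exact(sock, length, max_chunk=1 << 20):
    """Read exactly `length` bytes, requesting at most `max_chunk` bytes per recv call."""
    if not 0 < max_chunk <= INT32_MAX:
        raise ValueError("max_chunk must be between 1 and 2**31 - 1")
    chunks = []
    remaining = length
    while remaining > 0:
        # The size passed to recv never exceeds max_chunk, so it always fits in a C int.
        data = sock.recv(min(remaining, max_chunk))
        if not data:
            raise EOFError("connection closed with %d bytes still expected" % remaining)
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

Because recv may return fewer bytes than requested, the loop tracks the remaining count rather than assuming each call fills the chunk.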

Try to avoid big requests: use the limit and offset arguments of the select command.
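
A rough sketch of what paging with limit and offset could look like. `iter_pages` is a hypothetical helper, and it assumes the callable passed in accepts `offset` and `limit` keyword arguments and returns a sequence of tuples, as space.select is expected to do here.

```python
def iter_pages(select, page_size):
    """Yield rows from a select-like callable, fetching `page_size` rows per request."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = list(select(offset=offset, limit=page_size))
        for row in page:
            yield row
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += page_size
```

Usage would be along the lines of `for row in iter_pages(space.select, 1000): ...`. Note that offset-based paging is not stable if tuples are inserted or deleted between requests, so collecting keys first and deleting afterwards is the safer order.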

buriy commented 8 years ago

Then please raise the error before querying the socket with .select(), or change the API. Am I right that in the current API there's no way to know how much data is in the table before trying a .select()? If entries are blobs, any limit=... could fail in the same way. So it's bad API design then.

bigbes commented 8 years ago

> So it's bad API design then.

We can take your argument and apply it to you: if you're trying to query all the data on the client, it's either bad client design or the wrong idea. What are you trying to achieve? Maybe it's better to write a stored procedure?

> If entries are blob, any limit=... could fail in the same way.

You and only you know what's stored in your instance, so you (and only you) can figure out your preferable limits.

> Then please raise the error before querying socket with .select() , or change the API.

That's what I've planned to do.
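
Purely as an illustration of such an early check (this is not the library's actual code): once the response length has been read from the header, it could be validated before the large recv is attempted. `ResponseTooLargeError` and `check_response_length` are hypothetical names.

```python
INT32_MAX = 2**31 - 1  # largest size Python 2's socket.recv accepts


class ResponseTooLargeError(Exception):
    """Hypothetical error type; not part of tarantool-python."""


def check_response_length(length):
    """Fail with a descriptive message instead of an OverflowError raised inside recv."""
    if length > INT32_MAX:
        raise ResponseTooLargeError(
            "response of %d bytes exceeds the %d-byte limit of a single recv(); "
            "use limit/offset to split the query" % (length, INT32_MAX)
        )
    return length
```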

buriy commented 8 years ago

On Apr 20, 2016 at 14:53, user "bigbes" notifications@github.com wrote:

>> So it's bad API design then.

> We can take your argument and apply it to you: if you're trying to query all the data on the client, it's either bad client design or the wrong idea. What are you trying to achieve?

I was just running a benchmark and wanted to list all keys in the table in order to remove some of them later. Keys were integers, and each key's payload was 100 KB. I haven't found another easy way.

> Maybe it's better to write stored procedure?

I'm not much of a fan of Lua app logic on a single-core database server, nor do I want to write Lua code for the most basic things. If I need any advanced functionality, it's easier for me to write my own app server instead! (It wouldn't have any single-core CPU limitations, could use any DB, etc.)
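
As a back-of-the-envelope aid for the benchmark described above: with payloads of roughly 100 KB, the number of tuples that fit in one response under the 2^31 - 1 byte recv limit can be estimated as below. `max_safe_limit` is a hypothetical helper; it assumes 100 KB means 100 * 1024 bytes and ignores protocol and encoding overhead, so a real limit should leave a generous margin.

```python
INT32_MAX = 2**31 - 1  # byte limit of a single recv call in Python 2


def max_safe_limit(max_tuple_bytes, budget_bytes=INT32_MAX):
    """Largest tuple count whose combined payload stays within `budget_bytes`."""
    if max_tuple_bytes <= 0:
        raise ValueError("max_tuple_bytes must be positive")
    return budget_bytes // max_tuple_bytes


# Roughly 100 KB per tuple, as in the benchmark described in this thread.
print(max_safe_limit(100 * 1024))  # 20971
```

So a select without a limit over a few tens of thousands of such tuples is already enough to cross the 2 GB boundary.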

