tarantool / tarantool-python

Python client library for Tarantool
https://www.tarantool.io
BSD 2-Clause "Simplified" License

schema: support encoding=None connections #172

Closed Totktonada closed 4 years ago

Totktonada commented 4 years ago

Several different problems are fixed here, but they all share the same root cause. When the connection encoding is None (this is the default on Python 2 and may be set explicitly on Python 3), all mp_str values are decoded into bytes, not Unicode strings (note that bytes is an alias for str in Python 2). However, the database schema parsing code assumes that _vspace / _vindex values are Unicode strings.

The resolved problems are the following:

  1. The default encoding of the bytes#decode() method is 'ascii', but names in Tarantool can contain characters outside the ASCII range. Use 'utf-8' when decoding names.
  2. Convert all binary values into Unicode strings before parsing or storing them. This keeps later accesses to the local schema representation correct.
  3. Convert binary parameters such as a space, index or field name into Unicode strings when the schema is accessed, so that the lookup does not trigger a redundant schema refetch.
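
The three fixes can be illustrated with a minimal sketch. This is not the code from this PR: `to_unicode` and `LocalSchema` are hypothetical names used only for illustration, and the sketch assumes Python 3.

```python
def to_unicode(value):
    """Decode bytes as UTF-8; return any other value unchanged."""
    if isinstance(value, bytes):
        return value.decode('utf-8')
    return value


class LocalSchema(object):
    """Hypothetical local cache of space ids, keyed by Unicode names."""

    def __init__(self):
        self._spaces = {}
        self.fetch_count = 0

    def store(self, name, space_id):
        # Normalize before storing, so keys are always Unicode strings.
        self._spaces[to_unicode(name)] = space_id

    def get(self, name):
        # Normalize the lookup key too; otherwise a bytes name would never
        # match a Unicode key and every access would look like a cache miss.
        key = to_unicode(name)
        if key not in self._spaces:
            self.fetch_count += 1  # stands in for a schema refetch
            return None
        return self._spaces[key]


schema = LocalSchema()
# With encoding=None the server-side name arrives as UTF-8 encoded bytes.
schema.store('©'.encode('utf-8'), 512)
```

With both sides normalized, looking the space up by `'©'` or by its UTF-8 encoded bytes hits the same entry and leaves `fetch_count` at zero.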

Those problems are briefly mentioned in 1.

Tested manually with Python 2 and Python 3: my test Tarantool instance has a space named '©', and after the changes I am able to connect to it when the connection encoding is set to None. I also verified that the schema is not refetched each time I call <connection>.select('©') in Python 2 (where such a string literal is str / bytes, not a Unicode string).

Added relevant test cases as separate commits within the PR.

Totktonada commented 4 years ago
assert max_depth > 0

@artembo suggested not using assert this way, since assertions may be disabled in production. I agreed and found that SchemaError should be raised here (it is in fact an internal error, and we'll define it properly in the error hierarchy within #174). I added RecursionError for technical reasons: different to_unicode_recursive() calls should give different error messages when the recursion reaches its limit.
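
A rough sketch of the idea, under stated assumptions: the exception classes below are local stand-ins rather than the library's own, `DepthLimitError` and `parse_space_format` are hypothetical names, and the body of `to_unicode_recursive` is a guess at the general shape, not the code from this PR.

```python
class SchemaError(Exception):
    """Local stand-in for the library's schema error."""


class DepthLimitError(Exception):
    """Hypothetical: raised when the nesting limit is reached."""


def to_unicode_recursive(value, max_depth):
    # An explicit check still runs under `python -O`,
    # unlike `assert max_depth > 0`, which is stripped.
    if max_depth <= 0:
        raise DepthLimitError('maximum nesting depth reached')

    if isinstance(value, bytes):
        return value.decode('utf-8')
    if isinstance(value, dict):
        return {
            to_unicode_recursive(k, max_depth - 1):
                to_unicode_recursive(v, max_depth - 1)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [to_unicode_recursive(item, max_depth - 1) for item in value]
    return value


def parse_space_format(raw_format):
    """Hypothetical caller that turns the limit into a specific message."""
    try:
        return to_unicode_recursive(raw_format, 3)
    except DepthLimitError as e:
        raise SchemaError('space format is nested too deeply: ' + str(e))
```

Catching the low-level exception at each call site lets every caller report which part of the schema was too deeply nested, while the depth check itself stays active even when assertions are disabled.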