Closed nunnsy closed 1 week ago
@amunra I've realised I've conflated a few things here - please let me know if you'd like them split over different PRs.
@amunra I've realised I've conflated a few things here - please let me know if you'd like them split over different PRs.
No, that's not an issue.
My main concern here is that it's painfully easy for the .pyi
definitions to go out of date.
They are functionally effectively equivalent to header file in C, but have the downside (unless I'm wrong - and I hope I am) that there's nothing to ensure that they are in sync with the Cython implementation.
My main concern here is that it's painfully easy for the .pyi definitions to go out of date.
Agreed. I tried looking for solutions, as mentioned in the initial PR comment, for autocompleting a stub file. I found all solutions, as it stands, have drawbacks, which means they do not produce a reliable file (and in most cases, just error out, even with extreme modification, as all these scripts text/symbol parsing).
They are functionally effectively equivalent to header file in C, but have the downside (unless I'm wrong - and I hope I am) that there's nothing to ensure that they are in sync with the Cython implementation.
I don't think you're wrong 😔
I think the best long-term solution would be to have something like mypy
build compatibility of its stubgen
to read Cython files, and it does look like there are some people working on it. For the time being though, I'm not sure there's a better solution than manual checks, unless there was a desire to try and check symbols within the build script.
Also, given how little this interface is changing (at least for the time being), I wasn't sure whether it would be ok for now either way?
Auto-generation packages I tried:
interface
sub-command)Auto-generation packages I tried:
* https://github.com/Vizonex/CyStub * https://github.com/mike-huls/cythonbuilder (the `interface` sub-command) * https://github.com/RaubCamaioni/CythonPEG
These are the same ones I would have experimented with.
cythonbuilder
looks like it's the only maintained one.
What issues did you encounter, specifically?
Agreed, and good question.
Initially the logs output:
(questdb) daniel@boopers:~/projects/py-questdb-client/src/questdb$ cybuilder interface -v
[cythonbuilder] 16:49:21 cli 148 Generating interface files..
[cythonbuilder] 16:49:21 cli 155 Found 1 to create pyi files for
[cythonbuilder] these 1 pyx files?
(y/n)
- /home/daniel/projects/py-questdb-client/src/questdb/ingress.pyxy
[cythonbuilder] 16:49:24 cli 163 Creating 1 pyi files..
[cythonbuilder] 16:49:24 cli 166 Generating pyi files..
[cythonbuilder] 16:49:24 cython_builder 161 [cy_interface] - generating interface files for 1 found pyx files
[cythonbuilder] 16:49:24 cython_builder 169 Creating .pyi for /home/daniel/projects/py-questdb-client/src/questdb/ingress.pyx
[cythonbuilder] 16:49:24 cli 171 build error: string index out of range
To look deeper, I made a throwaway script:
from cythonbuilder.pyigenerator import pyx_to_pyi
with open("./src/questdb/ingress.pyx", "r") as f:
output = pyx_to_pyi(f)
with open("./src/questdb/ingress.pyi", "w") as f:
f.writelines(output)
Which resulted in:
Traceback (most recent call last):
File "grr.py", line 4, in <module>
output = pyx_to_pyi(f)
File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 194, in pyx_to_pyi
all_indentations = [get_line_indentation_spacecount(line=l) for l in pyx_lines]
File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 194, in <listcomp>
all_indentations = [get_line_indentation_spacecount(line=l) for l in pyx_lines]
File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 258, in get_line_indentation_spacecount
if (line[letter_idx + 1] != ' '):
IndexError: string index out of range
After fixing their counter which doesn't handle a line length of 1 correctly...
def get_line_indentation_spacecount(line: str) -> int:
""" How many spaces is the provided line indented? """
spaces: int = 0
# My code ------
for c in line:
if c != ' ':
break
spaces += 1
return spaces
# End my code ---
if (len(line) > 0):
if (line[0] == ' '):
for letter_idx in range(len(line)):
if (line[letter_idx + 1] != ' '):
spaces = letter_idx + 1
break
return spaces
Another issue crops up:
Traceback (most recent call last):
File "grr.py", line 4, in <module>
output = pyx_to_pyi(f)
File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 200, in pyx_to_pyi
raise ValueError(f"Found invalid indentation: {indent} not divisible by {spaces_for_one_tab}")
ValueError: Found invalid indentation: 6 not divisible by 4
Removing that check (for the time being), and also modifying their processor to remove lines which are all spaces by adding
pyx_lines = [l for l in pyx_lines if not l.isspace()]
We do yield a result of the interface file:
"""
import cython
from enum import Enum
from typing import List, Tuple, Dict, Union, Any, Optional, Callable, \
import pathlib
import sys
import os
def _has_gil(gs:pythreadstate**) -> bool:
...
def _ensure_doesnt_have_gil(gs:pythreadstate**) -> bool:
...
def _ensure_has_gil(gs:pythreadstate**) -> None:
...
class IngressErrorCode(Enum):
"""Category of Error."""
CouldNotResolveAddr = line_sender_error_could_not_resolve_addr
InvalidApiCall = line_sender_error_invalid_api_call
SocketError = line_sender_error_socket_error
InvalidUtf8 = line_sender_error_invalid_utf8
InvalidName = line_sender_error_invalid_name
InvalidTimestamp = line_sender_error_invalid_timestamp
AuthError = line_sender_error_auth_error
TlsError = line_sender_error_tls_error
HttpNotSupported = line_sender_error_http_not_supported
ServerFlushError = line_sender_error_server_flush_error
ConfigError = line_sender_error_config_error
BadDataFrame = <int>line_sender_error_server_flush_error + 1
def __str__(self) -> str:
...
class IngressError(Exception):
"""An error whilst using the ``Sender`` or constructing its ``Buffer``."""
def __init__(self, code, msg):
...
@property
def code(self) -> IngressErrorCode:
...
def c_err_code_to_py(code:line_sender_error_code):
...
def c_err_to_code_and_msg(err:line_sender_error*):
...
def c_err_to_py(err:line_sender_error*):
...
def c_err_to_py_fmt(err:line_sender_error*, fmt:str):
...
def _fqn(obj:type) -> str:
...
def datetime_to_micros(dt:datetime) -> int64_t:
...
def datetime_to_nanos(dt:datetime) -> int64_t:
...
class _ServerTimestamp:
"""
A placeholder value to indicate using a server-generated-timestamp.
"""
pass
ServerTimestamp = _ServerTimestamp()
class TimestampMicros:
"""
A timestamp in microseconds since the UNIX epoch (UTC).
You may construct a ``TimestampMicros`` from an integer or a
``datetime.datetime``, or simply call the :func:`TimestampMicros.now`
method.
.. code-block:: python
TimestampMicros.now()
TimestampMicros(time.time_ns() // 1000)
TimestampMicros(1657888365426838)
``TimestampMicros`` can also be constructed from a ``datetime.datetime``
object.
.. code-block:: python
TimestampMicros.from_datetime(
datetime.datetime.now(tz=datetime.timezone.utc))
We recommend that when using ``datetime`` objects, you explicitly pass in
the timezone to use. This is because ``datetime`` objects without an
associated timezone are assumed to be in the local timezone and it is easy
to make mistakes (e.g. passing ``datetime.datetime.utcnow()`` is a likely
bug).
"""
cdef int64_t _value
def __cinit__(self, value: int):
...
@classmethod
def from_datetime(cls, dt: datetime):
...
@classmethod
def now(cls):
...
@property
def value(self) -> int:
...
def __repr__(self):
...
class TimestampNanos:
"""
A timestamp in nanoseconds since the UNIX epoch (UTC).
You may construct a ``TimestampNanos`` from an integer or a
``datetime.datetime``, or simply call the :func:`TimestampNanos.now`
method.
.. code-block:: python
TimestampNanos.now()
TimestampNanos(time.time_ns())
TimestampNanos(1657888365426838016)
``TimestampNanos`` can also be constructed from a ``datetime`` object.
.. code-block:: python
TimestampNanos.from_datetime(
datetime.datetime.now(tz=datetime.timezone.utc))
We recommend that when using ``datetime`` objects, you explicitly pass in
the timezone to use. This is because ``datetime`` objects without an
associated timezone are assumed to be in the local timezone and it is easy
to make mistakes (e.g. passing ``datetime.datetime.utcnow()`` is a likely
bug).
"""
cdef int64_t _value
def __cinit__(self, value: int):
...
@classmethod
def from_datetime(cls, dt: datetime):
...
@classmethod
def now(cls):
...
@property
def value(self) -> int:
...
def __repr__(self):
...
def -1:
...
def _is_tcp_protocol(protocol:line_sender_protocol) -> bool:
...
def _is_http_protocol(protocol:line_sender_protocol) -> bool:
...
class SenderTransaction:
"""
A transaction for a specific table.
Transactions are not supported with ILP/TCP, only ILP/HTTP.
The sender API can only operate on one transaction at a time.
To create a transaction:
.. code_block:: python
with sender.transaction('table_name') as txn:
txn.row(..)
txn.dataframe(..)
"""
cdef Sender _sender
cdef str _table_name
cdef bint _complete
def __cinit__(self, Sender sender, str table_name):
...
def __enter__(self):
...
def __exit__(self, exc_type, _exc_value, _traceback):
...
def commit(self):
...
def rollback(self):
...
class Buffer:
"""
Construct QuestDB-flavored InfluxDB Line Protocol (ILP) messages.
The :func:`Buffer.row` method is used to add a row to the buffer.
You can call this many times.
.. code-block:: python
from questdb.ingress import Buffer
from questdb.ingress import Sender, Buffer
def __cinit__(self, init_buf_size: int=65536, max_name_len: int=127):
...
def _cinit_impl(self, init_buf_size:size_t, max_name_len:size_t) -> inline:
...
def __dealloc__(self):
...
@property
def init_buf_size(self) -> int:
...
@property
def max_name_len(self) -> int:
...
def reserve(self, additional: int):
...
def capacity(self) -> int:
...
def clear(self):
...
def __len__(self) -> int:
...
def __str__(self) -> str:
...
def _to_str(self):
...
def -1:
...
def -1:
...
def _clear_marker(self) -> inline:
...
def -1:
...
def _cleared_b(self):
...
def -1:
...
def -1:
...
def -1:
...
def -1:
...
def -1:
...
def -1:
...
def -1:
...
import pandas as pd
import questdb.ingress as qi
def _timedelta_to_millis(timedelta:object) -> uint64_t:
...
def auto_flush_rows_default(protocol:line_sender_protocol) -> int64_t:
...
class TaggedEnum(Enum):
"""
Base class for tagged enums.
"""
@property
def tag(self):
...
@property
def c_value(self):
...
@classmethod
def parse(cls, tag):
...
class Protocol(TaggedEnum):
"""
Protocol to use for sending data to QuestDB.
See :ref:`sender_which_protocol` for more information.
"""
Tcp = ('tcp', 0)
Tcps = ('tcps', 1)
Http = ('http', 2)
Https = ('https', 3)
@property
def tls_enabled(self):
...
class TlsCa(TaggedEnum):
"""
Verification mechanism for the server's certificate.
Here ``webpki`` refers to the
`WebPKI library <https://github.com/rustls/webpki-roots>`_ and
``os`` refers to the operating system's certificate store.
See :ref:`sender_conf_tls` for more information.
"""
WebpkiRoots = ('webpki_roots', line_sender_ca_webpki_roots)
OsRoots = ('os_roots', line_sender_ca_os_roots)
WebpkiAndOsRoots = ('webpki_and_os_roots', line_sender_ca_webpki_and_os_roots)
PemFile = ('pem_file', line_sender_ca_pem_file)
def c_parse_conf_err_to_py(err:questdb_conf_str_parse_err*) -> object:
...
class Sender:
"""
Ingest data into QuestDB.
See the :ref:`sender` documentation for more information.
"""
cdef object __weakref__
cdef line_sender_protocol _c_protocol
cdef line_sender_opts* _opts
cdef line_sender* _impl
cdef Buffer _buffer
cdef auto_flush_mode_t _auto_flush_mode
cdef int64_t* _last_flush_ms
cdef size_t _init_buf_size
cdef size_t _max_name_len
cdef bint _in_txn
cdef void_int _set_sender_fields(
self,
qdb_pystr_buf* b,
object protocol,
str bind_interface,
str username,
str password,
str token,
str token_x,
str token_y,
object auth_timeout,
object tls_verify,
object tls_ca,
object tls_roots,
object max_buf_size,
object retry_timeout,
object request_min_throughput,
object request_timeout,
object auto_flush,
object auto_flush_rows,
object auto_flush_bytes,
object auto_flush_interval,
object init_buf_size,
object max_name_len) except -1:
"""
Set optional parameters for the sender.
"""
cdef line_sender_error* err = NULL
cdef str user_agent = 'questdb/python/' + VERSION
cdef line_sender_utf8 c_user_agent
cdef line_sender_utf8 c_bind_interface
cdef line_sender_utf8 c_username
cdef line_sender_utf8 c_password
cdef line_sender_utf8 c_token
cdef line_sender_utf8 c_token_x
cdef line_sender_utf8 c_token_y
cdef uint64_t c_auth_timeout
cdef bint c_tls_verify
cdef line_sender_ca c_tls_ca
cdef line_sender_utf8 c_tls_roots
cdef uint64_t c_max_buf_size
cdef uint64_t c_retry_timeout
cdef uint64_t c_request_min_throughput
cdef uint64_t c_request_timeout
self._c_protocol = protocol.c_value
str_to_utf8(b, <PyObject*>user_agent, &c_user_agent)
if not line_sender_opts_user_agent(self._opts, c_user_agent, &err):
raise c_err_to_py(err)
if bind_interface is not None:
str_to_utf8(b, <PyObject*>bind_interface, &c_bind_interface)
if not line_sender_opts_bind_interface(self._opts, c_bind_interface, &err):
raise c_err_to_py(err)
if username is not None:
str_to_utf8(b, <PyObject*>username, &c_username)
if not line_sender_opts_username(self._opts, c_username, &err):
raise c_err_to_py(err)
if password is not None:
str_to_utf8(b, <PyObject*>password, &c_password)
if not line_sender_opts_password(self._opts, c_password, &err):
raise c_err_to_py(err)
if token is not None:
str_to_utf8(b, <PyObject*>token, &c_token)
if not line_sender_opts_token(self._opts, c_token, &err):
raise c_err_to_py(err)
if token_x is not None:
str_to_utf8(b, <PyObject*>token_x, &c_token_x)
if not line_sender_opts_token_x(self._opts, c_token_x, &err):
raise c_err_to_py(err)
if token_y is not None:
str_to_utf8(b, <PyObject*>token_y, &c_token_y)
if not line_sender_opts_token_y(self._opts, c_token_y, &err):
raise c_err_to_py(err)
if auth_timeout is not None:
if isinstance(auth_timeout, int):
c_auth_timeout = auth_timeout
elif isinstance(auth_timeout, timedelta):
c_auth_timeout = _timedelta_to_millis(auth_timeout)
else:
raise TypeError(
'"auth_timeout" must be an int or a timedelta, '
f'not {_fqn(type(auth_timeout))}')
if not line_sender_opts_auth_timeout(self._opts, c_auth_timeout, &err):
raise c_err_to_py(err)
if tls_verify is not None:
if (tls_verify is True) or (tls_verify == 'on'):
c_tls_verify = True
elif (tls_verify is False) or (tls_verify == 'unsafe_off'):
c_tls_verify = False
else:
raise ValueError(
'"tls_verify" must be a bool, "on" or "unsafe_off", '
f'not {tls_verify!r}')
if not line_sender_opts_tls_verify(self._opts, c_tls_verify, &err):
raise c_err_to_py(err)
if tls_roots is not None:
tls_roots = str(tls_roots)
str_to_utf8(b, <PyObject*>tls_roots, &c_tls_roots)
if not line_sender_opts_tls_roots(self._opts, c_tls_roots, &err):
raise c_err_to_py(err)
if tls_ca is not None:
c_tls_ca = TlsCa.parse(tls_ca).c_value
if not line_sender_opts_tls_ca(self._opts, c_tls_ca, &err):
raise c_err_to_py(err)
elif protocol.tls_enabled and tls_roots is None:
c_tls_ca = line_sender_ca_webpki_and_os_roots
if not line_sender_opts_tls_ca(self._opts, c_tls_ca, &err):
raise c_err_to_py(err)
if max_buf_size is not None:
c_max_buf_size = max_buf_size
if not line_sender_opts_max_buf_size(self._opts, c_max_buf_size, &err):
raise c_err_to_py(err)
if retry_timeout is not None:
if isinstance(retry_timeout, int):
c_retry_timeout = retry_timeout
if not line_sender_opts_retry_timeout(self._opts, c_retry_timeout, &err):
raise c_err_to_py(err)
elif isinstance(retry_timeout, timedelta):
c_retry_timeout = _timedelta_to_millis(retry_timeout)
if not line_sender_opts_retry_timeout(self._opts, c_retry_timeout, &err):
raise c_err_to_py(err)
else:
raise TypeError(
'"retry_timeout" must be an int or a timedelta, '
f'not {_fqn(type(retry_timeout))}')
if request_min_throughput is not None:
c_request_min_throughput = request_min_throughput
if not line_sender_opts_request_min_throughput(self._opts, c_request_min_throughput, &err):
raise c_err_to_py(err)
if request_timeout is not None:
if isinstance(request_timeout, int):
c_request_timeout = request_timeout
if not line_sender_opts_request_timeout(self._opts, c_request_timeout, &err):
raise c_err_to_py(err)
elif isinstance(request_timeout, timedelta):
c_request_timeout = _timedelta_to_millis(request_timeout)
if not line_sender_opts_request_timeout(self._opts, c_request_timeout, &err):
raise c_err_to_py(err)
else:
raise TypeError(
'"request_timeout" must be an int or a timedelta, '
f'not {_fqn(type(request_timeout))}')
_parse_auto_flush(
self._c_protocol,
auto_flush,
auto_flush_rows,
auto_flush_bytes,
auto_flush_interval,
&self._auto_flush_mode)
self._init_buf_size = init_buf_size or 65536
self._max_name_len = max_name_len or 127
self._buffer = Buffer(
init_buf_size=self._init_buf_size,
max_name_len=self._max_name_len)
self._last_flush_ms = <int64_t*>calloc(1, sizeof(int64_t))
def __cinit__(self):
...
@staticmethod
@staticmethod
def new_buffer(self):
...
@property
def init_buf_size(self) -> int:
...
@property
def max_name_len(self) -> int:
...
@property
def auto_flush(self) -> bint:
...
@property
def auto_flush_rows(self) -> Optional[int]:
...
@property
def auto_flush_bytes(self) -> Optional[int]:
...
@property
def auto_flush_interval(self) -> Optional[timedelta]:
...
def establish(self):
...
def __enter__(self) -> Sender:
...
def __str__(self) -> str:
...
def __len__(self) -> int:
...
def transaction(self, table_name: str):
...
import pandas as pd
import questdb.ingress as qi
def _close(self):
...
def close(self, flush:bool=True):
...
def __exit__(self, exc_type, _exc_val, _exc_tb):
...
def __dealloc__(self):
...
It was at this point, I figured I was facing an uphill battle with their parser, so opted to go the manual route, at least on my fork to allow for easier dev on my side.
Hope some of this info is helpful
I've just tried it too. It's not going to have decent mileage if it isn't using the Cython
module as parsing. Indentation will be the least of its problems.
As mentioned in this issue, the inclusion of a
.pyi
file makes type checkers and IDEs happier to show function signatures, required types and defaults if they exist.Unfortauntely, I tried a few
pyi
generation scripts/modules, and nothing worked flawlessly. Otherwise, I'd have added a script to thesetup.py
to auto-generate, allowing for a more streamlined process to any updates to the Cython file (ingress.pyx
).For now, this is a quick crack at manually getting the stub file complete using the existing Cython public methods, types and docstrings.
This PR also includes adjusting one of the TOML fields which showed an error as an unexpected type, as well as removing a duplicate function delcaration from the Cython file.
It also removes redundant built logic given the underlying issue has been solved, and adjusts checks for
fastparquet
due to no 3.13 binaries being available yet.