questdb / py-questdb-client

Python client for QuestDB InfluxDB Line Protocol
https://py-questdb-client.readthedocs.io
Apache License 2.0
55 stars 10 forks source link

Typing Support (ingress.pyi file) #95

Closed nunnsy closed 1 week ago

nunnsy commented 1 week ago

As mentioned in this issue, the inclusion of a .pyi file makes type checkers and IDEs happier to show function signatures, required types and defaults if they exist.

Unfortauntely, I tried a few pyi generation scripts/modules, and nothing worked flawlessly. Otherwise, I'd have added a script to the setup.py to auto-generate, allowing for a more streamlined process to any updates to the Cython file (ingress.pyx).

For now, this is a quick crack at manually getting the stub file complete using the existing Cython public methods, types and docstrings.

This PR also includes adjusting one of the TOML fields which showed an error as an unexpected type, as well as removing a duplicate function delcaration from the Cython file.

It also removes redundant built logic given the underlying issue has been solved, and adjusts checks for fastparquet due to no 3.13 binaries being available yet.

CLAassistant commented 1 week ago

CLA assistant check
All committers have signed the CLA.

nunnsy commented 1 week ago

@amunra I've realised I've conflated a few things here - please let me know if you'd like them split over different PRs.

amunra commented 1 week ago

@amunra I've realised I've conflated a few things here - please let me know if you'd like them split over different PRs.

No, that's not an issue. My main concern here is that it's painfully easy for the .pyi definitions to go out of date.

They are functionally effectively equivalent to header file in C, but have the downside (unless I'm wrong - and I hope I am) that there's nothing to ensure that they are in sync with the Cython implementation.

nunnsy commented 1 week ago

My main concern here is that it's painfully easy for the .pyi definitions to go out of date.

Agreed. I tried looking for solutions, as mentioned in the initial PR comment, for autocompleting a stub file. I found all solutions, as it stands, have drawbacks, which means they do not produce a reliable file (and in most cases, just error out, even with extreme modification, as all these scripts text/symbol parsing).

They are functionally effectively equivalent to header file in C, but have the downside (unless I'm wrong - and I hope I am) that there's nothing to ensure that they are in sync with the Cython implementation.

I don't think you're wrong 😔

I think the best long-term solution would be to have something like mypy build compatibility of its stubgen to read Cython files, and it does look like there are some people working on it. For the time being though, I'm not sure there's a better solution than manual checks, unless there was a desire to try and check symbols within the build script.

Also, given how little this interface is changing (at least for the time being), I wasn't sure whether it would be ok for now either way?

nunnsy commented 1 week ago

Auto-generation packages I tried:

amunra commented 1 week ago

Auto-generation packages I tried:

* https://github.com/Vizonex/CyStub

* https://github.com/mike-huls/cythonbuilder (the `interface` sub-command)

* https://github.com/RaubCamaioni/CythonPEG

These are the same ones I would have experimented with. cythonbuilder looks like it's the only maintained one.

What issues did you encounter, specifically?

nunnsy commented 1 week ago

Agreed, and good question.

Initially the logs output:

(questdb) daniel@boopers:~/projects/py-questdb-client/src/questdb$ cybuilder interface -v
[cythonbuilder] 16:49:21 cli      148  Generating interface files..
[cythonbuilder] 16:49:21 cli      155  Found 1 to create pyi files for
[cythonbuilder] these 1 pyx files?
(y/n) 
         - /home/daniel/projects/py-questdb-client/src/questdb/ingress.pyxy
[cythonbuilder] 16:49:24 cli      163  Creating 1 pyi files..
[cythonbuilder] 16:49:24 cli      166  Generating pyi files..
[cythonbuilder] 16:49:24 cython_builder 161  [cy_interface] - generating interface files for 1 found pyx files
[cythonbuilder] 16:49:24 cython_builder 169  Creating .pyi for /home/daniel/projects/py-questdb-client/src/questdb/ingress.pyx
[cythonbuilder] 16:49:24 cli      171  build error: string index out of range

To look deeper, I made a throwaway script:

from cythonbuilder.pyigenerator import pyx_to_pyi

with open("./src/questdb/ingress.pyx", "r") as f:
    output = pyx_to_pyi(f)

with open("./src/questdb/ingress.pyi", "w") as f:
    f.writelines(output)

Which resulted in:

Traceback (most recent call last):
  File "grr.py", line 4, in <module>
    output = pyx_to_pyi(f)
  File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 194, in pyx_to_pyi
    all_indentations = [get_line_indentation_spacecount(line=l) for l in pyx_lines]
  File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 194, in <listcomp>
    all_indentations = [get_line_indentation_spacecount(line=l) for l in pyx_lines]
  File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 258, in get_line_indentation_spacecount
    if (line[letter_idx + 1] != ' '):
IndexError: string index out of range

After fixing their counter which doesn't handle a line length of 1 correctly...

def get_line_indentation_spacecount(line: str) -> int:
    """ How many spaces is the provided line indented? """
    spaces: int = 0

    # My code ------
    for c in line:
        if c != ' ':
            break
        spaces += 1

    return spaces
    # End my code ---

    if (len(line) > 0):
        if (line[0] == ' '):
            for letter_idx in range(len(line)):
                if (line[letter_idx + 1] != ' '):
                    spaces = letter_idx + 1
                    break
    return spaces

Another issue crops up:

Traceback (most recent call last):
  File "grr.py", line 4, in <module>
    output = pyx_to_pyi(f)
  File "/home/daniel/miniconda3/envs/questdb/lib/python3.8/site-packages/cythonbuilder/pyigenerator.py", line 200, in pyx_to_pyi
    raise ValueError(f"Found invalid indentation: {indent} not divisible by {spaces_for_one_tab}")
ValueError: Found invalid indentation: 6 not divisible by 4

Removing that check (for the time being), and also modifying their processor to remove lines which are all spaces by adding

    pyx_lines = [l for l in pyx_lines if not l.isspace()]

We do yield a result of the interface file:

"""
import cython
from enum import Enum
from typing import List, Tuple, Dict, Union, Any, Optional, Callable, \
import pathlib
import sys
import os
def _has_gil(gs:pythreadstate**) -> bool:
    ...
def _ensure_doesnt_have_gil(gs:pythreadstate**) -> bool:
    ...
def _ensure_has_gil(gs:pythreadstate**) -> None:
    ...
class IngressErrorCode(Enum):
    """Category of Error."""
    CouldNotResolveAddr = line_sender_error_could_not_resolve_addr
    InvalidApiCall = line_sender_error_invalid_api_call
    SocketError = line_sender_error_socket_error
    InvalidUtf8 = line_sender_error_invalid_utf8
    InvalidName = line_sender_error_invalid_name
    InvalidTimestamp = line_sender_error_invalid_timestamp
    AuthError = line_sender_error_auth_error
    TlsError = line_sender_error_tls_error
    HttpNotSupported = line_sender_error_http_not_supported
    ServerFlushError = line_sender_error_server_flush_error
    ConfigError = line_sender_error_config_error
    BadDataFrame = <int>line_sender_error_server_flush_error + 1
    def __str__(self) -> str:
        ...
class IngressError(Exception):
    """An error whilst using the ``Sender`` or constructing its ``Buffer``."""
    def __init__(self, code, msg):
        ...
    @property
    def code(self) -> IngressErrorCode:
        ...
def c_err_code_to_py(code:line_sender_error_code):
    ...
def c_err_to_code_and_msg(err:line_sender_error*):
    ...
def c_err_to_py(err:line_sender_error*):
    ...
def c_err_to_py_fmt(err:line_sender_error*, fmt:str):
    ...
def _fqn(obj:type) -> str:
    ...
def datetime_to_micros(dt:datetime) -> int64_t:
    ...
def datetime_to_nanos(dt:datetime) -> int64_t:
    ...
class _ServerTimestamp:
    """
    A placeholder value to indicate using a server-generated-timestamp.
    """
    pass
ServerTimestamp = _ServerTimestamp()
class TimestampMicros:
    """
    A timestamp in microseconds since the UNIX epoch (UTC).
    You may construct a ``TimestampMicros`` from an integer or a
    ``datetime.datetime``, or simply call the :func:`TimestampMicros.now`
    method.
    .. code-block:: python
        TimestampMicros.now()
        TimestampMicros(time.time_ns() // 1000)
        TimestampMicros(1657888365426838)
    ``TimestampMicros`` can also be constructed from a ``datetime.datetime``
    object.
    .. code-block:: python
        TimestampMicros.from_datetime(
            datetime.datetime.now(tz=datetime.timezone.utc))
    We recommend that when using ``datetime`` objects, you explicitly pass in
    the timezone to use. This is because ``datetime`` objects without an
    associated timezone are assumed to be in the local timezone and it is easy
    to make mistakes (e.g. passing ``datetime.datetime.utcnow()`` is a likely
    bug).
    """
    cdef int64_t _value
    def __cinit__(self, value: int):
        ...
    @classmethod
    def from_datetime(cls, dt: datetime):
        ...
    @classmethod
    def now(cls):
        ...
    @property
    def value(self) -> int:
        ...
    def __repr__(self):
        ...
class TimestampNanos:
    """
    A timestamp in nanoseconds since the UNIX epoch (UTC).
    You may construct a ``TimestampNanos`` from an integer or a
    ``datetime.datetime``, or simply call the :func:`TimestampNanos.now`
    method.
    .. code-block:: python
        TimestampNanos.now()
        TimestampNanos(time.time_ns())
        TimestampNanos(1657888365426838016)
    ``TimestampNanos`` can also be constructed from a ``datetime`` object.
    .. code-block:: python
        TimestampNanos.from_datetime(
            datetime.datetime.now(tz=datetime.timezone.utc))
    We recommend that when using ``datetime`` objects, you explicitly pass in
    the timezone to use. This is because ``datetime`` objects without an
    associated timezone are assumed to be in the local timezone and it is easy
    to make mistakes (e.g. passing ``datetime.datetime.utcnow()`` is a likely
    bug).
    """
    cdef int64_t _value
    def __cinit__(self, value: int):
        ...
    @classmethod
    def from_datetime(cls, dt: datetime):
        ...
    @classmethod
    def now(cls):
        ...
    @property
    def value(self) -> int:
        ...
    def __repr__(self):
        ...
def -1:
    ...
def _is_tcp_protocol(protocol:line_sender_protocol) -> bool:
    ...
def _is_http_protocol(protocol:line_sender_protocol) -> bool:
    ...
class SenderTransaction:
    """
    A transaction for a specific table.
    Transactions are not supported with ILP/TCP, only ILP/HTTP.
    The sender API can only operate on one transaction at a time.
    To create a transaction:
    .. code_block:: python
        with sender.transaction('table_name') as txn:
            txn.row(..)
            txn.dataframe(..)
    """
    cdef Sender _sender
    cdef str _table_name
    cdef bint _complete
    def __cinit__(self, Sender sender, str table_name):
        ...
    def __enter__(self):
        ...
    def __exit__(self, exc_type, _exc_value, _traceback):
        ...
    def commit(self):
        ...
    def rollback(self):
        ...
class Buffer:
    """
    Construct QuestDB-flavored InfluxDB Line Protocol (ILP) messages.
    The :func:`Buffer.row` method is used to add a row to the buffer.
    You can call this many times.
    .. code-block:: python
        from questdb.ingress import Buffer
        from questdb.ingress import Sender, Buffer
    def __cinit__(self, init_buf_size: int=65536, max_name_len: int=127):
        ...
    def _cinit_impl(self, init_buf_size:size_t, max_name_len:size_t) -> inline:
        ...
    def __dealloc__(self):
        ...
    @property
    def init_buf_size(self) -> int:
        ...
    @property
    def max_name_len(self) -> int:
        ...
    def reserve(self, additional: int):
        ...
    def capacity(self) -> int:
        ...
    def clear(self):
        ...
    def __len__(self) -> int:
        ...
    def __str__(self) -> str:
        ...
    def _to_str(self):
        ...
    def -1:
        ...
    def -1:
        ...
    def _clear_marker(self) -> inline:
        ...
    def -1:
        ...
    def _cleared_b(self):
        ...
    def -1:
        ...
    def -1:
        ...
    def -1:
        ...
    def -1:
        ...
    def -1:
        ...
    def -1:
        ...
    def -1:
        ...
            import pandas as pd
            import questdb.ingress as qi
def _timedelta_to_millis(timedelta:object) -> uint64_t:
    ...
def auto_flush_rows_default(protocol:line_sender_protocol) -> int64_t:
    ...
class TaggedEnum(Enum):
    """
    Base class for tagged enums.
    """
    @property
    def tag(self):
        ...
    @property
    def c_value(self):
        ...
    @classmethod
    def parse(cls, tag):
        ...
class Protocol(TaggedEnum):
    """
    Protocol to use for sending data to QuestDB.
    See :ref:`sender_which_protocol` for more information.
    """
    Tcp = ('tcp', 0)
    Tcps = ('tcps', 1)
    Http = ('http', 2)
    Https = ('https', 3)
    @property
    def tls_enabled(self):
        ...
class TlsCa(TaggedEnum):
    """
    Verification mechanism for the server's certificate.
    Here ``webpki`` refers to the
    `WebPKI library <https://github.com/rustls/webpki-roots>`_ and
    ``os`` refers to the operating system's certificate store.
    See :ref:`sender_conf_tls` for more information.
    """
    WebpkiRoots = ('webpki_roots', line_sender_ca_webpki_roots)
    OsRoots = ('os_roots', line_sender_ca_os_roots)
    WebpkiAndOsRoots = ('webpki_and_os_roots', line_sender_ca_webpki_and_os_roots)
    PemFile = ('pem_file', line_sender_ca_pem_file)
def c_parse_conf_err_to_py(err:questdb_conf_str_parse_err*) -> object:
    ...
class Sender:
    """
    Ingest data into QuestDB.
    See the :ref:`sender` documentation for more information.
    """
    cdef object __weakref__
    cdef line_sender_protocol _c_protocol
    cdef line_sender_opts* _opts
    cdef line_sender* _impl
    cdef Buffer _buffer
    cdef auto_flush_mode_t _auto_flush_mode
    cdef int64_t* _last_flush_ms
    cdef size_t _init_buf_size
    cdef size_t _max_name_len
    cdef bint _in_txn
    cdef void_int _set_sender_fields(
            self,
            qdb_pystr_buf* b,
            object protocol,
            str bind_interface,
            str username,
            str password,
            str token,
            str token_x,
            str token_y,
            object auth_timeout,
            object tls_verify,
            object tls_ca,
            object tls_roots,
            object max_buf_size,
            object retry_timeout,
            object request_min_throughput,
            object request_timeout,
            object auto_flush,
            object auto_flush_rows,
            object auto_flush_bytes,
            object auto_flush_interval,
            object init_buf_size,
            object max_name_len) except -1:
        """
        Set optional parameters for the sender.
        """
        cdef line_sender_error* err = NULL
        cdef str user_agent = 'questdb/python/' + VERSION
        cdef line_sender_utf8 c_user_agent
        cdef line_sender_utf8 c_bind_interface
        cdef line_sender_utf8 c_username
        cdef line_sender_utf8 c_password
        cdef line_sender_utf8 c_token
        cdef line_sender_utf8 c_token_x
        cdef line_sender_utf8 c_token_y
        cdef uint64_t c_auth_timeout
        cdef bint c_tls_verify
        cdef line_sender_ca c_tls_ca
        cdef line_sender_utf8 c_tls_roots
        cdef uint64_t c_max_buf_size
        cdef uint64_t c_retry_timeout
        cdef uint64_t c_request_min_throughput
        cdef uint64_t c_request_timeout
        self._c_protocol = protocol.c_value
        str_to_utf8(b, <PyObject*>user_agent, &c_user_agent)
        if not line_sender_opts_user_agent(self._opts, c_user_agent, &err):
            raise c_err_to_py(err)
        if bind_interface is not None:
            str_to_utf8(b, <PyObject*>bind_interface, &c_bind_interface)
            if not line_sender_opts_bind_interface(self._opts, c_bind_interface, &err):
                raise c_err_to_py(err)
        if username is not None:
            str_to_utf8(b, <PyObject*>username, &c_username)
            if not line_sender_opts_username(self._opts, c_username, &err):
                raise c_err_to_py(err)
        if password is not None:
            str_to_utf8(b, <PyObject*>password, &c_password)
            if not line_sender_opts_password(self._opts, c_password, &err):
                raise c_err_to_py(err)
        if token is not None:
            str_to_utf8(b, <PyObject*>token, &c_token)
            if not line_sender_opts_token(self._opts, c_token, &err):
                raise c_err_to_py(err)
        if token_x is not None:
            str_to_utf8(b, <PyObject*>token_x, &c_token_x)
            if not line_sender_opts_token_x(self._opts, c_token_x, &err):
                raise c_err_to_py(err)
        if token_y is not None:
            str_to_utf8(b, <PyObject*>token_y, &c_token_y)
            if not line_sender_opts_token_y(self._opts, c_token_y, &err):
                raise c_err_to_py(err)
        if auth_timeout is not None:
            if isinstance(auth_timeout, int):
                c_auth_timeout = auth_timeout
            elif isinstance(auth_timeout, timedelta):
                c_auth_timeout = _timedelta_to_millis(auth_timeout)
            else:
                raise TypeError(
                    '"auth_timeout" must be an int or a timedelta, '
                    f'not {_fqn(type(auth_timeout))}')
            if not line_sender_opts_auth_timeout(self._opts, c_auth_timeout, &err):
                raise c_err_to_py(err)
        if tls_verify is not None:
            if (tls_verify is True) or (tls_verify == 'on'):
                c_tls_verify = True
            elif (tls_verify is False) or (tls_verify == 'unsafe_off'):
                c_tls_verify = False
            else:
                raise ValueError(
                    '"tls_verify" must be a bool, "on" or "unsafe_off", '
                    f'not {tls_verify!r}')
            if not line_sender_opts_tls_verify(self._opts, c_tls_verify, &err):
                raise c_err_to_py(err)
        if tls_roots is not None:
            tls_roots = str(tls_roots)
            str_to_utf8(b, <PyObject*>tls_roots, &c_tls_roots)
            if not line_sender_opts_tls_roots(self._opts, c_tls_roots, &err):
                raise c_err_to_py(err)
        if tls_ca is not None:
            c_tls_ca = TlsCa.parse(tls_ca).c_value
            if not line_sender_opts_tls_ca(self._opts, c_tls_ca, &err):
                raise c_err_to_py(err)
        elif protocol.tls_enabled and tls_roots is None:
            c_tls_ca = line_sender_ca_webpki_and_os_roots
            if not line_sender_opts_tls_ca(self._opts, c_tls_ca, &err):
                raise c_err_to_py(err)
        if max_buf_size is not None:
            c_max_buf_size = max_buf_size
            if not line_sender_opts_max_buf_size(self._opts, c_max_buf_size, &err):
                raise c_err_to_py(err)
        if retry_timeout is not None:
            if isinstance(retry_timeout, int):
                c_retry_timeout = retry_timeout
                if not line_sender_opts_retry_timeout(self._opts, c_retry_timeout, &err):
                    raise c_err_to_py(err)
            elif isinstance(retry_timeout, timedelta):
                c_retry_timeout = _timedelta_to_millis(retry_timeout)
                if not line_sender_opts_retry_timeout(self._opts, c_retry_timeout, &err):
                    raise c_err_to_py(err)
            else:
                raise TypeError(
                    '"retry_timeout" must be an int or a timedelta, '
                    f'not {_fqn(type(retry_timeout))}')
        if request_min_throughput is not None:
            c_request_min_throughput = request_min_throughput
            if not line_sender_opts_request_min_throughput(self._opts, c_request_min_throughput, &err):
                raise c_err_to_py(err)
        if request_timeout is not None:
            if isinstance(request_timeout, int):
                c_request_timeout = request_timeout
                if not line_sender_opts_request_timeout(self._opts, c_request_timeout, &err):
                    raise c_err_to_py(err)
            elif isinstance(request_timeout, timedelta):
                c_request_timeout = _timedelta_to_millis(request_timeout)
                if not line_sender_opts_request_timeout(self._opts, c_request_timeout, &err):
                    raise c_err_to_py(err)
            else:
                raise TypeError(
                    '"request_timeout" must be an int or a timedelta, '
                    f'not {_fqn(type(request_timeout))}')
        _parse_auto_flush(
            self._c_protocol,
            auto_flush,
            auto_flush_rows,
            auto_flush_bytes,
            auto_flush_interval,
            &self._auto_flush_mode)
        self._init_buf_size = init_buf_size or 65536
        self._max_name_len = max_name_len or 127
        self._buffer = Buffer(
            init_buf_size=self._init_buf_size,
            max_name_len=self._max_name_len)
        self._last_flush_ms = <int64_t*>calloc(1, sizeof(int64_t))
    def __cinit__(self):
        ...
    @staticmethod
    @staticmethod
    def new_buffer(self):
        ...
    @property
    def init_buf_size(self) -> int:
        ...
    @property
    def max_name_len(self) -> int:
        ...
    @property
    def auto_flush(self) -> bint:
        ...
    @property
    def auto_flush_rows(self) -> Optional[int]:
        ...
    @property
    def auto_flush_bytes(self) -> Optional[int]:
        ...
    @property
    def auto_flush_interval(self) -> Optional[timedelta]:
        ...
    def establish(self):
        ...
    def __enter__(self) -> Sender:
        ...
    def __str__(self) -> str:
        ...
    def __len__(self) -> int:
        ...
    def transaction(self, table_name: str):
        ...
            import pandas as pd
            import questdb.ingress as qi
    def _close(self):
        ...
    def close(self, flush:bool=True):
        ...
    def __exit__(self, exc_type, _exc_val, _exc_tb):
        ...
    def __dealloc__(self):
        ...

It was at this point, I figured I was facing an uphill battle with their parser, so opted to go the manual route, at least on my fork to allow for easier dev on my side.

Hope some of this info is helpful

amunra commented 1 week ago

I've just tried it too. It's not going to have decent mileage if it isn't using the Cython module as parsing. Indentation will be the least of its problems.