surrealdb / surrealdb.py

SurrealDB SDK for Python
https://surrealdb.com
Apache License 2.0
181 stars 53 forks source link

Updated code formatting and lint (latest `ruff`, `mypy`) #91

Closed alexander-beedie closed 6 months ago

alexander-beedie commented 6 months ago

What is the motivation?

Was taking a look at the SurrealDB Python API, with a view to possibly adding some degree of Polars[^1] support for the async client/driver (I am one of the Polars devs), and realised that the lint/formatting could be made faster & brought up to date. Seemed like a good way to both look through the code and make an initial contribution πŸ˜ŽπŸ‘

Type of Change

What does this change do?

Formatting differences

The formatting is (as you'd hope!) essentially the same, with most differences being marginal (such as an initial line break after the leading """ inside multiline docstrings). Some additional lines longer than the 88-char limit were found/fixed. For consistency, applied the formatter to the "tests" and "surrealdb" dirs, and enabled code formatting in docstrings.

Additional linting

Despite the substantial increase in the (potential) lint rules now covered, only a small handful of minor tidy-ups and Exception improvements were actually flagged.

Rules enabled before:

["I","D","N","UP"]

Rules enabled after:

["B","D","E","EM","F","FA","FBT001","I","N","PIE","PT",
 "PTH","RUF","SIM","TCH","TD","TID","TRY","UP","W"]

Exception improvements:

try:
    ...
except Exception as e:
    raise SurrealDbError(e) from None   # << the "from None" is new

Where a SurrealDbError is raised from within another Exception a "from None" designation[^3] is now added. The result is that the same error, with the same message, is raised, but without the unnecessary/misleading "During handling of the above exception, another exception occurred" context. This noticeably improves signal-to-noise when dealing with Exception traces.

Also:

A few super calls were also automatically simplified:

# before
instance = super(ConnectionController, cls).__call__(*args, **kwargs)
# after
instance = super().__call__(*args, **kwargs)

What is your testing strategy?

Is this related to any issues?

Have you read the Contributing Guidelines?

[^1]: Polars: https://github.com/pola-rs/polars [^2]: Ruff formatter: https://docs.astral.sh/ruff/formatter/ [^3]: PEP409: https://peps.python.org/pep-0409/#proposal

maxwellflitton commented 6 months ago

@alexander-beedie thanks for the pull request. Looks good. Out of curiosity, what type of things are you planning to do with surrealDB.py and polars? A lot of friends have switched from pandas to polars and are amazed at the speed increases they are getting.

alexander-beedie commented 6 months ago

@alexander-beedie thanks for the pull request. Looks good. Out of curiosity, what type of things are you planning to do with surrealDB.py and polars? A lot of friends have switched from pandas to polars and are amazed at the speed increases they are getting.

Nice to hear!

Initially I'm just looking to allow use of a Surreal client as a connection-equivalent in the Polars read_database function. (If SurrealDB adds Arrow export support then we could have some really efficient integration here, otherwise we'll have to eat the expensive intermediate Python materialisation ;)

Experimenting on a local branch, I have the following test working already, which builds on the about-to-merge Polars support^1 for async drivers that I committed yesterday:

from surrealdb import Surreal
import polars as pl
import asyncio

async def get_frame(query: str) -> pl.DataFrame:
    async with Surreal("ws://localhost:8000/rpc") as client:
        await client.use("test", "test")
        return pl.read_database(query=query, connection=client)

asyncio.run(get_frame("SELECT * FROM person LIMIT 1"))

# shape: (1, 5)
# β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
# β”‚ id                          ┆ marketing ┆ pass   ┆ tags                    ┆ user         β”‚
# β”‚ ---                         ┆ ---       ┆ ---    ┆ ---                     ┆ ---          β”‚
# β”‚ str                         ┆ bool      ┆ str    ┆ list[str]               ┆ str          β”‚
# β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•ͺ═══════════β•ͺ════════β•ͺ═════════════════════════β•ͺ══════════════║
# β”‚ person:fvmbtxcv79dveedch4vk ┆ false     ┆ 123456 ┆ ["polars", "surrealdb"] ┆ Stroop Wafel β”‚
# β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

I added KuzuDB support^2 to Polars (along with polars DataFrame export on their side) recently, and am interested generally in extending what we can accept beyond the standard relational DBs, in order to unlock some more interesting use-cases.

maxwellflitton commented 6 months ago

Hey! Looping @CarolineMorton who is working on a project with SurrealDB on creating realistic synthetic data for healthcare research. She has used polars a fair bit in the past, and I know she wants the output of the synthetic data to be a polars df.

CarolineMorton commented 6 months ago

Hey @alexander-beedie

This is great. Thanks for the @ @maxwellflitton.

Thank you for your efforts on integrating SurrealDB with Polars. My project involves importing significant reference data (SNOMED/ICD10) into SurrealDB to leverage its query functionality for navigating through a graph of related information. This includes asynchronously sampling from these nodes and compiling the results into a structure similar to a table, with a preference for exporting these as Polars DataFrames. The use of Arrow in Polars for data handling is a key factor for this preference.

The work to allow a Surreal client to act as a connection in the Polars read_database function sounds really promising. The prospect of SurrealDB incorporating Arrow export capabilities could significantly benefit the "big data" community. Would you be open to discussing potential collaboration on this front?

Cheers

AlexFrid commented 6 months ago

Thanks for the PR @alexander-beedie! πŸŽ‰

Adding Polars and Arrow support would be phenomenal πŸš€ It's something that's been on my personal wishlist as well, so happy to support in that where I can.

alexander-beedie commented 6 months ago

Thank you for your efforts on integrating SurrealDB with Polars. My project involves importing significant reference data (SNOMED/ICD10) into SurrealDB to leverage its query functionality for navigating through a graph of related information. This includes asynchronously sampling from these nodes and compiling the results into a structure similar to a table, with a preference for exporting these as Polars DataFrames. The use of Arrow in Polars for data handling is a key factor for this preference.

@CarolineMorton, FYI the initial commit just landed: https://github.com/pola-rs/polars/pull/15269. I'd expect us to make a release that contains the new code sometime in the coming week; would be very interested in your feedback and/or feature requests (or bug reports, but lets hope there aren't any ;) as Polars support for async database drivers is also new (landing in 0.20.17).

The work to allow a Surreal client to act as a connection in the Polars read_database function sounds really promising. The prospect of SurrealDB incorporating Arrow export capabilities could significantly benefit the "big data" community. Would you be open to discussing potential collaboration on this front?

Sure, though any Surreal β†’ Arrow engineering work would realistically have to come from the Surreal folks - I'm more than happy to help test/integrate with Polars should such support arrive though! I think it would be a big win on many fronts 😎

alexander-beedie commented 6 months ago

@maxwellflitton: Anything else I need to do to help get this one merged? I already have another small PR pending that will deal with a minor Exception-related issue that I spotted while testing the new Polars read_database support ;)

alexander-beedie commented 6 months ago

FYI: @CarolineMorton (and @AlexFrid / @maxwellflitton ;) - https://github.com/pola-rs/polars/pull/15269 just merged. We'll have our initial SurrealDB support, validated locally against both both Surreal ("ws") and SurrealHTTP ("http"), available in our (imminent) 0.20.17 release.

If the Surreal Python SDK/driver is eventually able to return Arrow data then we'll be able to take a significant step forward performance-wise (true zero-copy to Polars), but this should still be a good enabler for plenty of use-casesπŸ‘Œ

Once it's released (likely in the next few days), let me know if you try it out and see any rough spots; async driver support as a whole is new to this release, and it will be good to get some "real world" testing (and feel free to tag me directly in any feedback).

CarolineMorton commented 6 months ago

Fantastic @alexander-beedie - Will check out the polars changes. Excited about moving this all forward.

@AlexFrid Shall we have a chat about adding arrow support some time

alexander-beedie commented 6 months ago

βœ… Released! https://github.com/pola-rs/polars/releases/tag/py-0.20.17