Closed Nicoretti closed 6 months ago
To answer some of your questions:
What features does the Ibis Project require for the MVP for a new backend?
We don't really have a documented bare minimum feature set for new backends.
If I had to say what it was, I think it'd be:
con.table(name)
returns a table expression without failingcon.table(name).execute()
returns a pandas DataFrame without failingBasically, you should be able to run the equivalent of SELECT * FROM t
and bring the results back into Python as a pandas DataFrame.
How do I update/work with the Nix shell when dependencies, etc. are added?
We don't have specific documentation for this. I can add some.
I suspect you need to run just lock
, and then direnv reload
(if you're using direnv
) or exit and re-enter the nix-shell
.
Regarding getting the tests to pass, the least painful strategy is to insert
pytestmark = pytest.mark.notimpl(["exasol"])
at the top of every test_*.py
file (or insert "exasol"
into the existing pytestmark
's list) and then start implementing functionality.
When a test that is marked as notimpl
passes, you'll see a pytest failure with a message like this:
[XPASS(strict)] ...
Indicating that you've successfully passed the test. You can then remove "exasol"
from the notimpl
list.
Hi @cpcloud,
thanks for all the input and help, I'll pickup on it tomorrow.
best Nico
@cpcloud I assume the final PR will be squash merged
and therefore did not follow the projects commit message conventions.
If that assumption is wrong, please let me know.
@Nicoretti We can clean up the commits once everything is green :) No need to worry about it for now.
Hi @cpcloud
Just did another catchup with your master
. I am wondering if the following workflow
for addressing merge conflicts in: poetry.lock
, requirements-dev.txt
, poetry-overrides.nix
is correct:
pyproject.toml
is correct (correctly merged)poetry.lock
poetry lock
or poetry update
just lock
@Nicoretti That looks right!
The only thing I would change is that you should be able to merge steps 3 and 4 into
just lock
@cpcloud sry had to force push. I had rebased the changes from the ibis/master
into this branch and pushed it :sweat_smile:.
btw doing a "clean" lock, does not seem to work properly (even on ibis/master
). Poetry does not finish resolving (conflicts).
A relock, with an existing poetry.lock
is working fine though.
rm poetry.lock
just lock
@Nicoretti I am going to rebase this PR and fix up the merge conflicts!
@cpcloud thx for updating
I can help with the poetry.lock
conflicts!
@cpcloud
I can help with the
poetry.lock
conflicts. Thanks appreciate it, next time I'll give it another try ;).
Side Note(s):
best Nico
Hi @cpcloud,
I am back on the PR. Sorry for the extra delay; I had to address a bug and some other issues in other projects I am maintaining.
Since my catch-up yesterday, I have experienced two issues. I'm just wondering if you may have any idea regarding them.
The nix-shell
does not seem to work the way described here anymore. This is probably due to PR #7437. Is this intended?
Executing tests seems pretty flaky (not only in parallel mode):
pytest -n 0 --dist no -m exasol
749 failed, 233 passed, 34 skipped, 27246 deselected, 240 xfailed, 24 errors in 350.39s (0:05:50)
561 failed, 434 passed, 34 skipped, 27246 deselected, 242 xfailed, 9 errors in 442.03s (0:07:22)
618 failed, 377 passed, 34 skipped, 27246 deselected, 242 xfailed, 9 errors in 465.57s (0:07:45)
pytest -n 0 --dist loadgroup -m exasol
582 failed, 413 passed, 34 skipped, 27246 deselected, 242 xfailed, 9 errors in 469.88s (0:07:49)
320 failed, 227 passed, 34 skipped, 27246 deselected, 153 xfailed, 546 errors in 323.05s (0:05:23)
626 failed, 367 passed, 34 skipped, 27246 deselected, 242 xfailed, 11 errors in 435.29s (0:07:15)
pytest -n auto -m exasol
512 failed, 483 passed, 34 skipped, 242 xfailed, 9 errors in 323.58s (0:05:23)
518 failed, 478 passed, 34 skipped, 241 xfailed, 9 errors in 279.37s (0:04:39)
443 failed, 415 passed, 34 skipped, 200 xfailed, 188 errors in 286.54s (0:04:46)
just test exasol
533 failed, 462 passed, 34 skipped, 242 xfailed, 9 errors in 253.28s (0:04:13)
518 failed, 478 passed, 34 skipped, 241 xfailed, 9 errors in 273.90s (0:04:33)
545 failed, 448 passed, 34 skipped, 243 xfailed, 10 errors in 268.66s (0:04:28)
I was wondering if this rings a bell or sounds like a problem you've already encountered in the past with other backends.
Thanks for being so supportive all along.
Best, Nico
Hey @Nicoretti -- @cpcloud is on a well-deserved vacation but will be back next week.
You are right that those nix
instructions are out-of-date. You have two options if you want to use nix
(and I'll update the docs):
nix develop
which will drop you into a shell that has all of the packages loaded updirenv
(here: https://direnv.net/docs/installation.html) and it will auto-load things into your current shell environment when you enter the ibis
directory (@cpcloud and I both use direnv
)Regarding test flakiness, that could be a bunch of things -- some of the failure modes might help diagnose. If I had to guess, some tests are leaving some global state around, and since we use pytest-randomly
the tests will always run in a different order, so you might get "lucky" and have tests pass accidentally some of the time.
One thing to try initially is to add --randomly-dont-reorganize
to your pytest
call and see if the number of failures stays constant (at least in a serial run).
Hi @gforsyth,
Thanks for the timely update and feedback, appreciate it. The random test execution is a very good point, I didn't really had it on my radar yet.
Best, Nico
I am fixing the merge conflicts!
@Nicoretti Is there any chance sqlalchemy-exasol
can be made compatible with the latest sqlalchemy release that is <2 (1.4.50)?
Python 3.12 support is coming up and the sqlalchemy version that's supported by sqlalchemy-exasol
is now constraining us to a version that doesn't provide binary wheels for python 3.12.
Generally speaking it should be fine to set sqlalchemy constraints to allow patch version bumps, they tend not to break things in patch releases, so instead of the <1.4.45
bound that's in sqlalchemy-exasol
it would be fine to make that <1.5
if you're not comfortable making it <2
.
I am running the test suite now and will help grind through the current failures. I'd like to avoid long running PRs if possible π
Generally speaking it should be fine to set sqlalchemy constraints to allow patch version bumps, they tend not to break things in patch releases, so instead of the
<1.4.45
bound that's insqlalchemy-exasol
it would be fine to make that<1.5
if you're not comfortable making it<2
.
@cpcloud Generally, yes. I need to re-validate it against the test suite of that sqlalchemy
version though. I'll take care of it this week. I hope that fits your schedule.
@Nicoretti It looks like your approach so far is to go through every test file and implement functionality until tests pass.
I highly recommend a different approach that will get this merged upstream much faster:
Mark everything that is currently failing as pytest.mark.notimpl
and create follow-up pull requests to add functionality.
This also has the side benefit of user-facing release notes that show when particular functionality was added.
Generally speaking it should be fine to set sqlalchemy constraints to allow patch version bumps, they tend not to break things in patch releases, so instead of the
<1.4.45
bound that's insqlalchemy-exasol
it would be fine to make that<1.5
if you're not comfortable making it<2
.@cpcloud Generally, yes. I need to re-validate it against the test suite of that
sqlalchemy
version though. I'll take care of it this week. I hope that fits your schedule.
Sounds good to me!
I am running the test suite now and will help grind through the current failures. I'd like to avoid long running PRs if possible π
Thanks, no worries. I've also been in and out with the PR at times. So, don't worry, working through it from my side has been fine so far. Some of the recent test issues stemmed from execution order and implicit schema changes in the Exasol DB, which took me a bit to figure out.
I hope our interaction has been smooth for you too (don't want to unnecessarily bother you with irrelevant questions, etc.).
Best regards, Nico
@Nicoretti It looks like your approach so far is to go through every test file and implement functionality until tests pass.
I highly recommend a different approach that will get this merged upstream much faster:
Mark everything that is currently failing as
pytest.mark.notimpl
and create follow-up pull requests to add functionality.This also has the side benefit of user-facing release notes that show when particular functionality was added.
100% Agreed. (wasn't a 100% sure if that's good enough from ibis side).
Yeah, we tend to prefer that approach since it can get in the hands of users more quickly and possibly help prioritize particular functionality (time series, strings, etc)
Yeah, we tend to prefer that approach since it can get in the hands of users more quickly and possibly help prioritize particular functionality (time series, strings, etc)
Yeah, we tend to prefer that approach since it can get in the hands of users more quickly and possibly help prioritize particular functionality (time series, strings, etc)
Great :+1:, to align an make sure we are on the same page, here is current plan of action: (in that order)
sqla
constrainSolid plan, I like it!
I am working through adding xfail
markers to test_temporal.py
right now, let's try avoid any duplicate work there :)
Ok, test_temporal.py
is done, I am now working on test_aggregation.py
@cpcloud I started upgrading (loosening the sqla constraint) for sqlalchemy-exasol
. There are a few (hopefully minor) issues which need to be looked at. Once they are addressed, I'll create a release and relock
this PR.
I have updated and addressed most things in the PR and will wait for feedback.
As recently discussed with @cpcloud, I focused efforts to get the basic Exasol dialect PR done by disabling (xfail
) missing features (which will be provided via follow-up PRs). In order to do this, I went through the tests and marked failing tests with xfail
. For some (12
), I needed to use xfail(..., strict=False)
because they were a result of parameterize
combinations. As far as I know, this could only be resolved by unrolling the parameterize
combinations. Even though I personally feel/think that unrolling would be the preferable solution, I fear it will also increase the likelihood of tedious conflicts significantly, due to the fact that the entire parameter setup needs to be adjusted. So, in favor of a timely and "straightforward" merge, I opted for some xfail(..., strict=False)
. If one or all of the usages do not match with your take on how this should be handled, please let me know.
Exasol window latest
"Linux" cannot be used on this platform:...
(The Exasol DB only runs on Linux)
I'll be happy to push the PR forward, but from my current point of view, I would wait for feedback before I continue to also minimize the problem of a moving target.
Still, take your time; there is no need to rush. I just wanted to let you know what my current status regarding the PR is so we can align and avoid unnecessary work, downtimes, etc.
Hope that fits with you.
Best, Nico
Taking a look now! This looks pretty close.
@Nicoretti I went through and had to make a few changes
gen_name
is bit annoying, but I understand it. I fixed an issue with oracle quoting of non-underscore-prefixed namesClouds look good:
cloud in π falcon in β¦/ibis on ξ exasol-backend is π¦ v7.1.0 via π v3.10.13 via βοΈ impure (ibis-3.10.13-env) took 4s
β― pytest -m 'bigquery or snowflake' -n 8 --dist loadgroup --snapshot-update
=================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.10.13, pytest-7.4.3, pluggy-1.3.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
Using --randomly-seed=1901397676
rootdir: /home/cloud/ibis
configfile: pyproject.toml
plugins: benchmark-4.0.0, clarity-1.0.1, mock-3.12.0, randomly-3.15.0, cov-4.1.0, repeat-0.9.3, hypothesis-6.91.0, httpserver-1.0.8, xdist-3.5.0, snapshot-0.9.0
8 workers [3061 items]
.........xxxxx.....x....x.x...x.x..x.x.....x..x..s.x..x...x.....x..xxx.x..x..x....xx.x.x..x.........x.........x.x...x............................xx.x...x...x.....x...x.xx..xxx............x..xxxxx.xxx.xx.xxx.x..x [ 6%]
xx...xxx.x.xx..x.........x.........x............xx.....x.x..................x.....x....x.....x.....................x........................x....x....x....x.....x.....x........x..x..........xx....x...xx...x.x.xx [ 13%]
..................x...............x.............x..x............xx....x..x.x.x.......x.............x....x......................................................xx..........xx...............x.x.x.x.xx..x.......x.. [ 20%]
....x.....x.................x...x.x....x...........x....x..xx.x.x.......x.x.....x....x......xx......xssssssssssssssssssssssx...x....x..................................x.sxxx...xx.xxx..xx.xx.x..x....x..xx.x.x.... [ 27%]
......x......x.x.xx...x.x....xx.......xx..x.xx.Exception ignored in: <socket.socket fd=17, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.0.0.2', 54476), raddr=('34.102.189.10', 443)>
Traceback (most recent call last):
File "/nix/store/nmx1iczmhb82djilf3ryprb4rf6iqqnw-python3-3.10.13-env/lib/python3.10/site-packages/snowflake/connector/vendored/urllib3/poolmanager.py", line 223, in clear
self.pools.clear()
ResourceWarning: unclosed <socket.socket fd=17, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.0.0.2', 54476), raddr=('34.102.189.10', 443)>
..x.x.....x......x.x......xxx..xx....xx.xxxx....x...x....xxxx...............x..........x.......x.....x.........x..x...xx..........x................................. [ 34%]
...................................................x.....xx.x..........x......x.....................x.x.x.........x...x.....................x.x......................x.........x....x.........x................x... [ 41%]
......x............x.............x......................x............s......x........x........x....................x.......xxx...x....x.xx..........x.........................x.xxx.....xxxxx..x.xxxxxxxx.xx..x.xxx [ 48%]
x.xx.xxxxxxxxxxxxxxxxxx.x...xx.x....x.........x.....x.xx.x..x.x....x......x.............x..x.x.....x.xx...................x..x.....x......x.................x...x.x..xxx.x.....x.....x............................. [ 55%]
.....................................x....x.............................................................x............x.......x...x.x..x.....x.............xx.x...x...x......x......x..xx..x..xxx.........x......x.x [ 62%]
.....x.....................x..x..x.x....x.x..............x.x...............s.x.sx..x......s............x.........x.........x.......x............x...x...x...x..x...x.....x..x..x.xx.................x.....x......x. [ 68%]
.......x.....x.....x...x...x.....x...x........x.x...x........................x......................x.....................................................x..x..xx...........x...........xx.x....x................. [ 75%]
x........x.......x..xx....xx............x.x.....x.x..x...x...x..x.....xx....x...x.x...x.......x..x...x.................x.x......xx......................................x...........x..........................x... [ 82%]
.............................x............................x.........xx.x.........xx..x....x.xxxxx.x..................................................................x.......x................................x..... [ 89%]
...................................................x.....s.................x....................................................................................................................................... [ 96%]
........................................................................................s................. [100%]
================================================================================ 2582 passed, 30 skipped, 449 xfailed in 791.30s (0:13:11) ================================================================================
Hi @cpcloud,
thanks for the review, adjustments and help :heart: . Addressed or responded to the open points.
best Nico
Clouds are passing:
β¦/ibis on ξ exasol-backend is π¦ v7.1.0 via π v3.10.13 via βοΈ impure (ibis-3.10.13-env)
β― pytest -m 'bigquery or snowflake' -n auto --dist loadgroup --snapshot-update -q
bringing up nodes...
.........x...x.............x....x...x.......xx.....x...................x.........x.x..........x..x.......x.x.x..x............x.......x..xx..x.x.xxxx.x.x.xxxxx.......xxx..x.x....x..xxx......x... [ 6%]
x..x..x.......x.x....x......x.......x......x........x.....x..x.............x..xx..x..xx....x.....x.xxx.x....x..........x.......x....xx.x.xx...x....xx......xx.....x......x...x...x..xxx...x...... [ 12%]
..xx.xx.......x.x.........x........x.................s.......x....x.x....xx..x...x.xxxx........x......x.xx........x......................xx.............................x..x.....s...x........... [ 18%]
.........................x.................................x..x.x......x.xx..x.x..x..x....x...x....xx........x..xx.xx..x......x...x..x.....x.............x.......x.....xx.........xx......x....x. [ 25%]
.......x....x........xx...x..x......x.x...x....x...........x.x..x.x..xxx.....x....x...x.xx....x.x.x.x..........x...x..............x.............x..x..x.....x.x........xx.xx..........x..x....... [ 31%]
..x.x.x.....x...xx.xx...x......x..xx..x......................................................x..............x......x........x..xx.x.......x....x....x...x.......x.......x.........xx..x.....x.x.x [ 37%]
........................................x....................xx..........x..x...................x....................xx..ssssssssssssssssssssss..xx..xx.....x............x..x..x...x........x.... [ 44%]
.....xx......xx.....x...........xx...xx...xxx..........x..x...........xx...x.xx.x...xxxx.x.xxx..x.x...x.xx..........xxx...xxxxxxxxxxxxxxxxxx.xxx.xxx..x..xx...x...x..x...x..x...x..x........x.... [ 50%]
.....x..........xxx.x.x.............x.x....xx...............x...........xx.......x....x...x........x.......xx............x..x....x..........xx.........x..xx.x.s.x..............x....sx.......... [ 56%]
.x..........s..x........xx.....x....................x..s...............x.........................x..............xx...x...x.....................x..x..x....x..x.........x.....xx....x............. [ 63%]
.x........xxx....x.x.................xx..x.....x......x....x.....x..x.......x...x......x....xx....x.xx...............x......x...x.....xx.x.....xx.xx..x....x.........x.....x.xx...x.x....x.x..x.. [ 69%]
..x................x...x......xx....x..x....x............x...................................................................x................................x.x..x.x..........x..............x. [ 75%]
.........x...............................x........x...x.....................................................................................................x............x.......x............... [ 81%]
...............................................................x........................x.............................................................................x.....x.x.xx.......x.x..xx. [ 88%]
x................x......x......x................................................................................................................................................................. [ 94%]
........................................xs............................x................................x.................s............x............................... [100%]
2582 passed, 30 skipped, 449 xfailed in 459.64s (0:07:39)
@gforsyth
Are there any hints as to the cause of the flakiness?
One issues which caused failing tests in serial and parallel execution is fixed and noted in
the create_schema
function (see snippet below).
...
# Exasol implicitly opens the created schema, therefore we need to restore
# the previous context.
action = (
sa.text(f"OPEN SCHEMA {open_schema}")
if open_schema
else sa.text(f"CLOSE SCHEMA {name}")
)
con.exec_driver_sql(action)
As of today, when executed serially, the tests should not be flaky. However, concerning parallel execution, there still seem to be one or more issues that need identification and addressing.
Having rebased against, the latest master a backend test (flink
) is failing during install.
I have seen similar things in the past, it potentionaly have been an network
issues during the pip install
.
Just rebased, oracle
keeps failing, not entirely sure why this is (how it relates to my changes).
Any ideas?
Bringing in the upstream changes from
broke the oracle tests.
I had to change the type inference related code for oracle in PR due to the change to gen_name
. In the meantime, there was a change to this bit of code upstream to fix a heisenbug.
I'll fix it back up.
It looks like you also reverted https://github.com/ibis-project/ibis/compare/90aee1a05296526ee1a034327699e5bea0a3cb8b..fc7c9a6e46a2bc769df583d0f044c22146daee9b#diff-70e8f790a7cd8392a8b812c47a95be84f3835051e37a873536cefb3cb726d360L97
which is an important change to ensure that we don't break BigQuery due to Exasol's inability to create a table whose name starts with _
.
I'll add this back too.
Bringing in the upstream changes from
broke the oracle tests.
I had to change the type inference related code for oracle in PR due to the change to
gen_name
. In the meantime, there was a change to this bit of code upstream to fix a heisenbug.I'll fix it back up.
Thanks for your help! Heisen bugs ... had my fair share of those :sweat_smile:.
It looks like you also reverted https://github.com/ibis-project/ibis/compare/90aee1a05296526ee1a034327699e5bea0a3cb8b..fc7c9a6e46a2bc769df583d0f044c22146daee9b#diff-70e8f790a7cd8392a8b812c47a95be84f3835051e37a873536cefb3cb726d360L97
which is an important change to ensure that we don't break BigQuery due to Exasol's inability to create a table whose name starts with
_
.I'll add this back too.
Thanks, my bad. I assumed that I have accidentally overwritten changes from the upstream (master)
and therefore reverted some changes of mine to the latest master
.
What's going on with the coverage warnings in exasol/__init__.py
? Are we failing to upload something? Because there's no way the test suite is running and we aren't hitting _get_sqla_table
What's going on with the coverage warnings in
exasol/__init__.py
? Are we failing to upload something? Because there's no way the test suite is running and we aren't hitting_get_sqla_table
@gforsyth
Good question, I just validated _get_sqla_table
(ibis/backends/exasol/__init__.py#L147-L148)
gets hit repeatedly (when the suit is executed locally). Not sure on the sonar cloud part, I don't have to much experience with it yet.
@gforsyth Good question, I just validated
_get_sqla_table
(ibis/backends/exasol/__init__.py#L147-L148)
gets hit repeatedly (when the suit is executed locally). Not sure on the sonar cloud part, I don't have to much experience with it yet.
Looks like @cpcloud fixed it!
@gforsyth This LGTM! Anything else left to do?
Bombs away! Thanks for your work @Nicoretti, look forward to working with you more on this!
Bombs away! Thanks for your work @Nicoretti, look forward to working with you more on this!
Looking forward to keep working with you too!
:wave: Hi Ibis Team,
I reached out to you a few months ago (here), but I wasn't able to follow up on this in a more timely manner.
This PR contains an initial draft of adding an
Exasol
backend toIbis
. I am aware that the PR is not very polished yet and also lacks a significant number of features. That said, with the PR, I also wanted to establish a means of communication to get some feedback and align with you while pushing the feature of an Exasol backend forward.Status Quo
just up exasol
just test exasol
715 failed, 430 passed, 38 skipped, 26312 deselected, 38 xfailed, 33 errors
Questions
Questions for/to the Ibis Project
nix-shell
failed after introducing new dependencies.Nix
:sweat_smile:)?Other Questions
To-Do's
Edited:
It would be great if you could give me some initial feedback on the PR and have a look at my questions. I appreciate your effort and look forward to hearing from you.
best Nico