Closed NickCrews closed 4 months ago
@jcrist I wrote this when I thought the intended behavior was like .title()
. But I think you are right, let me change these test cases.
Wow, after the sqlglot refactor this is SO much cleaner. Love it, thanks so much for your work on that big rewrite!
This might deserve a callout fairly prominently in the release notes, because although it might technically be a fix, that is only due to a poor definition of what capitalize was supposed to do. I bet some users' results are going to change as a result of this. There was even a test in the postgres backend testing for the old semantics that I had to deleted.
It appears that exasol treats ||
as equivalent to concat()
, which is not ANSI standard SQL when you have NULL || 'abc'
: You should get NULL, but exasol gives 'abc'. We already have a workaround for this in ops.StringConcat compiler for the exasol backend. Maybe we should redo this implementation to not compile directly with the DPipe, but instead rewrite the entire op.Capitalize in terms of op.StringConcat, op.Upper, etc ?
I filed a support ticket with exasol and asked them to chime in here. Curious if they will fix their implementation, or if we will need to have a custom workaround forever.
@jcrist @cpcloud OK I'm a little stumped as to the reason for the failing tests, since they should be using StringConcat and Substring, which both look tested on these backends. I can't run any of these backends locally since I don't have the creds. Do you think either of you could finish this up?
also, it appears that many backends have that "bug" that exasol does, so maybe it's not a bug.
I can't run any of these backends locally since I don't have the creds.
Except for BigQuery and Snowflake, where there are real credentials that are necessarily secret, you can find the credentials for all the backends in their respective conftest.py
files.
I'll often run the test suite with a pytest -m exasol
and then kill it after the first test passes, so that the test startup has a chance to load the test data into the backend container.
I try just up
and see
$ just up
docker compose up --wait
[+] Building 0.0s (0/0) docker:desktop-linux
[+] Running 17/18
✔ Container ibis-flink-jobmanager-1 Created 0.0s
✘ Container ibis-minio-1 Error 0.0s
✔ Container ibis-statestored-1 Started 0.0s
✔ Container impala-hive-metastore Created 0.0s
✔ Container ibis-mysql-1 Created 0.0s
✔ Container druid-postgres Created 0.0s
✔ Container ibis-kudu-1 Started 0.0s
✔ Container ibis-postgres-1 Running 0.0s
✔ Container ibis-mssql-1 Created 0.0s
✔ Container ibis-hive-metastore-db-1 Created 0.0s
✔ Container zookeeper Created 0.0s
✔ Container ibis-exasol-1 Created 0.0s
✔ Container ibis-clickhouse-1 Running 0.0s
✔ Container ibis-flink-1 Created 0.0s
✔ Container ibis-hive-metastore-1 Created 0.0s
✔ Container coordinator Created 0.0s
✔ Container ibis-trino-1 Created 0.0s
⠦ Container ibis-oracle-1 Starting 0.5s
Error response from daemon: mkdir /var/lib/docker/overlay2/9e12c9117dc506231210816af71282562ba269c4bd946ce870c4ad65316c75cc/merged: no space left on device
error: Recipe `up` failed on line 101 with exit code 1
so I am ignoring the minio error (maybe a separate issue?), since exasol looks to be working, and then try pytest -m exasol -k capitalize ibis/backends/tests/test_string.py
and then get a bunch of
__________________________________________________ ERROR at setup of test_capitalize[exasol-aBc1dEf-Abc1def] __________________________________________________
self = <ExaConnection session_id= dsn=localhost:8563 user=sys>
def _init_ws(self):
"""
Init websocket connection
Connection redundancy is supported
Specific Exasol node is randomly selected for every connection attempt
"""
dsn_items = self._process_dsn(self.options['dsn'])
failed_attempts = 0
ws_prefix = 'wss://' if self.options['encryption'] else 'ws://'
ws_options = self._get_ws_options()
for hostname, ipaddr, port, fingerprint in dsn_items:
self.logger.debug(f"Connection attempt [{ipaddr}:{port}]")
# Use correct hostname matching IP address for each connection attempt
if self.options['encryption']:
ws_options['sslopt']['server_hostname'] = hostname
try:
> self._ws = websocket.create_connection(f'{ws_prefix}{ipaddr}:{port}', **ws_options)
.venv/lib/python3.11/site-packages/pyexasol/connection.py:667:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.11/site-packages/websocket/_core.py:646: in create_connection
websock.connect(url, **options)
.venv/lib/python3.11/site-packages/websocket/_core.py:256: in connect
self.sock, addrs = connect(
.venv/lib/python3.11/site-packages/websocket/_http.py:141: in connect
sock = _open_socket(addrinfo_list, options.sockopt, options.timeout)
.venv/lib/python3.11/site-packages/websocket/_http.py:228: in _open_socket
raise err
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
addrinfo_list = [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 8563))], sockopt = [], timeout = 10
def _open_socket(addrinfo_list, sockopt, timeout):
err = None
for addrinfo in addrinfo_list:
family, socktype, proto = addrinfo[:3]
sock = socket.socket(family, socktype, proto)
sock.settimeout(timeout)
for opts in DEFAULT_SOCKET_OPTION:
sock.setsockopt(*opts)
for opts in sockopt:
sock.setsockopt(*opts)
address = addrinfo[4]
err = None
while not err:
try:
> sock.connect(address)
E ConnectionRefusedError: [Errno 61] Connection refused
.venv/lib/python3.11/site-packages/websocket/_http.py:205: ConnectionRefusedError
During handling of the above exception, another exception occurred:
request = <SubRequest 'backend' for <Function test_capitalize[exasol-Abc-Abc]>>, data_dir = PosixPath('/Users/nc/code/ibis/ci/ibis-testing-data')
tmp_path_factory = TempPathFactory(_given_basetemp=None, _trace=<pluggy._tracing.TagTracerSub object at 0x152821510>, _basetemp=PosixPath...var/folders/0b/tpgpsvx54f9g1gg1lkq45syc0000gp/T/pytest-of-nc/pytest-164'), _retention_count=3, _retention_policy='all')
worker_id = 'master'
@pytest.fixture(params=_get_backends_to_test(), scope="session")
def backend(request, data_dir, tmp_path_factory, worker_id) -> BackendTest:
"""Return an instance of BackendTest, loaded with data."""
cls = _get_backend_conf(request.param)
> return cls.load_data(data_dir, tmp_path_factory, worker_id)
ibis/backends/conftest.py:434:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ibis/backends/tests/base.py:158: in load_data
inst = cls(data_dir=data_dir, tmpdir=tmpdir, worker_id=worker_id, **kw)
ibis/backends/tests/base.py:100: in __init__
self.connection = self.connect(tmpdir=tmpdir, worker_id=worker_id, **kw)
ibis/backends/exasol/tests/conftest.py:49: in connect
return ibis.exasol.connect(
ibis/__init__.py:107: in connect
return backend.connect(*args, **kwargs)
ibis/backends/base/__init__.py:864: in connect
new_backend.reconnect()
ibis/backends/base/__init__.py:874: in reconnect
self.do_connect(*self._con_args, **self._con_kwargs)
ibis/backends/exasol/__init__.py:87: in do_connect
self.con = pyexasol.connect(
.venv/lib/python3.11/site-packages/pyexasol/__init__.py:68: in connect
return ExaConnection(**kwargs)
.venv/lib/python3.11/site-packages/pyexasol/connection.py:184: in __init__
self._init_ws()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <ExaConnection session_id= dsn=localhost:8563 user=sys>
def _init_ws(self):
"""
Init websocket connection
Connection redundancy is supported
Specific Exasol node is randomly selected for every connection attempt
"""
dsn_items = self._process_dsn(self.options['dsn'])
failed_attempts = 0
ws_prefix = 'wss://' if self.options['encryption'] else 'ws://'
ws_options = self._get_ws_options()
for hostname, ipaddr, port, fingerprint in dsn_items:
self.logger.debug(f"Connection attempt [{ipaddr}:{port}]")
# Use correct hostname matching IP address for each connection attempt
if self.options['encryption']:
ws_options['sslopt']['server_hostname'] = hostname
try:
self._ws = websocket.create_connection(f'{ws_prefix}{ipaddr}:{port}', **ws_options)
except Exception as e:
self.logger.debug(f'Failed to connect [{ipaddr}:{port}]: {e}')
failed_attempts += 1
if failed_attempts == len(dsn_items):
> raise ExaConnectionFailedError(self, 'Could not connect to Exasol: ' + str(e))
E pyexasol.exceptions.ExaConnectionFailedError:
E (
E message => Could not connect to Exasol: [Errno 61] Connection refused
E dsn => localhost:8563
E user => sys
E schema =>
E session_id =>
E )
.venv/lib/python3.11/site-packages/pyexasol/connection.py:674: ExaConnectionFailedError
I am on an M1 mac, is that an issue? Am I doing something else wrong?
You won't be able to run every service on macos. Some of these services use Linux-specific APIs.
Am I doing something ... wrong?
I am on an M1 mac
So yeah, as Phillip mentioned, not all of these work out of the box on mac. I believe @ncclementi has at least some of them working using colima
and is planning to write up some instructions on how to do it (not sure if that will cover exasol or not, but maybe?)
OK, thanks. Do you think either of you could finish this PR up then in the meantime?
I got the postgress one up and running, and I was able to run tests. But I was trying to see if I could get up the exasol container to run the tests but I didn't make it pass. I couldn't get the container up and healthy.
I tried also using colima's virtualization but I get
container ibis-exasol-1 is unhealthy
error: Recipe `up` failed on line 101 with exit code 1
When I check the logs I get
[2024-02-14 20:54:39] child 757 (membership) returned with state 1.
Wed Feb 14 20:54:42 CET 2024: start script '/usr/opt/EXASuite-7/EXAClusterOS-7.1.25/sbin/cos_sync_files'.
Wed Feb 14 20:54:42 CET 2024: pack files for sync.
exa/etc/dwad/db_DB1_ca.pem
exa/etc/dwad/db_DB1_cert.pem
exa/etc/dwad/db_DB1_key.pem
No nodes to synchronize.
Wed Feb 14 20:54:46 CET 2024: start script '/usr/opt/EXASuite-7/EXAClusterOS-7.1.25/sbin/cos_sync_files'.
Wed Feb 14 20:54:46 CET 2024: pack files for sync.
exa/etc/dwad/db_DB1.conf
No nodes to synchronize.
Traceback (most recent call last):
File "/usr/opt/EXASuite-7/EXAClusterOS-7.1.25/libexec/exainit.py", line 132, in <module>
main(sys.argv)
File "/usr/opt/EXASuite-7/EXAClusterOS-7.1.25/libexec/exainit.py", line 124, in main
run(args[1])
File "/usr/opt/EXASuite-7/EXAClusterOS-7.1.25/libexec/exainit.py", line 92, in run
serv.run_result = serv.run()
File "/usr/opt/EXASuite-7/EXAClusterOS-7.1.25/lib/exainit/tasks/database.py", line 135, in run
self._db_setup_and_start (db, dbname, dbpars, True)
File "/usr/opt/EXASuite-7/EXAClusterOS-7.1.25/lib/exainit/tasks/database.py", line 117, in _db_setup_and_start
self.start(db)
File "/usr/opt/EXASuite-7/EXAClusterOS-7.1.25/lib/exainit/tasks/database.py", line 84, in <lambda>
return lambda db, *a, **kw: self._wrappercall(getattr(db, name), *a, **kw)
File "/usr/opt/EXASuite-7/EXAClusterOS-7.1.25/lib/exainit/tasks/database.py", line 90, in _wrappercall
raise RuntimeError("Could not call function %s(*%s, **%s): %s,%s" % (repr(func), repr(args), repr(kw), repr(ret), repr(val)))
RuntimeError: Could not call function <built-in method start of libDWAd_py3.exa_db object at 0x7fe863c5a830>(*(), **{}): 1,'Could not start database: system does not have enough active nodes or DWAd was not able to create startup parameters for system'
[2024-02-14 20:54:48] root child 235 (exainit.py) returned with state 1.
any idea what is wrong?
Maybe this is because exasol only supports linux, and their image says https://github.com/EXASOL/docker-db?tab=readme-ov-file#host-os
hmm, it seems to me that MacOS is the most likely cause of exasol not working, and doesn't look like there are easy workarounds.
Yes, it seems people have been asking for a while for support of the exasol image for M1 but no answers. see https://github.com/exasol/docker-db/issues/63
I bypass the error above providing more memory but the container is always "unhealthy", I hit this same error https://github.com/exasol/docker-db/issues/77 (it's been open for a while) and not sure what's the reason behind.
The amount of additional code generated here doesn't seem worth the trouble.
For Oracle we end up generating this monstrosity to capitalize the empty string:
SELECT
CASE
WHEN LENGTH('') = 0
THEN ''
ELSE CASE
WHEN UPPER(
CASE
WHEN (
0 + 1
) >= 1
THEN SUBSTR('', 0 + 1, 1)
ELSE SUBSTR('', 0 + 1 + LENGTH(''), 1)
END
) IS NULL
OR LOWER(
CASE
WHEN (
1 + 1
) >= 1
THEN SUBSTR('', 1 + 1, CASE WHEN (
LENGTH('') - 1
) < 0 THEN 0 ELSE LENGTH('') - 1 END)
ELSE SUBSTR(
'',
1 + 1 + LENGTH(''),
CASE WHEN (
LENGTH('') - 1
) < 0 THEN 0 ELSE LENGTH('') - 1 END
)
END
) IS NULL
THEN NULL
ELSE CONCAT(
UPPER(
CASE
WHEN (
0 + 1
) >= 1
THEN SUBSTR('', 0 + 1, 1)
ELSE SUBSTR('', 0 + 1 + LENGTH(''), 1)
END
),
LOWER(
CASE
WHEN (
1 + 1
) >= 1
THEN SUBSTR('', 1 + 1, CASE WHEN (
LENGTH('') - 1
) < 0 THEN 0 ELSE LENGTH('') - 1 END)
ELSE SUBSTR(
'',
1 + 1 + LENGTH(''),
CASE WHEN (
LENGTH('') - 1
) < 0 THEN 0 ELSE LENGTH('') - 1 END
)
END
)
)
END
END AS "Capitalize('')"
And it's still not correct.
Alright, it turns out the mishandling of the empty string is reproducible in the oracledb
driver alone. I opened https://github.com/oracle/python-oracledb/issues/298 for that.
Gosh, what a hellscape this problem is.
You're a hero @cpcloud ! Thank you!
Fixes #8271.
BREAKING CHANGE: Backends that previouslly used
initcap
(analogous tostr.title
) to implementStringValue.capitalize()
will produce different results when the input string contains multiple words (a word's definition being backend-specific).