Closed gordthompson closed 2 years ago
cc: @keitherskine, @hugovk @abitrolly
Unfortunately, I cannot test the Mac wheels.
FWIW, if I cd to the root folder of my local pyodbc repo (venv) and run `python -m pip wheel .` it produces a file named `pyodbc-4.0.dev0-cp38-cp38-linux_x86_64.whl` that does not appear to have the two lib*.so files embedded in it. I can use that .whl file to install pyodbc on a clean Xubuntu 20.04 VM with no issues (once I `sudo apt install unixodbc` so the required .so files are in their usual place).
@gordthompson I believe you need https://pypi.org/project/auditwheel/ to include the *.so files in the .whl.
Unfortunately, they include rather old versions of libodbc.so and libltdl.so that override the system versions
Wheels are built using the manylinux_2_24 image, which is based on Debian 9. Relevant lines are here.
These old versions should be native Debian packages. In that case, either build `libodbc.so` from source, or switch to manylinux_2_28, which is based on AlmaLinux 8 and may be more up to date.
Unfortunately, I cannot test the Mac wheels.
I don't have a Mac, so no help from me there either.
Hi @abitrolly . Thanks for the pointers.
I've just been reading up on this and, yes, it appears that auditwheel (specifically `auditwheel repair`) is what's adding the libs to the wheel files. From https://pypi.org/project/auditwheel/ :
"`auditwheel repair`: copies these external shared libraries into the wheel itself, and automatically modifies the appropriate RPATH entries such that these libraries will be picked up at runtime."
So that's what is happening now. The problem is that the ODBC DM (Driver Manager, unixODBC in this case) is a "system thing" and, IMO, individual apps should not be overriding the system libs for something like that. Not only might that lead to strange errors (as we have seen), but it could also be a support headache: `odbcinst -j` might indicate one unixODBC version while pyodbc was using some other version of libodbc.so.2.
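(On Linux, one way to see which libodbc.so.2 a running Python process has actually loaded is to scan its memory map. An illustrative, Linux-only sketch — `mapped_copies` is a hypothetical helper, not part of pyodbc:)

```python
def mapped_copies(libname, maps_text=None):
    """Return the distinct file paths matching *libname* that are mapped
    into this process.

    Pass maps_text explicitly for testing, or leave it as None to read
    /proc/self/maps (Linux only). Seeing a wheel-bundled path alongside
    (or instead of) the system one would confirm the shadowing problem.
    """
    if maps_text is None:
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    paths = set()
    for line in maps_text.splitlines():
        parts = line.split()
        # Lines for file-backed mappings have the path as a 6th field.
        if len(parts) >= 6 and libname in parts[-1]:
            paths.add(parts[-1])
    return sorted(paths)
```

Running this after `import pyodbc` and looking for `"libodbc"` would show whether the system copy, a bundled copy, or both ended up mapped.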
Omitting the libs from the wheel files would require that users install unixODBC separately (which, BTW, would happen automatically if they installed "ODBC Driver 17/18 for SQL Server") but at least they wouldn't have to install build tools (e.g. `sudo apt install build-essential`) and unixODBC header files (e.g., `sudo apt install unixodbc-dev`) to compile pyodbc from source.
https://cibuildwheel.readthedocs.io/en/stable/ should contain info on how to avoid copying libs. There is probably a good reason why wheels copy tested versions of libs instead of expecting the system ones to be available.
So https://pypi.org/project/auditwheel/ also says …
"auditwheel is a command line tool to facilitate the creation of Python wheel packages for Linux (containing pre-compiled binary extensions) that are compatible with a wide variety of Linux distributions, consistent with the PEP 600 manylinux_x_y, PEP 513 manylinux1, PEP 571 manylinux2010 and PEP 599 manylinux2014 platform tags."
(emphasis mine)
… which sort of makes me wonder if we should be generating manylinux wheels at all. If we build "generic" Linux wheels as described above (that require a separate unixODBC install) and they don't work for a particular distro then the workaround for those users would be `pip install pyodbc --no-binary pyodbc` to build from source, as with previous versions.
It seems to me that the right way is to compile against the latest version of `libodbc.so`. I would also ask here: https://github.com/mkleehammer/pyodbc/discussions
It doesn't matter which version of unixODBC you compile with as long as it is >= 2.3.1 since they share the same ABI. The problem is with redistributing it, which shouldn't be done because on different Linux distros the DM may be configured slightly differently.
I don't understand why configuration depends on binary if ABI is the same.
This is my understanding of the situation with the wheels in the current (4.0.34) distribution. Release 4.0.34 is the first distribution with wheels that were generated using GitHub Actions (using cibuildwheel).
Windows
Linux
MacOS
Questions for everybody:
All feedback welcome.
I was an occasional user of pyodbc when I filed https://github.com/mkleehammer/pyodbc/pull/966 and my motivation was that I wanted this stuff to just work without additional hassle, because installing dependencies was not the problem I was dealing with at that time.
Therefore I assume that if people choose wheels, they need unixODBC to be wheeled too. If users have to install a system unixODBC for fine tuning, they will likely have root privileges to install the system version of python3-pyodbc there, which is probably also optimized. So users of system packages are not the primary wheel users. And if they need a fresh version of pyodbc, there is always the `--no-binary` option, as before.
@abitrolly - We certainly appreciate your contribution! We just need to get "the devil in the details" sorted out.
I assume that if people choose wheels, they need unixODBC to be wheeled too.
If by "people" you mean end-users, the problem is that they don't choose wheels. pip chooses a wheel for them if a suitable one is available. (But I'm sure you knew that.) If by "people" you mean the pyodbc maintainers, then that's why we're having this discussion. 😄
If users will have to install system unixODBC for fine tuning, they will likely have root privileges to install system version of python3-pyodbc there, which probably also optimized.
The repositories for a given distro will undoubtedly have packages that are optimized for installation on that distro. The problem is that the repos often contain versions that are old – sometimes really old. On Ubuntu 20.04, `sudo apt install python3-pyodbc` installs pyodbc 4.0.22. That version was released 4.5 years ago, and there have been 10 releases since then. Ubuntu 20.04 will be supported for another 8 years (April 2030). In my experience, it is pretty rare for packages like pyodbc to be updated in the repos, at least for Ubuntu.
Also worth mentioning that installing "ODBC Driver 17/18 for SQL Server" requires sudo privileges anyway.
FWIW, I just tried using the "generic" wheel file I created from my ~/git/pyodbc folder on Ubuntu 20.04 via `python -m pip wheel .` to install pyodbc on Oracle Linux 9 (based on RHEL) and it seems to have worked fine. No errors from the current version of pip, and Python 3.9 has no complaints about `import pyodbc`. Unfortunately I can't test msodbcsql because it hasn't been released for Oracle Linux 9 yet.
@gordthompson well, I certainly think that maintainers are people too. :D
For me the best strategy is to ship wheels with the latest version of `libodbc.so` compiled from source. That would complicate the lines added to the cibuildwheel action, but the benefit is that manylinux guarantees glibc compatibility with a range of Linux kernel versions.
TBH, I don't know how much "manylinux" contributes to avoiding glibc issues beyond building the wheels on an older Linux distribution. It does seem a bit ironic that we want to build wheels on an older Linux release for glibc, but then we have problems with the dependent libodbc.so because it's … old.
https://cibuildwheel.readthedocs.io/en/stable/ should contain info how to avoid copying libs.
I didn't find anything in the cibuildwheel docs (or any docs per se for auditwheel), but I did find this:
https://stackoverflow.com/questions/67326886/can-i-exclude-libraries-from-auditwheel-repair
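(Following that Stack Overflow thread: newer auditwheel releases grew an `--exclude` flag for `auditwheel repair` — I believe 5.2+ — so something along these lines in pyproject.toml might keep the manylinux tag while leaving libodbc out of the wheel. This is an untested sketch; the flag's availability inside the cibuildwheel image is an assumption to verify:)

```toml
# Hypothetical cibuildwheel override: keep the default repair step but
# tell auditwheel not to vendor the unixODBC driver manager.
[tool.cibuildwheel.linux]
repair-wheel-command = "auditwheel repair --exclude libodbc.so.2 -w {dest_dir} {wheel}"
```

The `{dest_dir}` and `{wheel}` placeholders are the ones cibuildwheel substitutes into its repair command.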
but then we have problems with the dependent libodbc.so because it's … old.
Why not compile a newer `libodbc.so` for the older Linux then?
I don't understand why configuration depends on binary if ABI is the same.
Different Linux distros may configure unixODBC differently. One of the more obvious differences that comes to mind is that SuSE puts odbcinst.ini and odbc.ini in /etc/unixODBC/ instead of /etc. If you ship a version with pyODBC that looks in /etc , then other ODBC drivers on a system where it is in a different place will not be found. This is just one of the confusions that can occur. Look in unixODBC's configure script for the other differences in which it can be configured. Thus my answers to Keith's questions 3 and 5 are a strong No.
@v-chojas why not standardize the lookup location to remove the configuration mess, instead of adding more mess?
As a DevOps engineer, rather than caring about a matrix of configuration locations across Linux versions, plus incompatibilities between pyODBC, unixODBC, and ODBC driver versions, I would rather package tested (or latest) versions of the needed ODBC drivers into the wheels as well, so that I could have predictable deploy scenarios. Maybe conda already does this.
What do you mean "standardize lookup location"? You are free to try to convince all the Linux distros to configure unixODBC in the same way, but I don't think that's going to help the immediate issue either.
ODBC was meant to be a system-wide component. Even disregarding the legal issues with redistribution, packaging ODBC drivers would be even worse, as that would cause certain drivers to be visible only to pyODBC and not the other ODBC applications on the system, or vice-versa. That causes massive confusion when debugging (e.g. how do you isolate pyODBC as being the cause, when trying to use the driver or DM from elsewhere doesn't work?) ODBC drivers often have their own dependencies too, which could conflict with those of the OS; do you propose to put those in there as well? Or pull the rest of the OS into it too...?
No.
ODBC was meant to be a system-wide component.
Now the trend in DevOps is to isolate all components into containers and virtual environments, to reduce compatibility hell and configuration hassle.
as that would cause certain drivers to be visible only to pyODBC and not the other ODBC applications on the system
For other ways to access databases, it is completely normal for Python drivers like psycopg2 to be visible only to Python apps. And if a non-Python app component needs DB access, it uses its own driver.
If I needed to debug connection problems with a Python app, I would use the information that pyODBC provides at the Python level. If pyODBC causes massive confusion when debugging, maybe that is the place that should be improved.
ODBC drivers often have their own dependencies too, which could conflict with those of the OS; do you propose to put those in there as well?
I believe https://github.com/pypa/auditwheel already detects which libs are needed and puts them into the wheel to avoid any conflicts.
Or pull the rest of the OS into it too...?
I hope we still understand that the context is Linux, and Linux is a kernel plus packages. If your database driver is bloatware that requires the whole OS to be packaged, instead of relying on standard glibc interfaces and a few megabytes of code, maybe it is spyware from the Windows world that you shouldn't be using in the first place.
to isolate all components into containers
If you want a container, then use a container.
And if a non-Python app component needs DB access, it uses its own driver.
That's not how ODBC is intended to work.
instead of relying on
Then why do you not think pyODBC should rely on the system unixODBC?
After all, didn't your idea already lead to several problems being reported where there were none before? For every one who files an issue here, there are probably many others who either haven't upgraded to '34 yet, or encountered this issue and immediately went back.
Now the trend in DevOps is to isolate all components into containers and virtual environments.
Which in both cases is effectively "packaging an OS (environment)" and so the argument becomes somewhat circular. When you deploy that environment you can ensure that it has the required components (specifically, the ODBC DM). Also, unixODBC is not the only DM out there: iODBC is another option, and apparently some ODBC drivers are built to use that DM specifically. (But I must confess near-complete ignorance when it comes to iODBC.)
maybe it is spyware from a Windows world that you shouldn't be using in the first place
"spyware"? That seems a bit … paranoid.
And the irony is that Windows includes the ODBC DM "out of the box", making this a non-issue on that platform.
Then why do you not think pyODBC should rely on the system unixODBC?
I didn't say that I think pyODBC should rely on the system unixODBC. What I said is that pyodbc wheels can be self-sufficient without relying on the system unixODBC, and that will make the lives of us DevOps folks much easier.
After all, didn't your idea already lead to https://github.com/mkleehammer/pyodbc/issues/1079 https://github.com/mkleehammer/pyodbc/issues/1081 https://github.com/mkleehammer/pyodbc/issues/1083 where there were none before?
Then why don't you ask them if they want to roll back #966, so that everybody has to install pyodbc by compiling? I proposed an alternative. If the alternative doesn't work because pyodbc doesn't see the ODBC config, I guess it is easier to add an autodetector for the ODBC config than to force users to deal with compilation issues. Maybe adding all ODBC drivers into the wheel was not a good idea at all, but compiling a fresh libodbc.so there with a config detector seems to be an achievable solution.
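(A rough idea of what such a config detector could look like: ask an installed unixODBC itself where its files live via `odbcinst -j`, rather than hard-coding /etc. Sketch only — these helper names are hypothetical, and the output-format details should be verified against the unixODBC version in use:)

```python
import subprocess

def parse_odbcinst_j(text):
    """Parse the 'KEY....: value' lines that `odbcinst -j` prints,
    e.g. 'DRIVERS............: /etc/odbcinst.ini'."""
    paths = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            # Strip the dot padding unixODBC uses to align the columns.
            paths[key.strip().rstrip(".")] = value.strip()
    return paths

def odbc_config_paths():
    """Best-effort detection of the unixODBC config file locations.
    Requires unixODBC (the odbcinst binary) to be installed."""
    out = subprocess.run(["odbcinst", "-j"], capture_output=True,
                         text=True, check=True).stdout
    return parse_odbcinst_j(out)
```

On a SuSE box this would report /etc/unixODBC/odbcinst.ini, on Debian /etc/odbcinst.ini, without either path being baked into the wheel.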
I never answered my own questions from above. Here are my thoughts:
brew install unixodbc (which currently installs 2.3.11, by the way). Anyway, those are my thoughts. If anybody wants to see more of the builds that cibuildwheel can do, have a look at the artifacts here: https://github.com/keitherskine/pyodbc/actions/runs/2735953171
Linux wheels - Debian 9 or CentOS 7 (or both)?
CentOS 7 is the older one, so you will get wider compatibility if you build on it. In general, backwards-compatibility means that binaries built on an older system will work on a newer one, but not vice-versa; on the other hand, in contrast to Windows, it is much more difficult to do the opposite.
Just chipping in my $0.02 here: As a consumer of pyodbc, it would be slightly more convenient for me if it included all the binaries it needs rather than requiring me to compile them. Mostly this is just because it would simplify Docker builds, which is primarily where I use it.
Compiling packages from source in a Docker build is annoying, because if you don't want your final image to be bloated with a bunch of build dependencies that are completely unnecessary for the running application, you have to:
So in this case, the "install pyodbc" portion of my (Debian-based) Dockerfile looks like this:
RUN apt-get install -y build-essential unixodbc unixodbc-dev && \
pip install pyodbc && \
apt-get purge -y build-essential unixodbc-dev && \
apt-get autoremove -y
Which is somewhat complex and time-consuming.
However Docker does cache build layers, meaning this cost doesn't have to be paid on every build, so it's not completely the end of the world. I would rate this as "mildly inconvenient" at worst. Just wanted to chime in since it's under discussion.
@keitherskine
the last I heard Christoph Gohlke's department has lost funding so there is a serious possibility that he will not be able to provide builds on his website for much longer.
Unfortunately, that seems to be the case. The page title is now
Archived: Unofficial Windows Binaries for Python Extension Packages
and the latest version of pyodbc available there is 4.0.32.
Fixed in version 4.0.35 of pyodbc. Linux wheels no longer include any libs.
Environment
Issue
With 4.0.34 we started building manylinux wheel files. Unfortunately, they include rather old versions of libodbc.so and libltdl.so that override the system versions, leading to errors as described in #1079 and #1081.
pyodbc 4.0.34
Note:
pyodbc 4.0.32
Note:
Action Needed
Build wheels without including those libraries.
Workarounds for the time being
Pin pyodbc at version 4.0.32, or install from source with `pip install pyodbc --no-binary pyodbc`.