mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution
2.95k stars 563 forks source link

Fetchdictarray rebased #1270

Closed ndmlny-qs closed 5 months ago

ndmlny-qs commented 1 year ago

Rebased against master. This is the PR to allow numpy arrays.

ndmlny-qs commented 1 year ago

The errors are coming from when we try to build pyodbc with numpy support. Since setuptools does not have access to a locally installed version of numpy during the build, it errors stating that no module named numpy can be found.

I've looked into how not to make numpy a required dependency in the pyproject.toml file for this feature, and I've come up empty. I'm look at how Pillow implements functional extensions using a custom build backend, reference is here https://github.com/python-pillow/Pillow/pull/7171/files#diff-14b169061c0b607511e83931a16fa2059738b12503c447588f81dd4775c59633. This is also discussed in the documentation for setuptools here https://setuptools.pypa.io/en/latest/build_meta.html. I need to do more investigation, however, I think when we build pyodbc and pip is given a flag like --config-settings=WITH_NUMPY=1, then the custom build backend will install numpy so that you can use this feature.

@mkleehammer any thoughts or concerns about going this route?

mkleehammer commented 1 year ago

My first question is would a numpy version of pyodbc even be able to load if numpy is not installed? That is, does it link with libraries that need to be present. That would be a problem.

I wondered if it would be easier to write a separate library like pyodbc_numpy that pyodbc could pass internals to.

ndmlny-qs commented 1 year ago

My first question is would a numpy version of pyodbc even be able to load if numpy is not installed?

I need to modify a few things, but the advice I was given was to ensure pyodbc functions without NumPy, even if it was compiled with NumPy support. Then make NumPy a build requirement in the pyrpoject.toml file, but an optional runtime dependency. Finally, I need to ensure there are two configs for the tests suite; one that works without NumPy, and that works with it.

This would keep the project from fracturing into two packages, and is something I'd like to try.

ndmlny-qs commented 1 year ago

@mkleehammer I have this branch successfully building to pyodbc 5.0.0b4. There are tests that should be made for when a user optionally installs with numpy, i.e. python -m pip install --force-reinstall .[numpy] for a dev install, or pip install pyodbc[numpy] for a user install, but I think this feature is okay for review now.

ndmlny-qs commented 1 year ago

This commit will successfully build pyodbc with NumPy as a build requirement, and an optional runtime dependency. To test the build, I am using what was posted in a discussion here; https://github.com/mkleehammer/pyodbc/discussions/1253. This will start a local Microsoft SQL 2017 server that we can query against.

Installing pyodbc without NumPy gets the expected behavior of a ModuleNotFoundError propagating from the C++ code. Installing pyodbc from this branch without numpy

pip install --force-reinstall .

The following Python code will create the expected error behavior, and will not crash an (ipython or python) REPL nor will it segfault.

# test.py
import pyodbc

connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=127.0.0.1,1401;"
    "UID=sa;PWD=StrongPassword2017;DATABASE=test"
)
connection = pyodbc.connect(connection_string)
cursor = connection.cursor()
cursor.execute("SELECT name, database_id, create_date FROM sys.databases;")
cursor.fetchdictarray()
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[8], line 1
----> 1 cursor.fetchdictarray()

ModuleNotFoundError: No module named 'numpy'

With NumPy support as an optional dependency

pip install --force-reinstall .[numpy]

and using the same test.py results in the following returned results (in an ipython session)

{'name': array(['master', 'tempdb', 'model', 'msdb', 'test'], dtype='<U129'),
 'database_id': array([1, 2, 3, 4, 5], dtype=int32),
 'create_date': array(['2003-04-08T09:13:36.390000', '2023-10-18T15:44:45.917000',
        '2003-04-08T09:13:36.390000', '2023-01-25T10:14:45.980000',
        '2023-10-18T15:45:14.713000'], dtype='datetime64[us]')}

If a user wants to install with numpy support, then using pip they would need to run the command pip install pyodbc[numpy].

ndmlny-qs commented 1 year ago

Looks like there is still work to be done for getting Windows builds, so I will continue to investigate.

ndmlny-qs commented 1 year ago

@mkleehammer I had to modify the AppVeyor build to no longer include --no-isolation since the pyproject.toml file has a build-time dependency on numpy, while it is a run-time optional dependency. Also, the CodeQL CI was not building pyodbc, so I modified the file to do the build exactly as is in the github action.

This looks to be ready for review. I'll keep monitoring the PR for any required changes for anyone that reviews.

ndmlny-qs commented 5 months ago

@mkleehammer after careful consideration we have decided to build out this functionality as a new project. We are closing out this PR and moving our numpy-related efforts to a new project called npyodbc, which layers additional numpy related functionality on top of pyodbc. If there is a desire in the future to re-merge, please feel free to reach out. Thank you for all your help over the years, with your permission, we will include attribution and a link to this project in our new repo. Please let us know if you have any thoughts or concerns as we move in this direction.