Subsequent builds of apsw from the same source directory have broken COLUMN_METADATA test somehow

mgorny commented 2 years ago

I'm sorry but this one is a complete brain-cracker, and I've literally spent three hours trying to figure out what's the exact condition triggering this and I'm completely lost now.

Basically, it looks like this: if I build apsw first for Python 3.x, then for Python 3.y, with the following setup.cfg:

[build_ext]
enable=column_metadata

then I install the version for Python 3.y, remove the build directory, and the version built for Python 3.y fails the following test:

# python3.9 -m apsw.tests -f
                Python  /usr/bin/python3.9 sys.version_info(major=3, minor=9, micro=14, releaselevel='final', serial=0)
Testing with APSW file  /usr/lib/python3.9/site-packages/apsw/__init__.cpython-39-x86_64-linux-gnu.so
          APSW version  3.39.3.0
    SQLite lib version  3.39.3
SQLite headers version  3039003
    Using amalgamation  False
.......................E
======================================================================
ERROR: testCursor (__main__.APSW)
Check functionality of the cursor
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/apsw/tests.py", line 775, in testCursor
    has_full=any(o=="ENABLE_COLUMN_METADATA" or o.startswith("ENABLE_COLUMN_METADATA=") for o in apsw.compile_options) if apsw.using_amalgamation else hasattr(c, "description_full")
apsw.ExecutionCompleteError: Can't get description for statements that have completed execution

----------------------------------------------------------------------
Ran 24 tests in 4.008s

FAILED (errors=1)

but the first version built from the source tree seems to always work fine.

This is very persistent. After building for python3.8 as my user, I'm no longer able to get a successful build for any other Python version, even after removing the git clone and starting fresh. I've even tried strace to figure out if it's perhaps writing some files outside the build directory but couldn't find anything obvious. However, the isolated env used by our package manager somehow manages not to persist this between completely separate package builds, so if e.g. I do:

# PYTHON_TARGETS="python3_8 python3_9" emerge -v apsw

then py3.8 is fine and py3.9 is broken, and if I afterwards do:

# PYTHON_TARGETS="python3_9 python3_10" emerge -v apsw

then py3.9 is fine and py3.10 is broken.

I've been able to reproduce it locally using (but beware, it tainted my env and I can't figure out how to fix it!):

$ python3.8 -m build -nw
$ python3.9 -m build -nw
# the produced dist/*cp39*.whl is broken now

(note that you need to add setup.cfg first)

Could you please help me figuring this out?

rogerbinns commented 2 years ago

Apologies for you experiencing this. Note that I do build out of the same directory against all different Python versions as part of my testing, without even a clean.

The version numbers and the "Testing with APSW file" seem consistent. That is usually what caught me out in the past where a system apsw ended up being tested against the tests in the local checkout, and fail due to version mismatch.

Doing some debugging, what is happening is that hasattr(cursor, "description_full") is actually calling the C implemented cursor getattr behind the scenes, which is then raising an exception because there is no currently executing statement. I have no idea why checking if an attribute exists is getting the attribute value, and why you are the only one seeing this.

In any event, a quick fix for you is to add a line preceding the test failure line:

c.execute("select 3")

That will ensure there is a partially executed statement, so getting the description won't fail.

Also note that you probably don't want the setup.cfg with enable=column_metadata. Instead I recommend you use the –use-system-sqlite-config build_ext option which will extract the compilation options from the system SQLite library and hence guarantee they match. Some doc

mgorny commented 2 years ago

Apologies for you experiencing this. Note that I do build out of the same directory against all different Python versions as part of my testing, without even a clean.

The version numbers and the "Testing with APSW file" seem consistent. That is usually what caught me out in the past where a system apsw ended up being tested against the tests in the local checkout, and fail due to version mismatch.

Doing some debugging, what is happening is that hasattr(cursor, "description_full") is actually calling the C implemented cursor getattr behind the scenes, which is then raising an exception because there is no currently executing statement. I have no idea why checking if an attribute exists is getting the attribute value, and why you are the only one seeing this.

Thanks for understanding. I can't figure out why I'm hitting this either — but definitely there's something wrong happening here, that probably causes the built C extension to be different in n>0 builds than in n==0 build. However, I have no clue how to reasonably "diff" to built file.

In any event, a quick fix for you is to add a line preceding the test failure line:
c.execute("select 3")
That will ensure there is a partially executed statement, so getting the description won't fail.

Thanks, this seems to cause the test to pass indeed. Still, it would be nice to know what's causing this weird behavior in the first place.

Also note that you probably don't want the setup.cfg with enable=column_metadata. Instead I recommend you use the –use-system-sqlite-config build_ext option which will extract the compilation options from the system SQLite library and hence guarantee they match. Some doc

Yes, I've originally tried to use that option. However, I've switched to enable to attempt to produce a minimal reproducer.

rogerbinns commented 2 years ago

I think I have found the cause. getattr being called when hasattr was used is only done by some CPython versions. However when it is done, that line of the test will always fail if the amalgamation is not used (ie external SQLite library is). My testing with enable_column_metadata and using external SQLite just happened to not hit the CPython versions that fail. Similarly your first build wasn't a vulnerable CPython version, and the second was.

I'm updating my all-the-version-with-all-the-configs testing to include this for all supported Python versions, to confirm and prevent anything similar in the future.

rogerbinns commented 2 years ago

I'm closing this now, as the fundamental problem was it should have been failing more, not less. It is also possible the hasattr code calling getattr was ignoring all exceptions instead of just AttributeError (the APSW code is in C so that complicates things compared to regular python code). In any event I now test all python versions with --use-system-sqlite-config so any similar issue in the future will be caught.

The one line fix will be part of the next release, which will be shortly after the next SQLite release. However I can change things if that is too inconvenient for you.

mgorny commented 2 years ago

Thanks a lot. I'm still a little bit worried why wasn't it failing when I built py3.9 as the only impl via the Gentoo package (it may mean that there's some deeper issue at hand) but I don't have the willpower or time to investigate it further. Waiting until the next release is good with me, I can backport the patch to Gentoo easily.

rogerbinns / apsw

Subsequent builds of apsw from the same source directory have broken COLUMN_METADATA test somehow #370