Open dgutman opened 6 months ago
How did you install pylibtiff? You mentioned you've used pylibtiff before. What has changed or is different from when it worked in the past?
So originally pylibtiff was installed as a dependency of one of my other packages. It's been several months since I've done much testing, though, so I'm not sure whether I still have a virtual environment anywhere with a working local version...
In my scenario above, though, I was just creating a new virtual environment and installing pylibtiff directly from GitHub via pip install git+https://github.com/pearu/pylibtiff.git.
Check the logs of that pip install and make sure it completed successfully (that is, the extensions were built successfully). I don't think the Cython extensions access the libtiff C library directly, so I'm not sure the Cython extensions are actually the problem. Oh yeah, you're using the ctypes interface... hmm.
Yeah, I'm going to try to dig a bit deeper today with various versions. I was just surprised to get a segfault, which makes this harder for me to debug and is a bit outside my normal comfort zone. I just hadn't seen anyone report anything similar.
If you run python and a snippet of your code with gdb (the C debugger) or some mac equivalent of strace, you might get some information about what exactly is causing the segfault. For example, some missing library that the libtiff library is trying to link to.
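As a Python-side complement to gdb, the standard-library faulthandler module can print the Python traceback at the moment the interpreter receives SIGSEGV, which at least shows which Python call triggered the crash. A minimal sketch, assuming a POSIX system; the crash here is provoked on purpose with ctypes.string_at(0) in a child process as a stand-in for the real failing pylibtiff call:

```python
import subprocess
import sys

# Child script: enable faulthandler, then read from a NULL pointer on purpose.
# The ctypes.string_at(0) call is only a stand-in for the real crashing call.
CHILD = """
import ctypes, faulthandler
faulthandler.enable()
ctypes.string_at(0)
"""

# Run the child in a separate interpreter so the crash does not kill this one.
result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,
    text=True,
)

print("exit status:", result.returncode)
print(result.stderr)  # faulthandler writes its report to stderr
```

The same report can be had without editing any code by running the failing script with python -X faulthandler.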
Executive summary: I think this can be fixed by making sure that argtypes is specified for the non-variadic arguments of every variadic function being called via ctypes.
Details: I fell down the rabbit hole on this a bit and have a preliminary fix, based on this being the result of calling-convention (ABI) differences between ARM64 and Intel for variadic functions.
I was able to reproduce the same (or a similar) issue with a venv derived from a MacPorts py311 build on an M2 Max system running Sonoma (macOS 14), with tiff 4.6.0 also from MacPorts. In particular, it crashes with this stack trace:
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libtiff.6.dylib 0x106628c24 _TIFFVGetField + 1208
1 libtiff.6.dylib 0x106627020 TIFFGetField + 28
2 libffi.8.dylib 0x104fcc050 ffi_call_SYSV + 80
3 libffi.8.dylib 0x104fc9548 ffi_call_int + 1432
4 _ctypes.cpython-311-darwin.so 0x104fa8860 _ctypes_callproc + 788
5 _ctypes.cpython-311-darwin.so 0x104fa34ac PyCFuncPtr_call + 220
6 Python 0x1051578d0 _PyObject_MakeTpCall + 128
7 Python 0x105235804 _PyEval_EvalFrameDefault + 41960
8 Python 0x10522a7d8 PyEval_EvalCode + 168
9 Python 0x10527d7bc run_eval_code_obj + 84
10 Python 0x10527d720 run_mod + 112
11 Python 0x10527d560 pyrun_file + 148
12 Python 0x10527cfb0 _PyRun_SimpleFileObject + 268
13 Python 0x10527c948 _PyRun_AnyFileObject + 216
14 Python 0x105299504 pymain_run_file_obj + 220
15 Python 0x105298e44 pymain_run_file + 72
16 Python 0x1052986f8 Py_RunMain + 660
17 Python 0x105299860 Py_BytesMain + 40
18 dyld 0x19eaf20e0 start + 2360
The crash registers include "byte write Translation fault":
Thread 0 crashed with ARM Thread State (64-bit):
x0: 0x0000000106684680 x1: 0x0000000000000102 x2: 0x0000000000000000 x3: 0x000000010662876c
x4: 0x0000000106629be8 x5: 0x000000011782143a x6: 0x00000001054f2d98 x7: 0x0000000000000000
x8: 0x0000000000000008 x9: 0x0000000000000000 x10: 0x000000016b696348 x11: 0x0000000106628964
x12: 0x0000000000000064 x13: 0x0000000000000020 x14: 0x00000001053e5780 x15: 0x00000000ffff7dff
x16: 0x000000019ed17b68 x17: 0x000000019ecac54c x18: 0x0000000000000000 x19: 0x0000000106813a00
x20: 0x0000000106684680 x21: 0x0000000000000102 x22: 0x0000000000000000 x23: 0x0000000000000000
x24: 0x000000016b696538 x25: 0x0000000000000003 x26: 0x00000001071f4998 x27: 0x0000000000000003
x28: 0x000000016b6964f0 fp: 0x000000016b696310 lr: 0x0000000106628794
sp: 0x000000016b6962d0 pc: 0x0000000106628c24 cpsr: 0x80001000
far: 0x0000000000000000 esr: 0x92000046 (Data Abort) byte write Translation fault
A bit of web searching around led me here: https://github.com/python/cpython/issues/92892, which led to an update of the ctypes docs:
https://docs.python.org/3/library/ctypes.html#calling-variadic-functions
Digging around in the pylibtiff sources, I see that argtypes isn't defined for the GetField calling sequence, since the arguments ultimately vary quite a bit depending on which tag is being fetched:
libtiff.TIFFIsMSB2LSB.restype = ctypes.c_int
libtiff.TIFFIsMSB2LSB.argtypes = [TIFF]
# GetField and SetField arguments are dependent on the tag
libtiff.TIFFGetField.restype = ctypes.c_int
libtiff.TIFFSetField.restype = ctypes.c_int
libtiff.TIFFNumberOfStrips.restype = c_tstrip_t
Then, rereading the ctypes docs more closely:
On those platforms it is required to specify the argtypes attribute for the regular, non-variadic, function arguments:
So I then edited libtiff_ctypes.py in my venv to include the non-variadic arguments from looking at tiffio.h entries for TIFF*GetField.
# GetField and SetField arguments are dependent on the tag
libtiff.TIFFGetField.restype = ctypes.c_int
libtiff.TIFFGetField.argtypes = [TIFF, ctypes.c_uint32]
After that, my test program was able to GetField without segfault for things including:
print(f.GetField('BitsPerSample'))
print(f.GetField('ImageDescription'))
print(f.GetField('ImageWidth'))
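For anyone who wants to see the same pattern without libtiff installed, here is a minimal sketch using libc's sscanf as a stand-in. Like TIFFGetField, it takes a couple of fixed arguments followed by variadic pointers that the C side writes through. The choice of sscanf and the way libc is located are my assumptions for illustration, not anything from pylibtiff:

```python
import ctypes
import ctypes.util

# Locate libc; fall back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# int sscanf(const char *str, const char *format, ...);
# Declare only the fixed (non-variadic) parameters, as the ctypes docs advise.
libc.sscanf.restype = ctypes.c_int
libc.sscanf.argtypes = [ctypes.c_char_p, ctypes.c_char_p]

# The variadic tail carries output pointers, much like TIFFGetField(tif, tag, &value).
width = ctypes.c_int()
height = ctypes.c_int()
matched = libc.sscanf(
    b"640x480", b"%dx%d", ctypes.byref(width), ctypes.byref(height)
)

print(matched, width.value, height.value)
```

With the fixed prefix declared, ctypes can tell which arguments belong to the variadic tail and pass them the way the platform's variadic calling convention expects.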
I'm curious whether this also affects other arm64 systems like raspi4 but haven't gotten a viable build on the system I have available as of this writing.
Wow! Thanks for diving into the rabbit hole. Always glad to know I wasn't just doing something fundamentally dumb on my end. I'm not familiar enough with libtiff to provide much additional insight. Would this be relatively easy to patch in general, though, or will it require specifying argtypes for a huge number of parameters/functions?
I took a look through the code and it may only be two lines that need adding, and it should not impact compatibility with other architectures.
diff --git a/libtiff/libtiff_ctypes.py b/libtiff/libtiff_ctypes.py
index be13a3a..8e85346 100644
--- a/libtiff/libtiff_ctypes.py
+++ b/libtiff/libtiff_ctypes.py
@@ -1895,8 +1895,10 @@ libtiff.TIFFIsMSB2LSB.argtypes = [TIFF]
# GetField and SetField arguments are dependent on the tag
libtiff.TIFFGetField.restype = ctypes.c_int
+libtiff.TIFFGetField.argtypes = [TIFF, ctypes.c_uint32]
libtiff.TIFFSetField.restype = ctypes.c_int
+libtiff.TIFFSetField.argtypes = [TIFF, ctypes.c_uint32]
libtiff.TIFFNumberOfStrips.restype = c_tstrip_t
libtiff.TIFFNumberOfStrips.argtypes = [TIFF]
I tested your update to the library, and it fixed some of the errors my app was throwing, so progress has been made. I had primarily been trying to access the metadata using the pylibtiff library with the changes you made, and had no issues there. I now believe the issue that is throwing the segfault relates to actually trying to get a tile/field from the TIFF file.
Process 31728 stopped
JPEGVGetField + 124 libtiff.6.dylib
JPEGVGetField:
-> 0x107c9a8d8 <+124>: str w9, [x10]
0x107c9a8dc <+128>: ldr x8, [x8, #0x518]
0x107c9a8e0 <+132>: ldr x9, [sp, #0x8]
0x107c9a8e4 <+136>: add x10, x9, #0x8
Target 0: (python) stopped.
Process 31728 launched: '/Users/dagutman/devel/BDSA-Schema-Wrangler/.venv/bin/python' (arm64)
Didn't post the full stack trace.
@pearu Let's move the discussion of @rayg-ssec's PR (#179) here. From @rayg-ssec's comment above, it sounds like what he read in the ctypes docs allows for what he is doing; if not exactly the same, then maybe it could be applied conditionally on platforms that need at least the non-variadic set of argtypes declared up front. However, it looks like @dgutman's testing still fails with #179 when a different tag is fetched. My assumption is that the code was working for single-argument GetField usage but starts to fail when more input arguments are required.
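A rough sketch of what "declare only the fixed arguments, optionally per platform" could look like. The helper names (uses_distinct_variadic_abi, declare_variadic) are hypothetical and not part of pylibtiff or #179, the platform check covers only Apple arm64 as an assumption taken from the ctypes docs, and libc's snprintf stands in for TIFFGetField/TIFFSetField:

```python
import ctypes
import ctypes.util
import platform
import sys


def uses_distinct_variadic_abi():
    # Assumption: only Apple arm64 is treated as affected, following the
    # ctypes docs; other platforms may need the same treatment.
    return sys.platform == "darwin" and platform.machine() == "arm64"


def declare_variadic(func, restype, fixed_argtypes, always=True):
    """Set restype, plus argtypes for the fixed (non-variadic) parameters only."""
    func.restype = restype
    if always or uses_distinct_variadic_abi():
        func.argtypes = list(fixed_argtypes)
    return func


libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# int snprintf(char *str, size_t size, const char *format, ...);
snprintf = declare_variadic(
    libc.snprintf,
    ctypes.c_int,
    [ctypes.c_char_p, ctypes.c_size_t, ctypes.c_char_p],
)

buf = ctypes.create_string_buffer(32)
# The trailing c_int is the variadic part and is not listed in argtypes.
written = snprintf(buf, len(buf), b"tag=%d", ctypes.c_int(258))
print(written, buf.value)
```

The helper defaults to always=True because the ctypes docs note that specifying argtypes does not inhibit portability and advise always declaring it for variadic functions, so the conditional branch is probably unnecessary in practice.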
Is there a straightforward test case for the new segfault behavior? Is this reading content at a lower level from a JPEG-compressed TIFF image / image plane?
I have version 4.6.0 of libtiff installed on my Mac running macOS Ventura 13.4. In another application that ultimately uses pylibtiff, I started getting a segfault. After some initial digging, it seems like pylibtiff is causing the issue, although I'm a bit stumped, as I've used the library in the past.
Simply running tiffinfo from the command line (see below), I am able to view the basic info from the sample TIFF File. I am running python 3.11.4 in a clean virtual environment, and
Throws a seg fault... even just doing f.info() throws a seg fault...