pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Segmentation Fault When Updating PDF Form Field Value #4004

Open rdhyee opened 2 days ago

rdhyee commented 2 days ago

Description of the bug

When attempting to update a PDF form field value using widget.update(), the application crashes with a segmentation fault. The crash occurs specifically in the PDF annotation rectangle handling code.

How to reproduce the bug

  1. Create a PDF with a form field named "Text1" (simple_form.pdf is attached).

  2. Run the following code:

import pymupdf as pmp
from collections import defaultdict
import faulthandler

# Enable fault handler for detailed crash reports
faulthandler.enable(file=open('fault.log', 'w'))

def get_widgets_by_name(doc):
    """
    Extracts and returns a dictionary of widgets indexed by their names.
    """
    widgets_by_name = defaultdict(list)
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        for field in page.widgets():
            widgets_by_name[field.field_name].append({
                "page_num": page_num,
                "widget": field
            })
    return widgets_by_name

# Open document and get widgets
doc = pmp.open("simple_form.pdf")
widgets_by_name = get_widgets_by_name(doc)

# Print widget information
for name, widgets in widgets_by_name.items():
    print(f"Widget Name: {name}")
    for entry in widgets:
        widget = entry["widget"]
        page_num = entry["page_num"]
        print(f"  Page: {page_num + 1}, Type: {widget.field_type}, Value: {widget.field_value}, Rect: {widget.rect}")

# Attempt to update field value
w = widgets_by_name["Text1"][0]
field = w['widget']
field.value = "1234567890"
field.update()  # Crashes here

doc.close()

Output of the program:

Widget Name: Text1
  Page: 1, Type: 7, Value: , Rect: Rect(172.80099487304688, 117.16400146484375, 322.8009948730469, 139.16400146484375)
zsh: segmentation fault  python pymupdf_bug.py

Current Behavior

The program crashes with a segmentation fault when calling field.update(). The crash occurs in the PDF annotation rectangle handling code.

Crash Details

Stack trace from fault.log:

Fatal Python error: Segmentation fault

Current thread 0x00007ff84458ae00 (most recent call first):
  File ".../pymupdf/mupdf.py", line 51736 in pdf_set_annot_rect
  File ".../pymupdf/__init__.py", line 17613 in JM_set_widget_properties
  File ".../pymupdf/__init__.py", line 21686 in _save_widget
  File ".../pymupdf/__init__.py", line 7364 in update
  File "pymupdf_bug.py", line 51 in <module>

The crash trace indicates the following call chain:

  1. widget.update()
  2. _save_widget()
  3. JM_set_widget_properties()
  4. pdf_set_annot_rect()

Additional Context

See detailed crash from Console.app:

-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process:               Python [96604]
Path:                  /Users/USER/*/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python
Identifier:            org.python.python
Version:               3.12.7 (3.12.7)
Code Type:             X86-64 (Native)
Parent Process:        zsh [57473]
Responsible:           iTerm2 [1138]
User ID:               501

Date/Time:             2024-10-29 16:32:11.8051 -0700
OS Version:            macOS 14.7 (23H124)
Report Version:        12
Bridge OS Version:     9.0 (22P353)
Anonymous UUID:        5855653E-F1B7-B5A9-6F5E-E45B72164E45

Sleep/Wake UUID:       FADABBF3-D42F-4F8D-A218-0E6F059B058C

Time Awake Since Boot: 510000 seconds
Time Since Wake:       32011 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000010220
Exception Codes:       0x0000000000000001, 0x0000000000010220

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   Python [96604]

VM Region Info: 0x10220 is not in any region.  Bytes before following region: 4306091488
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      100aac000-100aad000    [    4K] r-x/r-x SM=COW  /Users/USER/*/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib              0x7ff80100dd96 __pthread_kill + 10
1   libsystem_pthread.dylib             0x7ff801046ebd pthread_kill + 262
2   libsystem_c.dylib                   0x7ff800f320a8 raise + 24
3   Python                                 0x10186da34 faulthandler_fatal_error + 500
4   libsystem_platform.dylib            0x7ff801075fdd _sigtramp + 29
5   ???                                            0x0 ???
6   libmupdfcpp.so                         0x10154bbe1 mupdf::ll_pdf_set_annot_rect(pdf_annot*, fz_rect) + 81
7   _mupdf.so                              0x10529d4b5 _wrap_pdf_set_annot_rect(_object*, _object*) + 165
8   Python                                 0x1017237fa cfunction_call + 138
9   Python                                 0x1016d5c22 _PyObject_MakeTpCall + 226
10  Python                                 0x1017e7cb9 _PyEval_EvalFrameDefault + 44185
11  Python                                 0x1017dcd6f PyEval_EvalCode + 207
12  Python                                 0x101845936 run_mod + 150
13  Python                                 0x101843e7f _PyRun_SimpleFileObject + 783
14  Python                                 0x10184394b _PyRun_AnyFileObject + 123
15  Python                                 0x101868286 Py_RunMain + 2438
16  Python                                 0x1018686e0 pymain_main + 320
17  Python                                 0x10186873b Py_BytesMain + 43
18  dyld                                0x7ff800cbb345 start + 1909

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x000000000000000b  rcx: 0x00007fe6001cf9e8  rdx: 0x0000000000000000
  rdi: 0x0000000000000103  rsi: 0x000000000000000b  rbp: 0x00007fe6001cfa10  rsp: 0x00007fe6001cf9e8
   r8: 0x0000000000000000   r9: 0xcccccccccccccccd  r10: 0x00007ff84458ae00  r11: 0x0000000000000246
  r12: 0x0000000000000103  r13: 0x0000000000000014  r14: 0x00007ff84458ae00  r15: 0x0000000000000016
  rip: 0x00007ff80100dd96  rfl: 0x0000000000000246  cr2: 0x0000000000000000

Logical CPU:     0
Error Code:      0x02000148 
Trap Number:     133

Thread 0 instruction stream:
  00 48 89 e0 48 8b 4d 80-48 89 48 10 0f 10 85 70  .H..H.M.H.H....p
  ff ff ff 0f 11 00 48 8d-7d 88 e8 91 3d f8 ff 48  ......H.}...=..H
  89 e0 48 8b 4d 98 48 89-48 10 0f 10 45 88 0f 11  ..H.M.H.H...E...
  00 0f 28 45 d0 0f 28 4d-a0 e8 62 3b f8 ff 49 8b  ..(E..(M..b;..I.
  76 10 48 89 df 4c 89 fa-e8 03 d5 04 00 41 c7 46  v.H..L.......A.F
  20 01 00 00 00 49 8b 46-08 48 8b b0 98 00 00 00   ....I.F.H......
 [c7]86 20 02 01 00 01 00-00 00 48 89 df e8 4e 7b  .. .......H...N{ <==
  04 00 48 89 df e8 e6 49-f7 ff 85 c0 75 0e 48 81  ..H....I....u.H.
  c4 98 00 00 00 5b 41 5e-41 5f 5d c3 49 8b 46 08  .....[A^A_].I.F.
  48 8b b0 98 00 00 00 48-89 df e8 d1 7f 04 00 48  H......H.......H
  89 df e8 c9 4f f7 ff 66-0f 1f 84 00 00 00 00 00  ....O..f........
  55 48 89 e5 41 57 41 56-41 54 53 49 89 cf 41 89  UH..AWAVATSI..A.

Binary Images:
       0x104ed9000 -        0x105694fff _mupdf.so (*) <3b3374ed-965c-3c6c-9bc7-2ea9bef6f8d0> /Users/USER/*/_mupdf.so
       0x100ea5000 -        0x100ec4fff _extra.so (*) <85b5b65d-665b-388a-abff-bcb56045ec53> /Users/USER/*/_extra.so
       0x1035c9000 -        0x104dc0fff libmupdf.dylib (*) <6eb25f96-8374-3d80-aea2-b6999fec05ee> /Users/USER/*/libmupdf.dylib
       0x1014dd000 -        0x10156cfff libmupdfcpp.so (*) <62085687-aa38-31fb-be57-2e889a623774> /Users/USER/*/libmupdfcpp.so
       0x100e3f000 -        0x100e40fff grp.cpython-312-darwin.so (*) <a74511f8-11a7-3725-b9fd-3c2ebf5b28f5> /Users/USER/*/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/grp.cpython-312-darwin.so
       0x100e60000 -        0x100e65fff _struct.cpython-312-darwin.so (*) <86e8001a-ad39-32b1-930c-55fcd02f660f> /Users/USER/*/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_struct.cpython-312-darwin.so
       0x100e49000 -        0x100e4dfff _lzma.cpython-312-darwin.so (*) <b00a34db-b6df-3074-8acb-51bfe84a3b33> /Users/USER/*/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_lzma.cpython-312-darwin.so
       0x100e2a000 -        0x100e2cfff _bz2.cpython-312-darwin.so (*) <70db1805-3944-3b59-bdad-ac2cba10fd02> /Users/USER/*/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_bz2.cpython-312-darwin.so
       0x100e31000 -        0x100e38fff zlib.cpython-312-darwin.so (*) <c98fa99e-9f98-3790-82b7-0fcdf0809698> /Users/USER/*/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/zlib.cpython-312-darwin.so
       0x100e01000 -        0x100e01fff _opcode.cpython-312-darwin.so (*) <c4ac6d98-e9c3-3a47-aee8-2de1b57ee289> /Users/USER/*/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/_opcode.cpython-312-darwin.so
       0x100e06000 -        0x100e0afff binascii.cpython-312-darwin.so (*) <25b7901a-d4d6-3e0e-95fb-aae1e3205a12> /Users/USER/*/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/binascii.cpython-312-darwin.so
       0x100e10000 -        0x100e1bfff math.cpython-312-darwin.so (*) <4afb1e0c-c8b3-3ba6-8eb7-e0edf886a136> /Users/USER/*/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/math.cpython-312-darwin.so
       0x101684000 -        0x1019b7fff org.python.python (3.12.7, (c) 2001-2023 Python Software Foundation.) <b35c85ba-964c-3a8d-b1e0-53f32e16280f> /Users/USER/*/Python.framework/Versions/3.12/Python
       0x100ad5000 -        0x100aecfff libintl.8.dylib (*) <62eaae82-cc20-36e7-84e1-873459bc4e8f> /usr/local/Cellar/gettext/0.22.5/lib/libintl.8.dylib
       0x100aac000 -        0x100aacfff org.python.python (3.12.7) <98aedc0e-156f-3c2f-82e0-454cdb557ddc> /Users/USER/*/Python.framework/Versions/3.12/Resources/Python.app/Contents/MacOS/Python
    0x7ff801006000 -     0x7ff801040ff7 libsystem_kernel.dylib (*) <2442268f-a168-398b-986d-f51a5b77ced1> /usr/lib/system/libsystem_kernel.dylib
    0x7ff801041000 -     0x7ff80104cff7 libsystem_pthread.dylib (*) <79ecab15-71f1-3d6a-8a96-0623e622205f> /usr/lib/system/libsystem_pthread.dylib
    0x7ff800eed000 -     0x7ff800f74ff7 libsystem_c.dylib (*) <17b641ba-925c-39a9-aa43-ab3b0bcdfe01> /usr/lib/system/libsystem_c.dylib
    0x7ff801072000 -     0x7ff80107cff7 libsystem_platform.dylib (*) <cf0d62bf-94ea-338b-81d2-4ef8161cdf4e> /usr/lib/system/libsystem_platform.dylib
               0x0 - 0xffffffffffffffff ??? (*) <00000000-0000-0000-0000-000000000000> ???
    0x7ff800cb5000 -     0x7ff800d4581f dyld (*) <3a3cc221-017e-30a8-a2d3-0db1b0e5d805> /usr/lib/dyld

External Modification Summary:
  Calls made by other processes targeting this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by all processes on this machine:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0

VM Region Summary:
ReadOnly portion of Libraries: Total=348.4M resident=0K(0%) swapped_out_or_unallocated=348.4M(100%)
Writable regions: Total=1.6G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=1.6G(100%)

                                VIRTUAL   REGION 
REGION TYPE                        SIZE    COUNT (non-coalesced) 
===========                     =======  ======= 
Kernel Alloc Once                    8K        1 
MALLOC                             1.6G       40 
MALLOC guard page                   24K        6 
Stack                             16.0M        1 
Stack Guard                          4K        1 
VM_ALLOCATE                       16.3M       18 
__DATA                            5465K      158 
__DATA_CONST                      6764K      113 
__DATA_DIRTY                       343K       58 
__LINKEDIT                       185.8M       17 
__OBJC_RO                         71.9M        1 
__OBJC_RW                         2201K        2 
__TEXT                           162.7M      171 
shared memory                       28K        4 
===========                     =======  ======= 
TOTAL                              2.0G      591 

-----------
Full Report
-----------

{"app_name":"Python","timestamp":"2024-10-29 16:32:12.00 -0700","app_version":"3.12.7","slice_uuid":"98aedc0e-156f-3c2f-82e0-454cdb557ddc","build_version":"3.12.7","platform":1,"bundleID":"org.python.python","share_with_app_devs":0,"is_first_party":0,"bug_type":"309","os_version":"macOS 14.7 (23H124)","roots_installed":0,"name":"Python","incident_id":"BC0BD3F2-A903-4888-BB22-2B3C880744ED"}
{
  "uptime" : 510000,
  "procRole" : "Unspecified",
  "version" : 2,
  "userID" : 501,
  "deployVersion" : 210,
  "modelCode" : "MacBookPro16,1",
  "coalitionID" : 1325,
  "osVersion" : {
    "train" : "macOS 14.7",
    "build" : "23H124",
    "releaseType" : "User"
  },
  "captureTime" : "2024-10-29 16:32:11.8051 -0700",
  "codeSigningMonitor" : 0,
  "incident" : "BC0BD3F2-A903-4888-BB22-2B3C880744ED",
  "pid" : 96604,
  "cpuType" : "X86-64",
  "roots_installed" : 0,
  "bug_type" : "309",
  "procLaunch" : "2024-10-29 16:32:11.4325 -0700",
  "procStartAbsTime" : 514090231633415,
  "procExitAbsTime" : 514090603749892,
  "procName" : "Python",
  "procPath" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/Resources\/Python.app\/Contents\/MacOS\/Python",
  "bundleInfo" : {"CFBundleShortVersionString":"3.12.7","CFBundleVersion":"3.12.7","CFBundleIdentifier":"org.python.python"},
  "storeInfo" : {"deviceIdentifierForVendor":"5BBEE140-1552-5656-B075-96B1FEA6A18E","thirdParty":true},
  "parentProc" : "zsh",
  "parentPid" : 57473,
  "coalitionName" : "com.googlecode.iterm2",
  "crashReporterKey" : "5855653E-F1B7-B5A9-6F5E-E45B72164E45",
  "responsiblePid" : 1138,
  "responsibleProc" : "iTerm2",
  "codeSigningID" : "org.python.python",
  "codeSigningTeamID" : "",
  "codeSigningFlags" : 536870913,
  "codeSigningValidationCategory" : 10,
  "codeSigningTrustLevel" : 4294967295,
  "wakeTime" : 32011,
  "bridgeVersion" : {"build":"22P353","train":"9.0"},
  "sleepWakeUUID" : "FADABBF3-D42F-4F8D-A218-0E6F059B058C",
  "sip" : "enabled",
  "vmRegionInfo" : "0x10220 is not in any region.  Bytes before following region: 4306091488\n      REGION TYPE                    START - END         [ VSIZE] PRT\/MAX SHRMOD  REGION DETAIL\n      UNUSED SPACE AT START\n--->  \n      __TEXT                      100aac000-100aad000    [    4K] r-x\/r-x SM=COW  \/Users\/USER\/*\/Python.framework\/Versions\/3.12\/Resources\/Python.app\/Contents\/MacOS\/Python",
  "exception" : {"codes":"0x0000000000000001, 0x0000000000010220","rawCodes":[1,66080],"type":"EXC_BAD_ACCESS","signal":"SIGSEGV","subtype":"KERN_INVALID_ADDRESS at 0x0000000000010220"},
  "termination" : {"flags":0,"code":11,"namespace":"SIGNAL","indicator":"Segmentation fault: 11","byProc":"Python","byPid":96604},
  "vmregioninfo" : "0x10220 is not in any region.  Bytes before following region: 4306091488\n      REGION TYPE                    START - END         [ VSIZE] PRT\/MAX SHRMOD  REGION DETAIL\n      UNUSED SPACE AT START\n--->  \n      __TEXT                      100aac000-100aad000    [    4K] r-x\/r-x SM=COW  \/Users\/USER\/*\/Python.framework\/Versions\/3.12\/Resources\/Python.app\/Contents\/MacOS\/Python",
  "extMods" : {"caller":{"thread_create":0,"thread_set_state":0,"task_for_pid":0},"system":{"thread_create":0,"thread_set_state":0,"task_for_pid":0},"targeted":{"thread_create":0,"thread_set_state":0,"task_for_pid":0},"warnings":0},
  "faultingThread" : 0,
  "threads" : [{"triggered":true,"id":14744380,"instructionState":{"instructionStream":{"bytes":[0,72,137,224,72,139,77,128,72,137,72,16,15,16,133,112,255,255,255,15,17,0,72,141,125,136,232,145,61,248,255,72,137,224,72,139,77,152,72,137,72,16,15,16,69,136,15,17,0,15,40,69,208,15,40,77,160,232,98,59,248,255,73,139,118,16,72,137,223,76,137,250,232,3,213,4,0,65,199,70,32,1,0,0,0,73,139,70,8,72,139,176,152,0,0,0,199,134,32,2,1,0,1,0,0,0,72,137,223,232,78,123,4,0,72,137,223,232,230,73,247,255,133,192,117,14,72,129,196,152,0,0,0,91,65,94,65,95,93,195,73,139,70,8,72,139,176,152,0,0,0,72,137,223,232,209,127,4,0,72,137,223,232,201,79,247,255,102,15,31,132,0,0,0,0,0,85,72,137,229,65,87,65,86,65,84,83,73,137,207,65,137],"offset":96}},"threadState":{"r13":{"value":20},"rax":{"value":0},"rflags":{"value":582},"cpu":{"value":0},"r14":{"value":140704275279360,"symbolLocation":0,"symbol":"_main_thread"},"rsi":{"value":11},"r8":{"value":0},"cr2":{"value":0},"rdx":{"value":0},"r10":{"value":140704275279360,"symbolLocation":0,"symbol":"_main_thread"},"r9":{"value":14757395258967641293},"r15":{"value":22},"rbx":{"value":11},"trap":{"value":133},"err":{"value":33554760},"r11":{"value":582},"rip":{"value":140703145450902,"matchesCrashFrame":1},"rbp":{"value":140625821104656},"rsp":{"value":140625821104616},"r12":{"value":259},"rcx":{"value":140625821104616},"flavor":"x86_THREAD_STATE","rdi":{"value":259}},"queue":"com.apple.main-thread","frames":[{"imageOffset":32150,"symbol":"__pthread_kill","symbolLocation":10,"imageIndex":15},{"imageOffset":24253,"symbol":"pthread_kill","symbolLocation":262,"imageIndex":16},{"imageOffset":282792,"symbol":"raise","symbolLocation":24,"imageIndex":17},{"imageOffset":2005556,"symbol":"faulthandler_fatal_error","symbolLocation":500,"imageIndex":12},{"imageOffset":16349,"symbol":"_sigtramp","symbolLocation":29,"imageIndex":18},{"imageOffset":0,"imageIndex":19},{"imageOffset":453601,"symbol":"mupdf::ll_pdf_set_annot_rect(pdf_annot*, 
fz_rect)","symbolLocation":81,"imageIndex":3},{"imageOffset":3949749,"symbol":"_wrap_pdf_set_annot_rect(_object*, _object*)","symbolLocation":165,"imageIndex":0},{"imageOffset":653306,"symbol":"cfunction_call","symbolLocation":138,"imageIndex":12},{"imageOffset":334882,"symbol":"_PyObject_MakeTpCall","symbolLocation":226,"imageIndex":12},{"imageOffset":1457337,"symbol":"_PyEval_EvalFrameDefault","symbolLocation":44185,"imageIndex":12},{"imageOffset":1412463,"symbol":"PyEval_EvalCode","symbolLocation":207,"imageIndex":12},{"imageOffset":1841462,"symbol":"run_mod","symbolLocation":150,"imageIndex":12},{"imageOffset":1834623,"symbol":"_PyRun_SimpleFileObject","symbolLocation":783,"imageIndex":12},{"imageOffset":1833291,"symbol":"_PyRun_AnyFileObject","symbolLocation":123,"imageIndex":12},{"imageOffset":1983110,"symbol":"Py_RunMain","symbolLocation":2438,"imageIndex":12},{"imageOffset":1984224,"symbol":"pymain_main","symbolLocation":320,"imageIndex":12},{"imageOffset":1984315,"symbol":"Py_BytesMain","symbolLocation":43,"imageIndex":12},{"imageOffset":25413,"symbol":"start","symbolLocation":1909,"imageIndex":20}]}],
  "usedImages" : [
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4377645056,
    "size" : 8110080,
    "uuid" : "3b3374ed-965c-3c6c-9bc7-2ea9bef6f8d0",
    "path" : "\/Users\/USER\/*\/_mupdf.so",
    "name" : "_mupdf.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4310323200,
    "size" : 131072,
    "uuid" : "85b5b65d-665b-388a-abff-bcb56045ec53",
    "path" : "\/Users\/USER\/*\/_extra.so",
    "name" : "_extra.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4351365120,
    "size" : 25133056,
    "uuid" : "6eb25f96-8374-3d80-aea2-b6999fec05ee",
    "path" : "\/Users\/USER\/*\/libmupdf.dylib",
    "name" : "libmupdf.dylib"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4316844032,
    "size" : 589824,
    "uuid" : "62085687-aa38-31fb-be57-2e889a623774",
    "path" : "\/Users\/USER\/*\/libmupdfcpp.so",
    "name" : "libmupdfcpp.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4309905408,
    "size" : 8192,
    "uuid" : "a74511f8-11a7-3725-b9fd-3c2ebf5b28f5",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/lib\/python3.12\/lib-dynload\/grp.cpython-312-darwin.so",
    "name" : "grp.cpython-312-darwin.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4310040576,
    "size" : 24576,
    "uuid" : "86e8001a-ad39-32b1-930c-55fcd02f660f",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/lib\/python3.12\/lib-dynload\/_struct.cpython-312-darwin.so",
    "name" : "_struct.cpython-312-darwin.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4309946368,
    "size" : 20480,
    "uuid" : "b00a34db-b6df-3074-8acb-51bfe84a3b33",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/lib\/python3.12\/lib-dynload\/_lzma.cpython-312-darwin.so",
    "name" : "_lzma.cpython-312-darwin.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4309819392,
    "size" : 12288,
    "uuid" : "70db1805-3944-3b59-bdad-ac2cba10fd02",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/lib\/python3.12\/lib-dynload\/_bz2.cpython-312-darwin.so",
    "name" : "_bz2.cpython-312-darwin.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4309848064,
    "size" : 32768,
    "uuid" : "c98fa99e-9f98-3790-82b7-0fcdf0809698",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/lib\/python3.12\/lib-dynload\/zlib.cpython-312-darwin.so",
    "name" : "zlib.cpython-312-darwin.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4309651456,
    "size" : 4096,
    "uuid" : "c4ac6d98-e9c3-3a47-aee8-2de1b57ee289",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/lib\/python3.12\/lib-dynload\/_opcode.cpython-312-darwin.so",
    "name" : "_opcode.cpython-312-darwin.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4309671936,
    "size" : 20480,
    "uuid" : "25b7901a-d4d6-3e0e-95fb-aae1e3205a12",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/lib\/python3.12\/lib-dynload\/binascii.cpython-312-darwin.so",
    "name" : "binascii.cpython-312-darwin.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4309712896,
    "size" : 49152,
    "uuid" : "4afb1e0c-c8b3-3ba6-8eb7-e0edf886a136",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/lib\/python3.12\/lib-dynload\/math.cpython-312-darwin.so",
    "name" : "math.cpython-312-darwin.so"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4318576640,
    "CFBundleShortVersionString" : "3.12.7, (c) 2001-2023 Python Software Foundation.",
    "CFBundleIdentifier" : "org.python.python",
    "size" : 3358720,
    "uuid" : "b35c85ba-964c-3a8d-b1e0-53f32e16280f",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/Python",
    "name" : "Python",
    "CFBundleVersion" : "3.12.7"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4306325504,
    "size" : 98304,
    "uuid" : "62eaae82-cc20-36e7-84e1-873459bc4e8f",
    "path" : "\/usr\/local\/Cellar\/gettext\/0.22.5\/lib\/libintl.8.dylib",
    "name" : "libintl.8.dylib"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 4306157568,
    "CFBundleShortVersionString" : "3.12.7",
    "CFBundleIdentifier" : "org.python.python",
    "size" : 4096,
    "uuid" : "98aedc0e-156f-3c2f-82e0-454cdb557ddc",
    "path" : "\/Users\/USER\/*\/Python.framework\/Versions\/3.12\/Resources\/Python.app\/Contents\/MacOS\/Python",
    "name" : "Python",
    "CFBundleVersion" : "3.12.7"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 140703145418752,
    "size" : 241656,
    "uuid" : "2442268f-a168-398b-986d-f51a5b77ced1",
    "path" : "\/usr\/lib\/system\/libsystem_kernel.dylib",
    "name" : "libsystem_kernel.dylib"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 140703145660416,
    "size" : 49144,
    "uuid" : "79ecab15-71f1-3d6a-8a96-0623e622205f",
    "path" : "\/usr\/lib\/system\/libsystem_pthread.dylib",
    "name" : "libsystem_pthread.dylib"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 140703144267776,
    "size" : 557048,
    "uuid" : "17b641ba-925c-39a9-aa43-ab3b0bcdfe01",
    "path" : "\/usr\/lib\/system\/libsystem_c.dylib",
    "name" : "libsystem_c.dylib"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 140703145861120,
    "size" : 45048,
    "uuid" : "cf0d62bf-94ea-338b-81d2-4ef8161cdf4e",
    "path" : "\/usr\/lib\/system\/libsystem_platform.dylib",
    "name" : "libsystem_platform.dylib"
  },
  {
    "size" : 0,
    "source" : "A",
    "base" : 0,
    "uuid" : "00000000-0000-0000-0000-000000000000"
  },
  {
    "source" : "P",
    "arch" : "x86_64",
    "base" : 140703141941248,
    "size" : 591904,
    "uuid" : "3a3cc221-017e-30a8-a2d3-0db1b0e5d805",
    "path" : "\/usr\/lib\/dyld",
    "name" : "dyld"
  }
],
  "sharedCache" : {
  "base" : 140703141244928,
  "size" : 25769803776,
  "uuid" : "0558adbc-51e6-35a7-9a10-a10a1291df47"
},
  "vmSummary" : "ReadOnly portion of Libraries: Total=348.4M resident=0K(0%) swapped_out_or_unallocated=348.4M(100%)\nWritable regions: Total=1.6G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=1.6G(100%)\n\n                                VIRTUAL   REGION \nREGION TYPE                        SIZE    COUNT (non-coalesced) \n===========                     =======  ======= \nKernel Alloc Once                    8K        1 \nMALLOC                             1.6G       40 \nMALLOC guard page                   24K        6 \nStack                             16.0M        1 \nStack Guard                          4K        1 \nVM_ALLOCATE                       16.3M       18 \n__DATA                            5465K      158 \n__DATA_CONST                      6764K      113 \n__DATA_DIRTY                       343K       58 \n__LINKEDIT                       185.8M       17 \n__OBJC_RO                         71.9M        1 \n__OBJC_RW                         2201K        2 \n__TEXT                           162.7M      171 \nshared memory                       28K        4 \n===========                     =======  ======= \nTOTAL                              2.0G      591 \n",
  "legacyInfo" : {
  "threadTriggered" : {
    "queue" : "com.apple.main-thread"
  }
},
  "logWritingSignature" : "f2c9c5886150fecc73563846441a726714952e84",
  "trialInfo" : {
  "rollouts" : [
    {
      "rolloutId" : "632c763c58740028737bfdd2",
      "factorPackIds" : {
        "SIRI_DIALOG_ASSETS" : "64a57d23fa6fd41b2353e2ae"
      },
      "deploymentId" : 240000034
    },
    {
      "rolloutId" : "6410af69ed1e1e7ab93ed169",
      "factorPackIds" : {

      },
      "deploymentId" : 240000011
    }
  ],
  "experiments" : [

  ]
}
}

PyMuPDF version

1.24.13

Operating system

MacOS

Python version

3.12

JorjMcKie commented 2 days ago

You are incorrectly accessing widgets after end-of-life of the owning page. The cause for the crash is that the code does not properly detect and prevent this logic error.

I understand your intention, but you must modify your approach. Do not store the widget object itself in any way. Store its properties (like name, value, xref, etc.) and its owning page number (also here: not the page object!) if this is required. Before updating, first load the page, then the desired field (via its xref), then change widget properties and update.
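
The lifetime rule described here can be illustrated without PyMuPDF at all. The following is a minimal pure-Python analogy, not PyMuPDF's actual implementation: `FakePage` and `FakeWidget` are hypothetical classes, and the weak reference only stands in for a widget's dependence on its owning page. It mimics the pattern from the report, where widgets are stored in a list and the page that produced them goes out of scope.

```python
import gc
import weakref


class FakePage:
    """Hypothetical stand-in for a page that owns widgets."""

    def __init__(self, number):
        self.number = number

    def widgets(self):
        # Each widget only holds a weak reference to its owning page.
        yield FakeWidget(self, "Text1")


class FakeWidget:
    """Hypothetical stand-in for a widget that needs its page to update."""

    def __init__(self, page, name):
        self._page_ref = weakref.ref(page)
        self.field_name = name

    def update(self):
        page = self._page_ref()
        if page is None:
            # A checked error instead of an invalid memory access.
            raise ReferenceError("widget is no longer bound to a page")
        return f"updated {self.field_name} on page {page.number}"


def collect_widgets():
    """Mimics the pattern from the report: widgets outlive their page."""
    stored = []
    page = FakePage(0)
    for widget in page.widgets():
        stored.append(widget)
    return stored  # 'page' goes out of scope here


orphans = collect_widgets()
gc.collect()  # make sure the page has really been reclaimed

try:
    orphans[0].update()
    outcome = "updated"
except ReferenceError as exc:
    outcome = f"refused: {exc}"

print(outcome)
```

In this analogy the stored widget can detect that its page is gone and refuse the update. Storing only plain data (page number, xref, name, value) and reloading the page before each update avoids the situation altogether.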

rdhyee commented 1 day ago

@JorjMcKie Thank you so much for responding to my issue and for sketching a proper approach. I see that you are working on a fix that would have caused my code to throw an error rather than to segfault. I'll now read through the docs to translate your hints into code.

rdhyee commented 1 day ago

Using the feedback from @JorjMcKie, here's what I came up with, with help from some code-writing AI:

import pymupdf as pmp
from collections import defaultdict
from typing import Dict, List, Any, Optional

def get_widgets_info(doc: pmp.Document) -> Dict[str, List[Dict[str, Any]]]:
    """
    Extracts and returns a dictionary of widget information indexed by their names.

    Args:
        doc: PyMuPDF document object

    Returns:
        Dictionary mapping field names to lists of widget information dictionaries
    """
    widgets_by_name = defaultdict(list)
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        for widget in page.widgets():
            widgets_by_name[widget.field_name].append({
                "page_num": page_num,
                "xref": widget.xref,
                "field_type": widget.field_type,
                "field_value": widget.field_value,
                "rect": widget.rect
            })
    return widgets_by_name

def update_widget_value(doc: pmp.Document, page_num: int, xref: int, new_value: str) -> bool:
    """
    Safely updates a widget's value by reloading the page and widget.

    Args:
        doc: PyMuPDF document object
        page_num: Page number containing the widget
        xref: Cross-reference number of the widget
        new_value: New value to set for the widget

    Returns:
        True if widget was successfully updated, False otherwise
    """
    try:
        page = doc.load_page(page_num)
        for widget in page.widgets():
            if widget.xref == xref:
                widget.field_value = new_value
                widget.update()
                return True
        return False
    except Exception as e:
        print(f"Error updating widget: {e}")
        return False

def main():
    """Main function to process the PDF form"""
    try:
        # Open document and get widgets info
        doc = pmp.open("simple_form.pdf")
        widgets_info = get_widgets_info(doc)

        # Print widget information
        for name, widgets in widgets_info.items():
            print(f"Widget Name: {name}")
            for widget_info in widgets:
                print(f"  Page: {widget_info['page_num'] + 1}, "
                      f"Type: {widget_info['field_type']}, "
                      f"Value: {widget_info['field_value']}, "
                      f"Rect: {widget_info['rect']}")

        # Update field value safely
        if "Text1" in widgets_info and widgets_info["Text1"]:
            widget_info = widgets_info["Text1"][0]
            success = update_widget_value(
                doc,
                widget_info["page_num"],
                widget_info["xref"],
                "1234567890"
            )

            if success:
                print("Widget updated successfully")
                doc.save("simple_form_filled.pdf", garbage=4, deflate=True)
            else:
                print("Failed to update widget")

    except Exception as e:
        print(f"Error processing PDF: {e}")

    finally:
        if 'doc' in locals():
            doc.close()

if __name__ == "__main__":
    main()

JorjMcKie commented 1 day ago

Fast reaction! Still one suggestion: simply load the widget directly: widget = page.load_widget(xref). No need to iterate ...
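
A minimal sketch of that suggestion, assuming `page.load_widget(xref)` returns the widget with the given xref on the freshly loaded page. The helper name `set_field_value` is made up for illustration. It takes the document as a parameter, so it does not depend on how the document was opened.

```python
def set_field_value(doc, page_num, xref, new_value):
    """Reload the page and the widget by xref, then update the value.

    'doc' is expected to be an open pymupdf.Document. The page object is
    kept in a local variable so that it stays alive until update() returns.
    """
    page = doc.load_page(page_num)    # load the owning page first
    widget = page.load_widget(xref)   # fetch the widget directly, no loop
    widget.field_value = new_value
    widget.update()
    return widget.field_value
```

With the example from this thread, the call might look like `set_field_value(doc, info["page_num"], info["xref"], "1234567890")`, where `info` is one of the dictionaries built by `get_widgets_info`, followed by saving the document.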

rdhyee commented 1 day ago

@JorjMcKie Fast reaction because I'm so excited that you responded to my cry for help so quickly -- and you got me unstuck! I ran into the segfault almost two weeks ago and only just got around to posting the issue yesterday. I'm so happy to be able to use PyMuPDF (along with PyPDF).

Thanks also for telling me that I can load the widget directly.

JorjMcKie commented 1 day ago

My only question is: what do you still need pypdf for (🤷‍♂️😉)?

JorjMcKie commented 1 day ago

BTW thanks for the report: it pointed us to an open problem!

rdhyee commented 1 day ago

@JorjMcKie I started with PyMuPDF because I had read that it was the modern, fast library. After I ran into the segfault, I turned to PyPDF with the hope of eventually returning to PyMuPDF. So here I am.

One issue I couldn't get working with PyPDF is renaming widgets tied to the same field name into different names. I still haven't been able to successfully delete widgets using PyPDF. I'm hoping that I'll be able to use PyMuPDF to solve this problem.

JorjMcKie commented 1 day ago

You can delete widgets with PyMuPDF. Renaming is non-trivial, because field names can belong to a hierarchy like "name1.name2.name3". It is easy to imagine the problems you run into when you want to rename "name2": all of its lower-level children must be adjusted, and uniqueness throughout the full document must be guaranteed in addition ...
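
To make the renaming problem concrete, here is a small illustration on plain strings. It does not touch any PDF or any PyMuPDF API. `rename_component` is a hypothetical helper that shows what a consistent rename of one component in dotted names like "name1.name2.name3" would involve: the full name of every descendant changes, and the result must stay unique.

```python
def rename_component(full_names, old_prefix, new_last):
    """Rename the last component of 'old_prefix' in every affected name.

    full_names: iterable of fully qualified names such as "a.b.c".
    old_prefix: fully qualified name of the node to rename, e.g. "a.b".
    new_last:   new name for that node's last component, e.g. "x".
    Returns a dict mapping each old full name to its new full name.
    Raises ValueError if the rename would create duplicate names.
    """
    parent, _, _ = old_prefix.rpartition(".")
    new_prefix = f"{parent}.{new_last}" if parent else new_last

    mapping = {}
    for name in full_names:
        if name == old_prefix or name.startswith(old_prefix + "."):
            # The node itself and all of its descendants must be adjusted.
            mapping[name] = new_prefix + name[len(old_prefix):]
        else:
            mapping[name] = name

    # Uniqueness must hold across the whole set of names.
    new_names = list(mapping.values())
    if len(set(new_names)) != len(new_names):
        raise ValueError(f"renaming {old_prefix!r} would create duplicates")
    return mapping


fields = ["name1.name2.name3", "name1.name2.other", "name1.sibling"]
renamed = rename_component(fields, "name1.name2", "renamed")
print(renamed)
```

In a real PDF the names are spread over parent and child field objects rather than stored as flat strings, so this only shows the bookkeeping, not how to apply it to a document.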

rdhyee commented 1 day ago

@JorjMcKie I'm a newbie when it comes to programmatically manipulating PDF files. One unpleasant surprise for me has been how fragile Adobe Acrobat Pro has been for editing form elements. I've been changing names and adding widgets and suddenly, the resulting file is corrupted and I lose all my edits. How can Adobe Acrobat, software that should be the closest to the canonical software for working with PDFs, be so junky? I started out trying to use the JS programmatic interface in Acrobat to manipulate the PDF but have abandoned that approach. Happy to be digging into PyMuPDF now.

julian-smith-artifex-com commented 1 day ago

In PyMuPDF git, we now have a fix for the underlying SEGV. If an annotation is unbound from its parent page (for example if the pymupdf.Page object is deleted), and then one attempts an operation on the annotation that requires the page, we now raise a Python exception.

Unfortunately the fix requires a new release of MuPDF. So depending on MuPDF release timescales, it might not be in the next release of PyMuPDF.