scrapy / parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
BSD 3-Clause "New" or "Revised" License

Parsel import causes crash #294

Closed: barrio closed this issue 5 months ago

barrio commented 5 months ago

An empty Python script containing only the statement 'import parsel' crashes with SIGILL. According to the backtrace the crash happens in lxml, but 'import lxml' works fine, so I'm filing the report here first. On another machine there is no crash. The hardware info is below the backtrace:

```
Program received signal SIGILL, Illegal instruction.
0x00007ffff707a4e3 in ?? () from /home/barrios/gg-deals/itrax-env/lib/python3.12/site-packages/lxml/etree.cpython-312-x86_64-linux-gnu.so

#0  0x00007ffff707a4e3 in ?? () from /home/barrios/gg-deals/itrax-env/lib/python3.12/site-packages/lxml/etree.cpython-312-x86_64-linux-gnu.so
#1  0x00007ffff708056f in ?? () from /home/barrios/gg-deals/itrax-env/lib/python3.12/site-packages/lxml/etree.cpython-312-x86_64-linux-gnu.so
#2  0x0000000000582bef in PyModule_ExecDef (module=0x7ffff77d8e50, def=) at ../Objects/moduleobject.c:440
#3  0x00000000005fda14 in _imp_exec_dynamic_impl (mod=, module=) at ../Python/import.c:3799
#4  _imp_exec_dynamic (module=, mod=) at ../Python/clinic/import.c.h:534
#5  0x0000000000581f32 in cfunction_vectorcall_O (func=0x7ffff7b97330, args=0x7ffff779be98, nargsf=, kwnames=)
    at ../Include/cpython/methodobject.h:50
#6  0x00000000005db593 in _PyEval_EvalFrameDefault (tstate=, frame=, throwflag=) at Python/bytecodes.c:3254
#7  0x0000000000549d27 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffffffcc00, callable=0x7ffff7ba4040, tstate=0xba5048 <_PyRuntime+459656>)
    at ../Include/internal/pycore_call.h:92
#8  object_vacall (tstate=tstate@entry=0xba5048 <_PyRuntime+459656>, base=, callable=0x7ffff7ba4040, vargs=0x7fffffffcc88) at ../Objects/call.c:850
#9  0x000000000054b5b3 in PyObject_CallMethodObjArgs (obj=, name=) at ../Objects/call.c:911
#10 0x00000000005fde85 in import_find_and_load (abs_name=0x7ffff77afc70, tstate=0xba5048 <_PyRuntime+459656>) at ../Python/import.c:2779
#11 PyImport_ImportModuleLevelObject (name=name@entry=0x7ffff77afc70, globals=globals@entry=0x0, locals=locals@entry=0x0, fromlist=fromlist@entry=0x0, level=0)
    at ../Python/import.c:2862
#12 0x00000000005d3874 in builtin___import___impl (level=, fromlist=0x0, locals=0x0, globals=0x0, name=0x7ffff77afc70, module=)
    at ../Python/bltinmodule.c:275
#13 builtin___import__ (module=, args=, nargs=, kwnames=) at ../Python/clinic/bltinmodule.c.h:107
#14 0x0000000000581e9d in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7ffff7b959e0, args=0x7ffff7bfdd08, nargsf=, kwnames=0x0)
    at ../Include/cpython/methodobject.h:50
#15 0x00000000005db593 in _PyEval_EvalFrameDefault (tstate=, frame=, throwflag=) at Python/bytecodes.c:3254
#16 0x0000000000549d27 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=3, args=0x7fffffffd070, callable=0x7ffff7ba4180, tstate=0xba5048 <_PyRuntime+459656>)
    at ../Include/internal/pycore_call.h:92
#17 object_vacall (tstate=tstate@entry=0xba5048 <_PyRuntime+459656>, base=, callable=0x7ffff7ba4180, vargs=0x7fffffffd0f8) at ../Objects/call.c:850
#18 0x000000000054b5b3 in PyObject_CallMethodObjArgs (obj=, name=) at ../Objects/call.c:911
#19 0x00000000005fe0db in PyImport_ImportModuleLevelObject (name=name@entry=0x7ffff7bfdf20, globals=, locals=locals@entry=0x7ffff7706a80,
    fromlist=fromlist@entry=0x7ffff7bfdc30, level=0) at ../Python/import.c:2931
#20 0x00000000005dc8ee in import_name (level=0xb35988 <_PyRuntime+3272>, fromlist=0x7ffff7bfdc30, name=0x7ffff7bfdf20, frame=, tstate=)
    at ../Python/ceval.c:2482
#21 _PyEval_EvalFrameDefault (tstate=tstate@entry=0xba5048 <_PyRuntime+459656>, frame=, frame@entry=0x7ffff7e1b828, throwflag=throwflag@entry=0)
    at Python/bytecodes.c:2135
#22 0x00000000005d59ab in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7e1b828, tstate=0xba5048 <_PyRuntime+459656>) at ../Include/internal/pycore_ceval.h:89
#23 _PyEval_Vector (kwnames=0x0, argcount=0, args=0x0, locals=0x7ffff7706a80, func=0x7ffff771cea0, tstate=0xba5048 <_PyRuntime+459656>) at ../Python/ceval.c:1683
#24 PyEval_EvalCode (co=co@entry=0x7ffff7b80030, globals=globals@entry=0x7ffff7706a80, locals=locals@entry=0x7ffff7706a80) at ../Python/ceval.c:578
#25 0x00000000005d352c in builtin_exec_impl (module=, closure=, locals=0x7ffff7706a80, globals=0x7ffff7706a80, source=0x7ffff7b80030)
    at ../Python/bltinmodule.c:1096
#26 builtin_exec (module=, args=, nargs=, kwnames=) at ../Python/clinic/bltinmodule.c.h:586
#27 0x0000000000581e9d in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7ffff7b95e40, args=0x7ffff7720698, nargsf=, kwnames=0x0)
    at ../Include/cpython/methodobject.h:50
#28 0x00000000005db593 in _PyEval_EvalFrameDefault (tstate=, frame=, throwflag=) at Python/bytecodes.c:3254
#29 0x0000000000549d27 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffffffd6e0, callable=0x7ffff7ba4040, tstate=0xba5048 <_PyRuntime+459656>)
    at ../Include/internal/pycore_call.h:92
#30 object_vacall (tstate=tstate@entry=0xba5048 <_PyRuntime+459656>, base=, callable=0x7ffff7ba4040, vargs=0x7fffffffd768) at ../Objects/call.c:850
#31 0x000000000054b5b3 in PyObject_CallMethodObjArgs (obj=, name=) at ../Objects/call.c:911
#32 0x00000000005fde85 in import_find_and_load (abs_name=0x7ffff7707d30, tstate=0xba5048 <_PyRuntime+459656>) at ../Python/import.c:2779
#33 PyImport_ImportModuleLevelObject (name=name@entry=0x7ffff7707d30, globals=globals@entry=0x0, locals=locals@entry=0x0, fromlist=fromlist@entry=0x0, level=0)
    at ../Python/import.c:2862
#34 0x00000000005d3874 in builtin___import___impl (level=, fromlist=0x0, locals=0x0, globals=0x0, name=0x7ffff7707d30, module=)
    at ../Python/bltinmodule.c:275
#35 builtin___import__ (module=, args=, nargs=, kwnames=) at ../Python/clinic/bltinmodule.c.h:107
#36 0x0000000000581e9d in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7ffff7b959e0, args=0x7ffff7bfc0b8, nargsf=, kwnames=0x0)
    at ../Include/cpython/methodobject.h:50
#37 0x00000000005db593 in _PyEval_EvalFrameDefault (tstate=, frame=, throwflag=) at Python/bytecodes.c:3254
#38 0x0000000000549d27 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=3, args=0x7fffffffdb50, callable=0x7ffff7ba4180, tstate=0xba5048 <_PyRuntime+459656>)
    at ../Include/internal/pycore_call.h:92
#39 object_vacall (tstate=tstate@entry=0xba5048 <_PyRuntime+459656>, base=, callable=0x7ffff7ba4180, vargs=0x7fffffffdbd8) at ../Objects/call.c:850
#40 0x000000000054b5b3 in PyObject_CallMethodObjArgs (obj=, name=) at ../Objects/call.c:911
#41 0x00000000005fe0db in PyImport_ImportModuleLevelObject (name=name@entry=0x7ffff7bfd380, globals=, locals=locals@entry=0x7ffff7706700,
    fromlist=fromlist@entry=0x7ffff7b9b190, level=0) at ../Python/import.c:2931
#42 0x00000000005dc8ee in import_name (level=0xb35988 <_PyRuntime+3272>, fromlist=0x7ffff7b9b190, name=0x7ffff7bfd380, frame=, tstate=)
    at ../Python/ceval.c:2482
#43 _PyEval_EvalFrameDefault (tstate=tstate@entry=0xba5048 <_PyRuntime+459656>, frame=, frame@entry=0x7ffff7e1b378, throwflag=throwflag@entry=0)
    at Python/bytecodes.c:2135
#44 0x00000000005d59ab in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7e1b378, tstate=0xba5048 <_PyRuntime+459656>) at ../Include/internal/pycore_ceval.h:89
#45 _PyEval_Vector (kwnames=0x0, argcount=0, args=0x0, locals=0x7ffff7706700, func=0x7ffff771ccc0, tstate=0xba5048 <_PyRuntime+459656>) at ../Python/ceval.c:1683
#46 PyEval_EvalCode (co=co@entry=0x7ffff7bcc4f0, globals=globals@entry=0x7ffff7706700, locals=locals@entry=0x7ffff7706700) at ../Python/ceval.c:578
#47 0x00000000005d352c in builtin_exec_impl (module=, closure=, locals=0x7ffff7706700, globals=0x7ffff7706700, source=0x7ffff7bcc4f0)
    at ../Python/bltinmodule.c:1096
#48 builtin_exec (module=, args=, nargs=, kwnames=) at ../Python/clinic/bltinmodule.c.h:586
#49 0x0000000000581e9d in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7ffff7b95e40, args=0x7ffff77048d8, nargsf=, kwnames=0x0)
    at ../Include/cpython/methodobject.h:50
#50 0x00000000005db593 in _PyEval_EvalFrameDefault (tstate=, frame=, throwflag=) at Python/bytecodes.c:3254
#51 0x0000000000549d27 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffffffe1c0, callable=0x7ffff7ba4040, tstate=0xba5048 <_PyRuntime+459656>)
    at ../Include/internal/pycore_call.h:92
#52 object_vacall (tstate=tstate@entry=0xba5048 <_PyRuntime+459656>, base=, callable=0x7ffff7ba4040, vargs=0x7fffffffe248) at ../Objects/call.c:850
#53 0x000000000054b5b3 in PyObject_CallMethodObjArgs (obj=, name=) at ../Objects/call.c:911
#54 0x00000000005fde85 in import_find_and_load (abs_name=0x7ffff7bfd380, tstate=0xba5048 <_PyRuntime+459656>) at ../Python/import.c:2779
#55 PyImport_ImportModuleLevelObject (name=name@entry=0x7ffff7bfd380, globals=, locals=locals@entry=0x7ffff7bf9a40,
    fromlist=fromlist@entry=0xa3f8a0 <_Py_NoneStruct>, level=0) at ../Python/import.c:2862
#56 0x00000000005dc8ee in import_name (level=0xb35988 <_PyRuntime+3272>, fromlist=0xa3f8a0 <_Py_NoneStruct>, name=0x7ffff7bfd380, frame=,
    tstate=<optimized out>) at ../Python/ceval.c:2482
#57 _PyEval_EvalFrameDefault (tstate=tstate@entry=0xba5048 <_PyRuntime+459656>, frame=, frame@entry=0x7ffff7e1b020, throwflag=throwflag@entry=0)
    at Python/bytecodes.c:2135
#58 0x00000000005d59ab in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7e1b020, tstate=0xba5048 <_PyRuntime+459656>) at ../Include/internal/pycore_ceval.h:89
#59 _PyEval_Vector (kwnames=0x0, argcount=0, args=0x0, locals=0x7ffff7bf9a40, func=0x7ffff7bda160, tstate=0xba5048 <_PyRuntime+459656>) at ../Python/ceval.c:1683
#60 PyEval_EvalCode (co=co@entry=0x7ffff7bedb00, globals=globals@entry=0x7ffff7bf9a40, locals=locals@entry=0x7ffff7bf9a40) at ../Python/ceval.c:578
#61 0x0000000000608ac2 in run_eval_code_obj (locals=0x7ffff7bf9a40, globals=0x7ffff7bf9a40, co=0x7ffff7bedb00, tstate=0xba5048 <_PyRuntime+459656>)
    at ../Python/pythonrun.c:1722
#62 run_mod (mod=, filename=, globals=0x7ffff7bf9a40, locals=0x7ffff7bf9a40, flags=, arena=)
    at ../Python/pythonrun.c:1743
#63 0x00000000006b4d83 in pyrun_file (fp=fp@entry=0xbf43c0, filename=filename@entry=0x7ffff7ba8850, start=start@entry=257, globals=globals@entry=0x7ffff7bf9a40,
    locals=locals@entry=0x7ffff7bf9a40, closeit=closeit@entry=1, flags=0x7fffffffe728) at ../Python/pythonrun.c:1643
#64 0x00000000006b4aea in _PyRun_SimpleFileObject (fp=fp@entry=0xbf43c0, filename=filename@entry=0x7ffff7ba8850, closeit=closeit@entry=1,
    flags=flags@entry=0x7fffffffe728) at ../Python/pythonrun.c:433
#65 0x00000000006b491f in _PyRun_AnyFileObject (fp=0xbf43c0, filename=filename@entry=0x7ffff7ba8850, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffe728)
    at ../Python/pythonrun.c:78
#66 0x00000000006bc9c5 in pymain_run_file_obj (skip_source_first_line=0, filename=0x7ffff7ba8850, program_name=0x7ffff7b2e2b0) at ../Modules/main.c:360
#67 pymain_run_file (config=0xb47c28 <_PyRuntime+77672>) at ../Modules/main.c:379
#68 pymain_run_python (exitcode=0x7fffffffe71c) at ../Modules/main.c:629
#69 Py_RunMain () at ../Modules/main.c:709
#70 0x00000000006bc4ad in Py_BytesMain (argc=, argv=) at ../Modules/main.c:763
#71 0x00007ffff7c2a1ca in __libc_start_call_main (main=main@entry=0x518a50, argc=argc@entry=2, argv=argv@entry=0x7fffffffe968)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#72 0x00007ffff7c2a28b in __libc_start_main_impl (main=0x518a50, argc=2, argv=0x7fffffffe968, init=, fini=,
    rtld_fini=<optimized out>, stack_end=0x7fffffffe958) at ../csu/libc-start.c:360
#73 0x0000000000657925 in _start ()

A debugging session is active.

Inferior 1 [process 12487] will be killed.
```

```
System:
  Host: itrax Kernel: 6.8.0-31-generic arch: x86_64 bits: 64 Console: pty pts/0
  Distro: Ubuntu 24.04 LTS (Noble Numbat)
Machine:
  Type: Desktop System: FUJITSU product: FUTRO S900 v: N/A serial:
  Mobo: FUJITSU model: D3003-A1 v: S26361-D3003-A1 serial:
  BIOS: FUJITSU // American Megatrends v: 4.6.4.1 R1.14.0 for D3003-A1x date: 01/27/2012
CPU:
  Info: single core model: AMD G-T44R bits: 64 type: UP cache: L2: 512 KiB
  Speed (MHz): 798 min/max: 800/1200 core: 1: 798
Graphics:
  Device-1: AMD Wrestler [Radeon HD 6250] driver: radeon v: kernel
  Display: server: No display server data found. Headless machine? tty: 168x40
  API: EGL v: 1.5 drivers: swrast platforms: surfaceless,device
  API: OpenGL v: 4.5 vendor: mesa v: 24.0.5-1ubuntu1 note: console (EGL sourced)
    renderer: llvmpipe (LLVM 17.0.6 128 bits)
Audio:
  Device-1: AMD Wrestler HDMI Audio driver: snd_hda_intel
  Device-2: AMD SBx00 Azalia driver: snd_hda_intel
  API: ALSA v: k6.8.0-31-generic status: kernel-api
Network:
  Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet driver: r8169
  IF: enp1s0 state: up speed: 1000 Mbps duplex: full mac: 00:19:99:cb:57:4f
Drives:
  Local Storage: total: 990.65 GiB used: 7.97 GiB (0.8%)
  ID-1: /dev/sda vendor: Seagate model: ST1000LM035-1RK172 size: 931.51 GiB type: USB
  ID-2: /dev/sdb vendor: SanDisk model: Ultra size: 57.28 GiB type: USB
  ID-3: /dev/sdc vendor: InnoDisk model: DRPS-02GJ30AC1DS-A88 size: 1.85 GiB
Partition:
  ID-1: / size: 56.08 GiB used: 7.31 GiB (13.0%) fs: ext4 dev: /dev/sdb2
Swap:
  ID-1: swap-1 type: file size: 3.42 GiB used: 0 KiB (0.0%) file: /swap.img
Sensors:
  System Temperatures: cpu: 61.0 C mobo: N/A gpu: radeon temp: 60.0 C
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 4 GiB note: est. available: 3.42 GiB used: 660.7 MiB (18.8%)
```
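SIGILL means the processor was asked to execute an instruction it does not implement. Because the crash happens on this machine (an AMD G-T44R) but not on another one, one thing that may be worth checking, although nothing in this thread confirms it as the cause, is which newer x86-64 instruction-set extensions the CPU reports. The sketch below assumes Linux (`/proc/cpuinfo`) and uses the x86-64-v2 feature level purely as a reference point; that choice is an assumption, not something lxml or parsel documents here.

```python
from __future__ import annotations

from pathlib import Path

# /proc/cpuinfo names of the features required by the x86-64-v2 level:
# CMPXCHG16B (cx16), LAHF/SAHF (lahf_lm), POPCNT, SSE3 (pni), SSSE3, SSE4.1, SSE4.2.
X86_64_V2_FLAGS = frozenset({"cx16", "lahf_lm", "popcnt", "pni", "ssse3", "sse4_1", "sse4_2"})


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(flags: set[str], required: frozenset[str] = X86_64_V2_FLAGS) -> list[str]:
    """List the required flags that the CPU does not report, sorted by name."""
    return sorted(required - flags)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux-specific path
    if cpuinfo.exists():
        missing = missing_flags(cpu_flags(cpuinfo.read_text()))
        print("missing x86-64-v2 flags:", ", ".join(missing) or "none")
    else:
        print("/proc/cpuinfo not found (this sketch assumes Linux)")
```

Running it on both machines and comparing the output would show whether their instruction sets differ in a way that could matter for compiled extensions.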

kmike commented 5 months ago

Hey! This definitely looks like an issue with your lxml installation.
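To see what the lxml installation actually consists of without importing `lxml.etree` (the extension that crashes), the installed package metadata can be read with the standard-library `importlib.metadata`. The `Tag:` lines of a distribution's `WHEEL` metadata file indicate which wheel build was installed. A minimal sketch; the helper names `wheel_info` and `describe` are hypothetical:

```python
from __future__ import annotations

from importlib import metadata


def wheel_info(dist: metadata.Distribution) -> dict[str, object]:
    """Summarize name, version and wheel tags of an installed distribution."""
    wheel_text = dist.read_text("WHEEL") or ""  # None if the file is missing
    tags = [
        line.split(":", 1)[1].strip()
        for line in wheel_text.splitlines()
        if line.startswith("Tag:")
    ]
    return {"name": dist.metadata["Name"], "version": dist.version, "tags": tags}


def describe(name: str) -> dict[str, object] | None:
    """Return wheel_info() for an installed package, or None if it is not installed."""
    try:
        return wheel_info(metadata.distribution(name))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Only metadata files are read; lxml's compiled modules are never loaded.
    for pkg in ("lxml", "parsel"):
        print(pkg, describe(pkg))
```

Running this inside the same virtualenv (`itrax-env`) on both machines would show whether they ended up with the same lxml version and wheel.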

barrio commented 5 months ago

lxml is not used directly in my project, but only as a dependency of parsel.
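The gdb backtrace in the report shows only C frames. To see which Python-level `import` statements led to the compiled `lxml.etree` extension being loaded, the standard-library `faulthandler` module can dump the Python traceback when the process receives SIGILL. A sketch that runs the import in a child interpreter started with `-X faulthandler`; the helper name `import_with_faulthandler` is hypothetical:

```python
import subprocess
import sys


def import_with_faulthandler(module: str, python: str = sys.executable) -> subprocess.CompletedProcess:
    """Run `import <module>` in a child interpreter with faulthandler enabled.

    If the child is killed by SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL,
    faulthandler writes the Python-level traceback to the child's stderr,
    which shows the chain of import statements that led to the crash.
    """
    return subprocess.run(
        [python, "-X", "faulthandler", "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = import_with_faulthandler("parsel")
    print("return code:", result.returncode)
    print(result.stderr)
```

The same information is available directly from the shell with `python -X faulthandler -c "import parsel"`.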

kmike commented 5 months ago

> lxml is not used directly in my project, but only as a dependency of parsel.

parsel is a simple pure-Python package; it imports lxml, just like your code does. Maybe it imports some other lxml modules that cause the issue, so 'import lxml' is not enough to reproduce it. Or maybe the environment is different when you try 'import lxml' vs 'import parsel'.
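To follow up on the idea that 'import lxml' alone may not exercise the same code as 'import parsel': in recent lxml versions, `import lxml` may only run the package's `__init__.py`, without loading the compiled `lxml.etree` extension that appears at the top of the backtrace. The sketch below imports each candidate module in its own fresh interpreter, so one crash does not hide the next, and reports whether the child exited normally, raised an exception, or was killed by a signal. The module list is an assumption and can be extended:

```python
import signal
import subprocess
import sys

# Hypothetical list of imports to try, from narrowest to broadest.
# parsel may import further modules; extend the list as needed.
CANDIDATES = ["lxml", "lxml.etree", "lxml.html", "parsel"]


def probe_import(module: str, python: str = sys.executable) -> str:
    """Import `module` in a fresh interpreter and describe how it ended."""
    proc = subprocess.run(
        [python, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, -N means the child was terminated by signal N (SIGILL is 4 on Linux).
        return f"killed by {signal.Signals(-proc.returncode).name}"
    lines = proc.stderr.strip().splitlines()
    return f"exit {proc.returncode}: {lines[-1] if lines else ''}"


if __name__ == "__main__":
    for name in CANDIDATES:
        print(f"{name:<12} {probe_import(name)}")
```

If `lxml` reports "ok" while `lxml.etree` reports "killed by SIGILL", that would support the explanation above without involving parsel at all.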

I'm not sure how to help in your case. It's very unlikely to be an issue with parsel itself. A clean install of the software might help, but I'm not sure. I'm closing this issue for now. If the issue is still not resolved for you, you can try https://stackoverflow.com/ (using the tag "scrapy" could help).