Closed: osjerick closed this issue 2 years ago
Same problem here with Python 3.9.2
This is strange. I cannot reproduce it in my local Linux Mint environment:
$ scrapy version -v
Scrapy : 2.5.0
lxml : 4.6.3.0
libxml2 : 2.9.10
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.7.0
Python : 3.9.6 (default, Jul 5 2021, 11:47:27) - [GCC 7.5.0]
pyOpenSSL : 20.0.1 (OpenSSL 1.1.1l 24 Aug 2021)
cryptography : 3.4.8
Platform : Linux-4.15.0-147-generic-x86_64-with-glibc2.27
nor in a docker container with Python 3.8.10:
# scrapy version -v
Scrapy : 2.5.0
lxml : 4.6.3.0
libxml2 : 2.9.10
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.7.0
Python : 3.8.10 (default, Jun 23 2021, 15:19:53) - [GCC 8.3.0]
pyOpenSSL : 20.0.1 (OpenSSL 1.1.1l 24 Aug 2021)
cryptography : 3.4.8
Platform : Linux-4.15.0-147-generic-x86_64-with-glibc2.2.5
nor in my MBP:
$ scrapy version -v
Scrapy : 2.5.0
lxml : 4.6.3.0
libxml2 : 2.9.10
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.2.0
Python : 3.8.6 (v3.8.6:db455296be, Sep 23 2020, 13:31:39) - [Clang 6.0 (clang-600.0.57)]
pyOpenSSL : 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021)
cryptography : 3.4.7
Platform : macOS-10.15.7-x86_64-i386-64bit
Downgrading to lxml==4.5.2 does not change the output.
Could you provide more information to reproduce?
$ scrapy version -v
Scrapy : 2.5.0
lxml : 4.6.3.0
libxml2 : 2.9.12
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.7.0
Python : 3.9.2 (default, Feb 28 2021, 17:03:44) - [GCC 10.2.1 20210110]
pyOpenSSL : 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021)
cryptography : 3.3.2
Platform : Linux-5.10.0-8-amd64-x86_64-with-glibc2.31
I am quite sure the problem occurred after I upgraded to the new Debian stable release, bullseye.
Could be linked to this: https://gitlab.gnome.org/GNOME/libxml2/-/issues/255
Edit: definitely linked to that issue. Downgrading to libxml2 2.9.10 fixes the problem.
Thanks! That fixed it for me. Conda 4.10.3 erroneously selected libxml2 v2.9.12. Locking my environment to 2.9.10 (as lxml has done) solved the issue.
You can see in the May 18, 2021 series of commits to lxml's Makefile that 2.9.12 was tested and then promptly reverted.
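For anyone who wants to check which libxml2 their environment actually picked up, here is a minimal sketch. The helper names (`is_affected`, `runtime_libxml2_version`) and the `AFFECTED_LIBXML2` set are made up for illustration. The set only contains 2.9.12 because that is the version reported as broken in this thread; other releases may be affected too, so treat it as an assumption rather than a complete list.

```python
from typing import Optional, Tuple

# Assumption taken from this thread: libxml2 2.9.12 shows the problem,
# 2.9.10 does not. Other versions are not covered by this set.
AFFECTED_LIBXML2 = {(2, 9, 12)}


def is_affected(version: Tuple[int, ...]) -> bool:
    """Return True if the (major, minor, patch) version is in the known-bad set."""
    return tuple(version[:3]) in AFFECTED_LIBXML2


def runtime_libxml2_version() -> Optional[Tuple[int, ...]]:
    """Return the libxml2 version lxml is running against, or None if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return None
    # LIBXML_VERSION is the libxml2 version loaded at runtime, as a tuple of ints.
    return tuple(etree.LIBXML_VERSION)


if __name__ == "__main__":
    version = runtime_libxml2_version()
    if version is None:
        print("lxml is not installed in this environment")
    elif is_affected(version):
        print("libxml2", version, "matches the version reported as broken here")
    else:
        print("libxml2", version, "is not the version reported as broken here")
```

If the check reports 2.9.12, pinning libxml2 to 2.9.10 in the environment is the workaround described above.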
I can confirm both the issue and the fix (i.e. downgrading libxml2 to 2.9.10). For me the issue was caused either by upgrading ipykernel from 5.3.4 to 6.2.0 or by installing eli5 under conda in a WSL2 setting (the history file got corrupted, so I'm not sure which).
AFAICS this is fixed in newer libxml2 so I don't think this should stay open.
I've installed Scrapy into a new environment recently and now, when trying to get the HTML source of a node, the selector returns the node plus all the markup that follows it in the source.
Note: I installed parsel with Scrapy into conda environments using the conda-forge channel.
Current behavior:
The env is composed of:
Previous behavior:
This env is composed of:
Scrapy 2.5.0
Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.5
Previous behavior is preferred.
Is this an issue or is it the standard way it should behave now?
I tried downgrading lxml to 4.5.2, since parsel has only a few dependencies and lxml is the only one that does not match between these two environments, but nothing changed.