dalthviz opened 2 months ago
I looked into this for a bit more context: https://lxml.de/5.2/changes-5.2.0.html
The lxml.html.clean implementation suffered from several (only if used) security issues in the past and was now extracted into a separate library: https://github.com/fedora-python/lxml_html_clean Projects that use lxml without "lxml.html.clean" will not notice any difference, except that they won't have potentially vulnerable code installed. The module is available as an "extra" setuptools dependency "lxml[html_clean]", so that Projects that need "lxml.html.clean" will need to switch their requirements from "lxml" to "lxml[html_clean]", or install the new library themselves.
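For code that has to run on both sides of this split, one option is to try the new package first and fall back to the old module. This is a minimal sketch. It assumes the only two candidate import paths are the standalone lxml_html_clean package and the old lxml.html.clean module, and the helper name load_cleaner_module is hypothetical:

```python
import importlib
from types import ModuleType
from typing import Optional


def load_cleaner_module() -> Optional[ModuleType]:
    """Return the first importable HTML-cleaner module, or None.

    Hypothetical helper: tries the standalone lxml_html_clean package first,
    then lxml.html.clean (bundled in lxml < 5.2; from 5.2 on it only imports
    successfully when lxml_html_clean is installed).
    """
    for name in ("lxml_html_clean", "lxml.html.clean"):
        try:
            return importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
    return None
```

Callers can check for None and raise a clear error suggesting `pip install "lxml[html_clean]"`. Both modules expose `Cleaner` and `clean_html`, so code after the import does not need to change.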
Some more discussion:
https://bugs.launchpad.net/lxml/+bug/1958539
Googling around, it looks like there are some folks looking into alternatives to html_clean:
https://github.com/psf/requests-html/issues/558
nh3 seems promising: https://github.com/messense/nh3
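For comparison, here is a minimal sketch of sanitizing with nh3. It assumes nh3's default allow-list is acceptable, so it calls `nh3.clean` with no extra arguments. The wrapper name sanitize_html is hypothetical, and nh3 is imported inside the function so the snippet still loads when nh3 isn't installed:

```python
def sanitize_html(fragment: str) -> str:
    """Hypothetical wrapper: sanitize an HTML fragment with nh3's defaults."""
    import nh3  # third-party (pip install nh3); imported lazily on purpose

    # With default settings nh3 keeps an allow-list of common formatting tags
    # (e.g. <b>, <p>, <a>) and drops disallowed ones such as <script>.
    return nh3.clean(fragment)
```

Passing `tags=` or `attributes=` to `nh3.clean` narrows the allow-list if the defaults are too permissive.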
Maybe in the short term we should migrate to the html_clean package, just to silence the CI errors, and then consider whether we should replace it?
I played with nh3, here's a branch using that instead of lxml: https://github.com/psobolewskiPhD/napari/tree/use_nh3_html_sanitizer
The main difference is in how quotes are handled: nh3.clean doesn't escape quotes.
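The quote difference is easier to see in a concrete case. Below is a stdlib-only illustration using `html.escape`. It is not what nh3 or lxml call internally; it only shows the two output styles. If tests compare sanitizer output as strings, one possible approach is to compare after `html.unescape`. This assumes entity encoding is the only difference, and the helper `same_text` is hypothetical:

```python
import html

raw = 'He said "hi" & left'

# quote=True also turns " into &quot; (and ' into &#x27;)
with_quotes = html.escape(raw, quote=True)
# quote=False escapes only &, < and >
without_quotes = html.escape(raw, quote=False)


def same_text(a: str, b: str) -> bool:
    """Hypothetical test helper: equal once HTML entities are decoded."""
    return html.unescape(a) == html.unescape(b)


print(with_quotes)     # He said &quot;hi&quot; &amp; left
print(without_quotes)  # He said "hi" &amp; left
print(same_text(with_quotes, without_quotes))  # True
```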
Oh I see, then it seems that moving away from lxml_html_clean and using an alternative could be quite worthwhile :+1:
Currently facing this error in a scheduled notebook too.
Hi guys, how should we overcome this?
I've currently patched it like this:
!pip install --upgrade lxml_html_clean
import geograpy
url = 'https://en.wikipedia.org/wiki/2012_Summer_Olympics_torch_relay'
places = geograpy.get_geoPlace_context(url=url)
print(places)
It worked perfectly: I could update the 🥷 Neo4J Ninjas duckdb dataset 🦆
It's impossible to figure out what the issue is. We'll need the text of the entire traceback. You can post it between two sets of three backticks so it's formatted as code.
I am currently facing this error. Is there any way to fix it? Thank you!
@jamesnq are you using napari? There's no way to tell from your screenshot. Please post the entire traceback as text. Thanks!
Thanks for your reply, I've already fixed the error. I installed the lxml_html_clean library and ran with Python 3.10 instead of 3.12, and it worked!
I had a similar issue. I'm running a Python 3.10 env, and to solve the problem I had to do:
conda update napari
pip install "lxml[html_clean]"
Try:
pip install lxml_html_clean
It might do the trick; this worked for me.
Try installing lxml_html_clean with Python 3.10.
(In reply to Luis's comment of Sun, 26 May 2024 at 18:33.)
🧰 Task
Seems like the latest release of lxml (5.2.0) moved a module into an independent package (lxml.html.clean -> new lxml_html_clean package)?
package)? See https://github.com/napari/napari/actions/runs/8510883283/job/23309875720?pr=6794#step:11:174 and https://github.com/napari/napari/actions/runs/8510883283/job/23309397479?pr=6794#step:8:144
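Because the split landed in lxml 5.2.0, checking the installed lxml version tells you whether an environment needs the extra. A minimal sketch, assuming a plain `major.minor...` version string (both helper names are hypothetical):

```python
import re
from importlib import metadata
from typing import Optional


def needs_html_clean_extra(version: str) -> bool:
    """True if this lxml version (5.2.0+) moved lxml.html.clean to lxml_html_clean."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised lxml version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (5, 2)


def installed_lxml_version() -> Optional[str]:
    """Installed lxml version string, or None if lxml isn't installed."""
    try:
        return metadata.version("lxml")
    except metadata.PackageNotFoundError:
        return None
```

For example, `needs_html_clean_extra("5.2.0")` is True and `needs_html_clean_extra("5.1.1")` is False. When it's True, the requirement should read `lxml[html_clean]`, or `lxml_html_clean` should be added alongside `lxml`.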