napari / napari

napari: a fast, interactive, multi-dimensional image viewer for python
https://napari.org
BSD 3-Clause "New" or "Revised" License

`ImportError: lxml.html.clean module is now a separate project lxml_html_clean` over test suite (Ubuntu 20.04 and benchmarks) #6798

Open dalthviz opened 2 months ago

dalthviz commented 2 months ago

🧰 Task

Seems like the latest release of lxml (5.2.0) moved a module into an independent package (lxml.html.clean -> new lxml_html_clean package)? See https://github.com/napari/napari/actions/runs/8510883283/job/23309875720?pr=6794#step:11:174 and https://github.com/napari/napari/actions/runs/8510883283/job/23309397479?pr=6794#step:8:144

psobolewskiPhD commented 2 months ago

I looked into this for a bit more context: https://lxml.de/5.2/changes-5.2.0.html

The lxml.html.clean implementation suffered from several (only if used) security issues in the past and was now extracted into a separate library: https://github.com/fedora-python/lxml_html_clean Projects that use lxml without "lxml.html.clean" will not notice any difference, except that they won't have potentially vulnerable code installed. The module is available as an "extra" setuptools dependency "lxml[html_clean]", so that Projects that need "lxml.html.clean" will need to switch their requirements from "lxml" to "lxml[html_clean]", or install the new library themselves.

Some more discussion: https://bugs.launchpad.net/lxml/+bug/1958539

Googling around, it looks like there are some folks looking into alternatives to html_clean: https://github.com/psf/requests-html/issues/558

nh3 seems promising: https://github.com/messense/nh3

Maybe in the short term we should migrate to the html_clean package, just to silence the CI errors, but then we may want to consider whether we should replace it?
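For the short-term migration, a minimal sketch of an import shim. It assumes the standalone package exposes `Cleaner` at its top level, as the old `lxml.html.clean` module did. `load_cleaner` is a hypothetical helper name used for illustration, not something that exists in napari:

```python
import importlib


def load_cleaner():
    """Return a Cleaner class from whichever module is importable, else None.

    `load_cleaner` is a hypothetical helper for illustration only.
    """
    # Prefer the standalone package, then fall back to the legacy location
    # (which still works on lxml < 5.2.0).
    for module_name in ("lxml_html_clean", "lxml.html.clean"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Not installed, or lxml >= 5.2.0 without the html_clean extra.
            continue
        return getattr(module, "Cleaner", None)
    return None


Cleaner = load_cleaner()
if Cleaner is None:
    print("No HTML cleaner available; install lxml[html_clean] or lxml_html_clean")
```

Returning `None` instead of raising lets the caller decide whether a missing cleaner is fatal, which may be useful in a test suite that only needs it for a few tests.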

psobolewskiPhD commented 2 months ago

I played with nh3, here's a branch using that instead of lxml: https://github.com/psobolewskiPhD/napari/tree/use_nh3_html_sanitizer

The main difference is handling quotes, where nh3.clean doesn't escape quotes.
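To make the quote-handling difference concrete, here is a standard-library-only illustration. It does not call nh3 or lxml (neither may be installed), and real sanitizers filter tags rather than escaping everything; it only shows the two output styles for quotes that a test comparing cleaners might trip over. The sample string is made up:

```python
from html import escape

# Hypothetical sample text containing both quotes and markup.
text = 'layer "membrane" <b>bold</b>'

# Style 1: quotes are escaped along with &, < and >.
quotes_escaped = escape(text, quote=True)

# Style 2: quotes pass through untouched; only &, < and > are escaped.
quotes_kept = escape(text, quote=False)

print(quotes_escaped)
print(quotes_kept)
```

A test that asserts on the first style would fail against a sanitizer producing the second, so any expected strings in the test suite would likely need updating when switching libraries.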

dalthviz commented 1 month ago

Oh I see, seems like moving away from lxml_clean and using an alternative could be quite worthwhile :+1:

adriens commented 1 month ago

Currently facing this error on a planned Notebook too

adriens commented 1 month ago

Hi guys, how should we overcome this? [screenshot]

adriens commented 1 month ago

Currently patched with this:

```python
!pip install --upgrade lxml_html_clean

import geograpy

url = 'https://en.wikipedia.org/wiki/2012_Summer_Olympics_torch_relay'
places = geograpy.get_geoPlace_context(url=url)
print(places)
```


adriens commented 1 month ago

It worked perfectly: I could update the 🥷 Neo4J Ninjas duckdb dataset 🦆

psobolewskiPhD commented 1 month ago

It's impossible to figure out what the issue is. We'll need the text of the entire traceback. You can post it between sets of three backticks so it's formatted as code.

jamesnq commented 2 weeks ago

[screenshot: photo_2024-05-15_10-23-13]

I am currently facing this error. Is there any way to fix it? Thank you!

melissawm commented 2 weeks ago

@jamesnq are you using napari? There's no way to tell from your screenshot. Please post the entire traceback as text. Thanks!

jamesnq commented 2 weeks ago

> @jamesnq are you using napari? There's no way to tell from your screenshot. Please post the entire traceback as text. Thanks!

Thanks for your reply, I already fixed the error. I installed the lxml_html_clean library and ran with Python 3.10 instead of 3.12, and it works!

CamachoDejay commented 2 weeks ago

I had a similar issue. I am running a Python 3.10 env, and to solve the problem I had to do:

conda update napari

pip install lxml[html_clean]
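Since the reports in this thread differ by Python version and by which packages are installed, a small diagnostic can help confirm what an environment actually contains before and after a fix. This is a sketch using only the standard library; `installed_version` and `environment_report` are hypothetical helper names, and it assumes the distribution names `napari`, `lxml`, and `lxml_html_clean` match what the installer recorded:

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(names=("napari", "lxml", "lxml_html_clean")):
    """Collect the Python version and the versions of the given packages."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in names:
        report[name] = installed_version(name) or "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output as text alongside the full traceback gives maintainers the context a screenshot cannot.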

ehlui commented 6 days ago

Try to

pip install lxml_html_clean

It might do the trick. This worked for me

jamesnq commented 6 days ago

Try installing lxml_html_clean with Python 3.10.
