opendatateam / udata

Customizable and skinnable social platform dedicated to open data.
http://udata.readthedocs.org
GNU Affero General Public License v3.0

Bleach domain parsing in linkify faulty for some emails #2453

Open quaxsze opened 4 years ago

quaxsze commented 4 years ago

The Bleach library we use to sanitize markdown seems to parse parts of domain names and detect them as links in their own right.

Example: https://www.data.gouv.fr/fr/datasets/lignes-souterraines-du-reseau-rte-sur-le-territoire-de-la-mel/

Within the domain "rte-france.com", "-france.com" is seen as a valid link.

This should be fixed once https://github.com/mozilla/bleach/issues/60 is resolved, so I'm putting this on hold for now.

To reproduce in the udata shell:

```python
>>> from functools import partial
>>> import bleach
>>> from bleach.linkifier import LinkifyFilter
>>> from flask import current_app
>>> callbacks = []  # hypothetical stand-in for udata's own linkify callbacks; the bug should not depend on them
>>> html = '<p>Pour tous renseignements complémentaires sur ce jeu de données, écrivez à : rte-inspire-infos@rte-france.com</p>\n'
>>> cleaner = bleach.Cleaner(
...     tags=current_app.config['MD_ALLOWED_TAGS'],
...     attributes=current_app.config['MD_ALLOWED_ATTRIBUTES'],
...     styles=current_app.config['MD_ALLOWED_STYLES'],
...     protocols=current_app.config['MD_ALLOWED_PROTOCOLS'],
...     strip_comments=False,
...     filters=[partial(LinkifyFilter, skip_tags=['pre'], parse_email=False,
...                      callbacks=callbacks)]
... )
>>> cleaner.clean(html)
'<p>Pour tous renseignements complémentaires sur ce jeu de données, écrivez à : rte-inspire-infos@rte<a href="http://-france.com">-france.com</a></p>\n'
```

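To see how a linkifier can end up linking only `-france.com`, here is a minimal sketch using Python's `re` module. Both patterns are hypothetical simplifications, not Bleach's actual URL regex, and the TLD list is a made-up subset. The "buggy" pattern only refuses to start a match right after `@` or `.`. Inside `rte-france.com` there is a word boundary between `e` and `-`, and `-` is allowed in domain labels, so a match can start at the hyphen. The "fixed" pattern also refuses to start right after a word character or `-`, so a match can only begin at the start of a whole token. We haven't checked whether this is what the upstream commit linked below does.

```python
import re

# Hypothetical, simplified TLD list; real linkifiers use a much longer one.
TLDS = r"(?:com|fr|org|net)"

# Buggy sketch: a match may start at any word boundary that is not directly
# after "@" or ".". The boundary between "e" and "-" in "rte-france.com"
# qualifies, and "-" is allowed in domain labels, so "-france.com" matches.
BUGGY_URL_RE = re.compile(r"\b(?<![@.])(?:[\w-]+\.)+" + TLDS + r"\b")

# Tightened sketch: refuse to start a match right after a word character,
# "@", "." or "-", so a match can only begin at the start of a whole token.
FIXED_URL_RE = re.compile(r"(?<![\w@.-])(?:[\w-]+\.)+" + TLDS + r"\b")


def find_links(pattern, text):
    """Return every substring the pattern would turn into a link."""
    return [m.group(0) for m in pattern.finditer(text)]


email_text = "écrivez à : rte-inspire-infos@rte-france.com"
url_text = "voir rte-france.com pour plus d'infos"

print(find_links(BUGGY_URL_RE, email_text))  # ['-france.com']
print(find_links(FIXED_URL_RE, email_text))  # []
print(find_links(FIXED_URL_RE, url_text))    # ['rte-france.com']
```

The tightened guard still links a bare domain such as `rte-france.com` when it stands on its own. In Bleach 3.x, `LinkifyFilter` also accepts a `url_re` argument, so a stricter pattern could in principle be passed in without waiting for an upstream fix. Check that the Bleach version in use supports it before relying on it.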
JulienParis commented 4 years ago

So the first thing I've noticed is that the version of Bleach we currently use is 3.1.0. We could try upgrading to the current 3.1.5, I guess.

JulienParis commented 4 years ago

I think they tried to fix this behaviour in Bleach here: https://github.com/sedrubal/bleach/commit/b6537008a61bee98a03eda309e6d26f77af34f9b

JulienParis commented 4 years ago

Some issues for later reading:

quaxsze commented 4 years ago

Seems relevant indeed :). Can you try upgrading it locally?

JulienParis commented 4 years ago

> Seems relevant indeed :). Can you try upgrading it locally?

I'm testing it locally as we speak... Pedagogically speaking, it sounds fun: it could help me understand the Docker process a bit better.

JulienParis commented 4 years ago

I made a test page to check udata's behaviour with various ways of writing email addresses...

I also referenced some new issues I discovered while debugging this topic, all of which seem somehow related to the way udata bleaches markdown content (md -> html): #2496 #2497

ThibaudDauce commented 6 months ago

Linked to https://github.com/opendatateam/udata/issues/2498