medialab / hyphe

Websites crawler with built-in exploration and control web interface
http://hyphe.medialab.sciences-po.fr/demo/
GNU Affero General Public License v3.0

[IMPORT] bug in URL detection #300

Open paulgirard opened 5 years ago

paulgirard commented 5 years ago

[image attached]

When importing, the closing " of an href="url" attribute is kept at the end of the detected URL.
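For illustration, here is a minimal sketch that reproduces this kind of behaviour. The pattern `NAIVE_URL` is hypothetical and is not Hyphe's actual regexp (which is not quoted in this issue); it only assumes a detector that stops at whitespace and angle brackets but not at quotes.

```python
import re

# Hypothetical naive pattern (an assumption, not Hyphe's real one):
# it stops at whitespace, "<" and ">", but lets quote characters through.
NAIVE_URL = re.compile(r"https?://[^\s<>]+")

html = '<a href="http://example.com/page">link</a>'

# The match runs up to the ">" that closes the tag, so the
# closing quote of the href attribute ends up inside the URL.
naive_urls = NAIVE_URL.findall(html)
print(naive_urls)
```

With such a pattern the extracted string ends with the attribute's closing quote, which matches the symptom described above.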

boogheta commented 4 years ago

We should maybe reuse the proper URL regexp from ural: https://github.com/medialab/ural/blob/master/ural/patterns.py
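Until that is done, a stopgap could look like the sketch below. It is not ural's pattern; `URL_PATTERN` and `extract_urls` are hypothetical names, and the only assumption is that quote characters should terminate a URL when scanning imported text.

```python
import re

# Hypothetical stricter pattern: quotes are excluded in addition to
# whitespace and angle brackets, so href="..." delimiters are not captured.
URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")


def extract_urls(text):
    """Return every http(s) URL found in text, without surrounding quotes."""
    return URL_PATTERN.findall(text)


print(extract_urls('<a href="http://example.com/page">link</a>'))
print(extract_urls("visit https://a.org/x and http://b.net"))
```

The trade-off is that a single quote is technically allowed inside a URL, so this sketch would truncate such URLs; a maintained pattern like the one in ural is likely to handle those edge cases better.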