URLs containing http:// are parsed as 2 URLs

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Get the following URL on the screen:
http://web.archive.org/web/20031118045547/http://lists.netsys.com/pipermail/full-disclosure/2003-August/00
2. Try to use URL scan.

What is the expected output? What do you see instead?
It's one address, but it's parsed as 2 separate URLs (in fact, the second
http:// is attached to the first one).

What version of the product are you using (you can see this by using Menu
-> About in the Host List)?
1.6.2 r480

Original issue reported on code.google.com by grizza...@gmail.com on 21 Apr 2010 at 4:09

GoogleCodeExporter commented 9 years ago

Hm, if anyone has a better regex for the URL scanning, they're welcome to 
submit it.

Original comment by kenny@the-b.org on 23 May 2010 at 5:02

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

The only 100% right regex for matching URLs is the one compiled from BNF and 
available, for example, here:

http://web.archive.org/web/20070302134659/foad.org/~abigail/Perl/url3.regex

However, I guess it may be overkill.
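
For comparison, here is a much looser sketch in Python. It is only an illustration, not ConnectBot's actual scanner and not the BNF-derived regex. It assumes a URL starts at http:// or https:// and runs until the next whitespace, so a second http:// inside the path stays part of the same match. The names URL_PATTERN, TRAILING_PUNCTUATION and find_urls are made up for this example.

```python
import re

# Hypothetical pattern, not ConnectBot's real one: a scheme followed by
# every non-whitespace character up to the next space or end of line.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)

# Punctuation that usually belongs to the surrounding sentence, not the URL.
TRAILING_PUNCTUATION = ".,;:!?)]}>'\""


def find_urls(text):
    """Return URL candidates in text, one per whitespace-delimited run."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Matches never overlap, so an embedded "http://" cannot start
        # a second URL inside the first one.
        urls.append(match.group(0).rstrip(TRAILING_PUNCTUATION))
    return urls


line = ("see http://web.archive.org/web/20031118045547/"
        "http://lists.netsys.com/pipermail/full-disclosure/2003-August/00 ok")
print(find_urls(line))
```

This accepts plenty of strings that are not valid URLs, and it will not rejoin a URL that the terminal has wrapped across two lines, so treat it as a starting point only.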

Original comment by gjedeer@gmail.com on 9 Sep 2010 at 2:46

ikariiin / connectbot

URLs containing http:// are parsed as 2 URLs #300