Incorrect URL(s) extracted and long processing time for relatively large text

htrinter / Open-Multiple-URLs

Browser extension for opening lists of URLs built on top of WebExtension with cross-browser support

GNU General Public License v3.0

252 stars 57 forks

Incorrect URL(s) extracted and long processing time for relatively large text #3

Open roger125 opened 5 years ago

roger125 commented 5 years ago

Hi Team,

I've been using this extension for years now, and it's basically quite awesome.

Just a couple of small bug reports. 1st: the "Extract URLs from text" button extracts "text-align:center" along with the normal results. This is the first time I've seen an unexpected result like this, so I think the extraction could be improved.

2nd: it gets stuck, or takes too long to return results, for a text of around 2000 lines (a web page's source code), so I'm not sure whether that can be optimized as well.
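To illustrate the first point: a pattern that accepts any `name:value` run as a URL will pick up CSS such as `text-align:center`, while one that requires `://` after the scheme will not. The sketch below is in Python for brevity, and both patterns are hypothetical; the regex the extension actually uses is not shown in this thread.

```python
import re

# Hypothetical over-permissive pattern: a scheme-like token, a colon,
# then any run of characters up to whitespace, a quote, or an angle bracket.
LOOSE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:[^\s\"'<>]+")

# Hypothetical stricter pattern: only http or https followed by "://".
STRICT = re.compile(r"https?://[^\s\"'<>]+")

sample = '<p style="text-align:center">See https://example.com/page</p>'

loose_hits = LOOSE.findall(sample)    # includes the CSS declaration
strict_hits = STRICT.findall(sample)  # only the real URL

print(loose_hits)
print(strict_hits)
```

Both patterns use a single, unnested quantifier, which avoids the nested-quantifier backtracking that can make a regex slow on large inputs. Whether that is the cause of the slowdown reported here is not known.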

Thanks for your consideration & effort.

htrinter commented 5 years ago

Hi, thanks for your feedback! Yes, the regex currently in use has some limitations. So does every regex for matching URLs in text, I'm afraid, although a better result can probably be achieved. Playing around with a few regular expressions and libraries is on the todo list, although it may take a while.

roger125 commented 5 years ago

Another scenario is:

234124 https://google.com/.jpg},{"photo 3241324

Currently it extracts "https://google.com/.jpg},{"photo", while "https://google.com/.jpg" is expected.
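One possible way to get the expected result for this input is to stop the match at characters that cannot appear unencoded in a URL, such as curly braces and double quotes. This is only a sketch in Python under that assumption; `URL_RE` and `extract_urls` are hypothetical names, not part of the extension.

```python
import re

# Assumed stop characters: whitespace, quotes, angle brackets, curly braces.
URL_RE = re.compile(r"https?://[^\s\"'<>{}]+")

def extract_urls(text: str) -> list[str]:
    """Return http(s) URLs found in text, trimming trailing punctuation."""
    return [m.group(0).rstrip(".,;:!?)]") for m in URL_RE.finditer(text)]

result = extract_urls('234124 https://google.com/.jpg},{"photo 3241324')
print(result)
```

Trimming a trailing `)` or `]` is a judgment call: it helps with URLs quoted inside prose, but it can cut off URLs that legitimately end with a bracket.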

Yup, I agree with what you said: the regular expressions and libraries can be played around with a bit to make it better.

Although this repo isn't updated frequently as far as I can see, I'm pleased to see a reply from the author. Thanks for the reply, have a nice day :-)

GhbSmwc commented 5 years ago

URLs without a scheme (starting with just “www.”) will also get filtered out:

Enter:

https://www.example.com
www.google.com

Comes out:

https://www.example.com
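A pattern could accept a bare `www.` prefix as an alternative to an explicit scheme, and add a scheme afterwards so the result can be opened. The Python sketch below shows one way to do this; the names are hypothetical, and defaulting scheme-less matches to `https://` is an assumption rather than the extension's behaviour.

```python
import re

# Match either an explicit http(s) scheme or a bare "www." prefix.
URL_RE = re.compile(r"(?:https?://|\bwww\.)[^\s\"'<>{}]+", re.IGNORECASE)

def extract_urls(text: str) -> list[str]:
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        # Assumption: scheme-less matches are opened over https.
        if not url.lower().startswith(("http://", "https://")):
            url = "https://" + url
        urls.append(url)
    return urls

result = extract_urls("https://www.example.com\nwww.google.com")
print(result)
```

With the input from this comment, both lines are kept instead of only the first.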