Open safesploit opened 2 years ago
Hey, I've written this up and it works, but am I missing anything?
I tested it and it works fine. I also tested a URL containing a ` character (the only thing not covered by htmlspecialchars) and it didn't break.
I've also noticed that HTML tags are removed from URL titles (if the title says "<b>Hi",
it results in "Hi"), which is kind of an issue depending on the circumstances. I'd rather it be processed with htmlspecialchars than removed. Anyway,
Line 88 of crawl-manual insert: $url = htmlspecialchars(urldecode($url), ENT_QUOTES, "UTF-8");
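To illustrate the title behaviour mentioned above, here is a minimal sketch comparing tag removal with escaping. It assumes the crawler removes tags in a way similar to strip_tags, which the post does not confirm. The variable names are made up for the example.

```php
<?php
// Hypothetical title taken from the example in the post.
$title = '<b>Hi';

// Assumption: the crawler removes tags roughly like strip_tags() does.
$stripped = strip_tags($title);

// Alternative preferred in the post: escape the markup instead of dropping it.
$escaped = htmlspecialchars($title, ENT_QUOTES, "UTF-8");

// A backtick is not one of the characters htmlspecialchars converts,
// so it passes through unchanged.
$backtick = htmlspecialchars('a`b', ENT_QUOTES, "UTF-8");

echo $stripped, PHP_EOL;
echo $escaped, PHP_EOL;
echo $backtick, PHP_EOL;
```

With escaping, the original markup stays visible as text in the title instead of silently disappearing.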
When crawling the Japanese Wikipedia,
ja.wikipedia.org/wiki/メインページ
the following URL is indexed: https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
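As a rough sketch of what the proposed line does to that indexed URL: the function name displayUrl is hypothetical and only wraps the expression from the post. One thing to be aware of is that urldecode also turns "+" into a space, so rawurldecode might be worth considering if URLs with literal plus signs matter.

```php
<?php
// Hypothetical wrapper; only the expression inside comes from the post.
function displayUrl(string $url): string
{
    // Decode percent-escapes first, then escape HTML-special characters
    // so the decoded text is safe to print in a page.
    return htmlspecialchars(urldecode($url), ENT_QUOTES, "UTF-8");
}

$indexed = 'https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8';

// Prints the readable form: https://ja.wikipedia.org/wiki/メインページ
echo displayUrl($indexed), PHP_EOL;
```

Any percent-encoded markup in a URL (for example %3Cb%3E) is decoded and then escaped, so it is shown as text rather than interpreted as HTML.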