Open GoogleCodeExporter opened 8 years ago
source = links.matcher(source).replaceAll("");
Sample: http://news.itxinwen.com/2013/0802/515691.shtml
On this page, this step alone takes more than 90 seconds.
Suggestion: could all tags simply be removed directly with source = source.replaceAll("<[^>]+>", ""); instead?
Original issue reported on code.google.com by ywq1...@gmail.com on 2 Aug 2013 at 8:01
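For reference, the suggested replacement can be sketched as below. This is only an illustration of the expression quoted in the report, not the project's code: the class name TagStripDemo, the method stripTags, and the sample HTML string are assumptions made for the example. The pattern is compiled once and reused instead of going through String.replaceAll, which compiles the expression on every call.

```java
import java.util.regex.Pattern;

class TagStripDemo {
    // Same expression as in the suggestion above, compiled once and reused.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    // Hypothetical helper: removes every tag but keeps the text between tags.
    static String stripTags(String source) {
        return ANY_TAG.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        // Made-up sample input, not taken from the reported page.
        String html = "<p>Hello <a href=\"x.html\">world</a>!</p>";
        System.out.println(stripTags(html)); // prints "Hello world!"
    }
}
```

Note that this keeps the link text ("world") and drops only the markup, so it behaves differently from a step that is meant to delete whole links.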
ywq1...@gmail.com
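private static Pattern links = Pattern.compile("<[^>]+>.*?</[aA]>");
Considering cases like <a>contents<a>, this would be better. The only drawback is that any passage of body text that contains a hyperlink will be removed as well.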
Original comment by ywq1...@gmail.com on 2 Aug 2013 at 9:57
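The drawback mentioned in the comment can be seen in a small sketch. The first pattern is the one proposed above; because it begins at any tag rather than only at an anchor tag, it also removes the text between an earlier tag and the link. The class name LinkStripDemo, the second pattern ANCHOR_ONLY, and the sample string are assumptions added for comparison and do not come from the project.

```java
import java.util.regex.Pattern;

class LinkStripDemo {
    // Pattern proposed in the comment above: starts at ANY tag, ends at the first </a>.
    static final Pattern PROPOSED = Pattern.compile("<[^>]+>.*?</[aA]>");

    // Hypothetical narrower variant (an assumption, not from the issue):
    // starts only at an opening <a ...> tag.
    static final Pattern ANCHOR_ONLY = Pattern.compile("<[aA]\\b[^>]*>.*?</[aA]>");

    static String strip(Pattern pattern, String source) {
        return pattern.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        // Made-up sample: a paragraph of body text containing one hyperlink.
        String html = "<p>Read the <a href=\"r.html\">report</a> today.</p>";
        // The match begins at <p>, so "Read the" is removed along with the link.
        System.out.println("[" + strip(PROPOSED, html) + "]");
        // Only the link itself is removed; the surrounding text survives.
        System.out.println("[" + strip(ANCHOR_ONLY, html) + "]");
    }
}
```

Both patterns use "." without the DOTALL flag, so a link whose text spans several lines would not be matched by either one.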