zhegexiaohuozi / SeimiCrawler

A simple, agile, distributed Java crawler framework with Spring Boot support.
http://seimicrawler.org
Apache License 2.0
1.98k stars 679 forks

SeimiDownloader handles URIs incorrectly #64

Open rexleimo opened 3 years ago

rexleimo commented 3 years ago

When a URL like this one is matched: //fundact.eastmoney.com/banner/Hot_Em.html?spm=xlb

    public Response metaRefresh(String nextUrl) throws Exception {
        if (!nextUrl.startsWith("http")) {
            String prefix = this.getRealUrl(this.httpContext);
            nextUrl = prefix + nextUrl;
        }

        this.logger.info("Seimi refresh url to={} from={}", nextUrl, this.currentReqBuilder.getUri());
        this.currentReqBuilder.setUri(nextUrl);
        this.httpResponse = this.hc.execute(this.currentReqBuilder.build(), this.httpContext);
        return this.renderResponse(this.httpResponse, this.currentRequest, this.httpContext);
    }

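The concatenation here is wrong. A protocol-relative URL (one starting with `//`) fails the `startsWith("http")` check, so it is appended directly to the prefix and the result is a malformed URL.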
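
One possible way to avoid manual string concatenation in `metaRefresh` is to resolve the refresh target against the current request URL with `java.net.URI.resolve`, which follows the standard reference-resolution rules for protocol-relative, root-relative, and relative links. This is only a rough sketch and not SeimiCrawler code: the class `MetaRefreshUrlResolver`, the method `resolve`, and the base URL `http://fund.eastmoney.com/index.html` are assumptions made for illustration.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of SeimiCrawler.
public class MetaRefreshUrlResolver {

    // Resolves nextUrl against the URL of the page that issued the refresh.
    // Absolute URLs are returned unchanged; "//host/path" inherits the scheme
    // of currentUrl; "/path" and "path" are resolved against currentUrl.
    static String resolve(String currentUrl, String nextUrl) {
        URI base = URI.create(currentUrl);
        return base.resolve(nextUrl.trim()).toString();
    }

    public static void main(String[] args) {
        // Assumed base URL, chosen only for the demonstration.
        String base = "http://fund.eastmoney.com/index.html";

        // Protocol-relative URL from this issue.
        System.out.println(resolve(base, "//fundact.eastmoney.com/banner/Hot_Em.html?spm=xlb"));
        // Root-relative URL.
        System.out.println(resolve(base, "/banner/a.html"));
        // Already absolute URL.
        System.out.println(resolve(base, "https://example.com/c.html"));
    }
}
```

In `metaRefresh`, the current request URI (for example the value of `this.currentReqBuilder.getUri()`) could play the role of `currentUrl`. Whether that fits the existing redirect handling would need to be checked against the actual downloader code. Note that `URI.create` throws `IllegalArgumentException` on input that is not a valid URI, so real pages may need extra sanitizing.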

zhegexiaohuozi commented 3 years ago

A thoroughly tested PR is welcome.