khuongduyit / crawler4j

Automatically exported from code.google.com/p/crawler4j

Seed URL and Final URL differ (No Redirects) #226

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Add the seed URL:
http://money.cnn.com/2013/06/26/investing/bond-outflows/index.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fmoney_latest+%28Latest+News%29
2. Run as normal

What is the expected output? What do you see instead?

I expect that, within the visit method, the expression
 page.getWebURL().getURL()
would return the seed URL unchanged, since no redirects are involved. Instead, it returns:
http://money.cnn.com/2013/06/26/investing/bond-outflows/index.html?utm_campaign=Feed%3A%2Brss%2Fmoney_latest%2B%28Latest%2BNews%29&utm_medium=feed&utm_source=feedburner

What version of the product are you using?
3.5

Please provide any additional information below.
I manage the seed URLs manually and do not want to add a URL as a seed again if it is already in my database. I would like to be able to determine either:
a) the original seed URL, from within the visit method of MyCrawler; or
b) the final URL, as it will appear within the visit method of MyCrawler.
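
Comparing the two URLs above, the query parameters have been reordered alphabetically by name and each literal `+` has been rewritten as `%2B`; nothing else differs. One possible workaround for the database check, sketched below, is to store and look up a normalized key instead of the raw URL. `SeedUrlKey` and `comparisonKey` are hypothetical names, not part of crawler4j, and the sketch only mirrors the differences visible in this report. It is not claimed to reproduce the crawler's own URL handling in every case, and it assumes the URL has no `#fragment`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of crawler4j.
final class SeedUrlKey {

    // Builds a key that is identical for URLs that differ only in
    // query-parameter order or in "+" versus "%2B".
    // Assumption: the URL carries no "#fragment".
    static String comparisonKey(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to normalize
        }
        String base = url.substring(0, q);

        // TreeMap keeps parameters sorted by name.
        // If a name repeats, the last value wins.
        Map<String, String> params = new TreeMap<>();
        for (String pair : url.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }

        StringBuilder key = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            key.append(separator).append(encode(e.getKey()));
            key.append('=').append(encode(e.getValue()));
            separator = '&';
        }
        return key.toString();
    }

    // Percent-decodes s while keeping "+" as a literal plus sign
    // (URLDecoder alone would turn it into a space).
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Percent-encodes s; URLEncoder writes a space as "+", so that is
    // rewritten to "%20" to keep the key stable when re-normalized.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this approach, `SeedUrlKey.comparisonKey(seedUrl)` would be stored when the seed is added, and `SeedUrlKey.comparisonKey(page.getWebURL().getURL())` would be used for the lookup inside visit. For the two URLs in this report, both calls produce the same key.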

Thanks

Original issue reported on code.google.com by richard....@gmail.com on 26 Jun 2013 at 7:29

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:41