What steps will reproduce the problem?
1. Add the seed URL:
http://money.cnn.com/2013/06/26/investing/bond-outflows/index.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fmoney_latest+%28Latest+News%29
2. Run as normal
What is the expected output? What do you see instead?
I expect that within the visit method, the expression
page.getWebURL().getURL()
would have the same value as the seed URL (there are no redirects).
Instead, its value is:
http://money.cnn.com/2013/06/26/investing/bond-outflows/index.html?utm_campaign=Feed%3A%2Brss%2Fmoney_latest%2B%28Latest%2BNews%29&utm_medium=feed&utm_source=feedburner
What version of the product are you using?
3.5
Please provide any additional information below.
I am managing the seed URLs manually and do not wish to add them as seeds
again if they are already in my database. I wish to be able to determine either:
a) what the original URL was when I am within the visit method of MyCrawler; or
b) the final URL as it will appear within the visit method of MyCrawler.
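
Comparing the two URLs above, the difference looks like a query-string normalization: the parameters are sorted by name and each `+` is re-encoded as `%2B`. As a possible workaround for option b), here is a minimal sketch that applies the same transformation to a seed before it is stored, so that the stored value can be matched against what the visit method reports. This is inferred only from the two URLs in this report and is not the crawler's actual code; `SeedUrlKey`, `normalize`, and `recode` are hypothetical names, not part of the crawler's API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: rewrites a URL's query string the way the
// reported URL appears to have been rewritten (assumption, see above).
class SeedUrlKey {

    static String normalize(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to rewrite
        }
        String base = url.substring(0, q);

        // TreeMap keeps parameters sorted by name. Note: a repeated
        // parameter name keeps only its last value in this sketch.
        Map<String, String> params = new TreeMap<>();
        for (String pair : url.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(recode(name), recode(value));
        }

        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }

    // Treats '+' as a literal plus sign (not a space), decodes any
    // percent-escapes, then percent-encodes the result again.
    private static String recode(String s) {
        try {
            String decoded = URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
            return URLEncoder.encode(decoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Storing `SeedUrlKey.normalize(seedUrl)` in the database and comparing it with `page.getWebURL().getURL()` would then match for the URL in this report. Fragments and repeated parameter names are not handled, so other URLs may still differ.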
Thanks
Original issue reported on code.google.com by richard....@gmail.com on 26 Jun 2013 at 7:29