jjlee / mechanize

Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize.
http://wwwsearch.sourceforge.net/mechanize/

Infinite loop on self-refreshing pages #28

Closed: kmowery closed this issue 14 years ago

kmowery commented 14 years ago

Issue: Calling br.open(url) enters an infinite refresh loop if the page has a refresh header pointing to itself.

Reasons:

In my case, I don't care about refresh headers, so I simply changed the default arguments at _useragent.py:107.

Possible Solutions:

Thoughts?

(Thanks for mechanize, btw, it's a fantastic piece of software!)

jjlee commented 14 years ago

Do you still think there's a bug?

kmowery commented 14 years ago

No, I completely missed the fact that set_handle_refresh was available from Browser objects.
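For anyone landing here with the same problem, a minimal sketch of using set_handle_refresh on a Browser follows. The helper names apply_refresh_policy and make_browser and the max_wait parameter are made up for illustration. The max_time keyword is assumed to behave as described in the mechanize source (refreshes with a longer pause are not followed), so check it against the version you have installed.

```python
def apply_refresh_policy(br, max_wait=None):
    """Configure refresh handling on a mechanize.Browser-like object.

    max_wait is a hypothetical parameter for this sketch:
      None   -> do not follow Refresh headers at all
      number -> follow them only when the pause is at most max_wait seconds
    """
    if max_wait is None:
        # Turn refresh handling off entirely.
        br.set_handle_refresh(False)
    else:
        # Keep refresh handling, but skip refreshes with a long pause.
        br.set_handle_refresh(True, max_time=max_wait)
    return br


def make_browser(max_wait=None):
    """Build a Browser with the policy above (needs mechanize installed)."""
    import mechanize  # third-party; imported lazily so the helper above stands alone
    return apply_refresh_policy(mechanize.Browser(), max_wait)
```

With this, make_browser() gives a browser that ignores refresh headers, and make_browser(max_wait=5) gives one that should only follow refreshes of five seconds or less.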

Upon further testing, the max redirection count does seem to apply.

So no, there isn't a bug. It just felt like there was, as the page I was viewing had a refresh time of 1200 and the open call didn't return for several hours. I suppose it's just a caveat to be aware of: the default settings of mechanize can cause arbitrarily long delays when viewing a page with certain attributes.
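To make the caveat concrete, here is a rough standalone sketch of the kind of decision involved. It is not mechanize's actual code: parse_refresh and should_follow are hypothetical names, and the parsing is deliberately simplified. The idea is only that a cap on the pause decides whether a refresh is followed, and that with no cap a 1200-second pause is honoured.

```python
import re

# Simplified pattern for values such as "1200; url=http://example.com/".
_REFRESH_RE = re.compile(
    r"^\s*(?P<pause>\d+(?:\.\d*)?)\s*(?:[;,]\s*url\s*=\s*(?P<url>.+?))?\s*$",
    re.IGNORECASE,
)


def parse_refresh(value):
    """Split a Refresh header value into (pause_seconds, url_or_None)."""
    m = _REFRESH_RE.match(value)
    if m is None:
        raise ValueError("unrecognised Refresh value: %r" % (value,))
    url = m.group("url")
    if url:
        url = url.strip("'\"")  # some pages quote the URL
    return float(m.group("pause")), (url or None)


def should_follow(value, max_time=None):
    """Follow the refresh only if there is no cap or the pause fits under it."""
    pause, _url = parse_refresh(value)
    return max_time is None or pause <= max_time


# No cap: the 1200-second refresh is followed, so the caller would wait.
print(should_follow("1200; url=http://example.com/"))
# A 10-second cap: the same refresh is skipped.
print(should_follow("1200; url=http://example.com/", max_time=10))
```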