snarfed / bridgy

📣 Connects your web site to social media. Likes, retweets, mentions, cross-posting, and more...
https://brid.gy
Creative Commons Zero v1.0 Universal

support cookies when making HTTP requests #421

Closed: snarfed closed this 2 years ago

snarfed commented 9 years ago

...specifically, when discovering webmention endpoints and crawling sites during original post discovery (OPD).

we occasionally see original post links that require cookies. the usual symptom is a redirect loop: we get redirected to a URL that sets the cookie, but we don't pass it back in our next fetch, so we then get the same redirect. example log:

Webmention from https://brid-gy.appspot.com/like/twitter/edtechdev/617723808712105984/1058806590 to http://www.tandfonline.com/doi/full/10.1080/10494820.2015.1060504#.VZlRpHUVhBc
Sending...
Starting new HTTP connection (1): www.tandfonline.com
"GET /doi/full/10.1080/10494820.2015.1060504 HTTP/1.1" 302 None
"GET /doi/full/10.1080/10494820.2015.1060504?cookieSet=1 HTTP/1.1" 302 None
"GET /doi/full/10.1080/10494820.2015.1060504 HTTP/1.1" 302 None
"GET /doi/full/10.1080/10494820.2015.1060504?cookieSet=1 HTTP/1.1" 302 None
"GET /doi/full/10.1080/10494820.2015.1060504 HTTP/1.1" 302 None
"GET /doi/full/10.1080/10494820.2015.1060504?cookieSet=1 HTTP/1.1" 302 None
...
Traceback (most recent call last):
File "/base/data/home/apps/s~brid-gy/3.385505439646824075/tasks.py", line 478, in do_send_webmentions
  if not mention.send(timeout=999, headers=util.USER_AGENT_HEADER):
File "/base/data/home/apps/s~brid-gy/3.385505439646824075/local/lib/python2.7/site-packages/webmentiontools/send.py", line 24, in send
  self._discoverEndpoint()
File "/base/data/home/apps/s~brid-gy/3.385505439646824075/local/lib/python2.7/site-packages/webmentiontools/send.py", line 30, in _discoverEndpoint
  r = requests.get(self.target_url, verify=False, **self.requests_kwargs)
File "/base/data/home/apps/s~brid-gy/3.385505439646824075/local/lib/python2.7/site-packages/requests/api.py", line 55, in get
  return request('get', url, **kwargs)
...
  raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects)
TooManyRedirects: Exceeded 30 redirects.

looks like the easy fix is to add a requests Session object.
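
a minimal sketch of what that could look like, assuming the fetches can be routed through one shared object. a `requests.Session` keeps a cookie jar for every request made through it, so a cookie set by one response is sent back on later fetches. `make_session`, `fetch`, and the `USER_AGENT_HEADER` value here are hypothetical stand-ins, not Bridgy's or webmentiontools' actual code:

```python
import requests

# Hypothetical stand-in for util.USER_AGENT_HEADER in the traceback above.
USER_AGENT_HEADER = {"User-Agent": "example-bot/1.0"}


def make_session(headers=None, max_redirects=30):
    """Return a requests.Session whose cookie jar is shared across fetches."""
    session = requests.Session()
    # Default headers are sent on every request made through this session.
    session.headers.update(headers or {})
    # 30 matches the limit reported in the TooManyRedirects error above.
    session.max_redirects = max_redirects
    return session


def fetch(session, url, timeout=15):
    """GET url through the session, reusing any cookies it has collected.

    Still raises requests.TooManyRedirects if the site keeps looping.
    """
    return session.get(url, timeout=timeout)
```

usage would be something like `session = make_session(USER_AGENT_HEADER)` once per task, then `fetch(session, target_url)` for both endpoint discovery and OPD crawling. whether this alone fixes the tandfonline loop is untested here.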

snarfed commented 2 years ago

Not a big problem: I haven't seen it in a while, and no one has asked for it in many years. Tentatively closing.