olivierthereaux / oldtweets

python script to backup/delete old tweets from your timeline

oldtweets crash #3

Closed karlcow closed 12 years ago

karlcow commented 12 years ago

A small issue: the script crashes when run like this:

→ cat credentials.txt | ./oldtweets.py

I'm not sure why yet. It seems it might be related to GetUserTimeline. I wonder if Twitter changed the format.

Traceback (most recent call last):
  File "./oldtweets.py", line 131, in <module>
    sys.exit(main())
  File "./oldtweets.py", line 110, in main
    tweets_ids += [status.id for status in api.GetUserTimeline(page=i+1, count=100)]
  File "/Library/Python/2.7/site-packages/twitter.py", line 2680, in GetUserTimeline
    json = self._FetchUrl(url, parameters=parameters)
  File "/Library/Python/2.7/site-packages/twitter.py", line 3794, in _FetchUrl
    response = opener.open(url, encoded_post_data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 392, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 370, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1194, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1157, in do_open
    r = h.getresponse(buffering=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1013, in getresponse
    response.begin()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 402, in begin
    version, status, reason = self._read_status()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 366, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

karlcow commented 12 years ago

It runs for about one minute before crashing.

real    1m0.719s
user    0m0.510s
sys     0m0.123s
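
An empty status line (`BadStatusLine: ''`) typically means the connection was closed before any HTTP response arrived, which can be transient. A minimal sketch of retrying such a call, assuming the failure really is transient (not confirmed in this thread); `retry_call` and its parameters are hypothetical names, not part of python-twitter:

```python
import time

try:
    from http.client import BadStatusLine  # Python 3
except ImportError:
    from httplib import BadStatusLine  # Python 2, as in the traceback


def retry_call(func, attempts=3, delay=5.0, sleep=time.sleep):
    """Call func(); on BadStatusLine, wait and retry up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except BadStatusLine:
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(delay * attempt)  # wait a little longer each time
```

In the script this could wrap the failing call, for example `retry_call(lambda: api.GetUserTimeline(page=i+1, count=100))`.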

karlcow commented 12 years ago

Trying manually to see if there is a specific issue.

→ python
Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import twitter
>>> api = twitter.Api()
>>> api = twitter.Api(consumer_key='*',consumer_secret='*', access_token_key='*',access_token_secret='*')
>>> statuses = api.GetPublicTimeline()
>>> print [s.user.name for s in statuses]
[u'\u305d\u3046\u3081\u3093\u30a4\u30c4\u30ad(CV.\u7d30\u8c37\u4f73\u6b63)', u'Safrina \u30c4', u'Edmond Zemlak', u'\u0627\u0644\u0640\u0640\u0634\u0640\u0640\u0631\u0647\u0640\u0640\u0627\u0646', u'Le-le-leroy....', u'\uff3c\u3053\u3093\u306b\u3061\u306f\u3044\u3048\u30fc\u3044\uff0f', u'Craig Edmunds', u'\u3046\u3043\u3093@\u79d1\u5b66\u7684\u306b\u98f4\u3068\u97ad', u'Saftaryana', u'\u2655Pratiwi', u'tri k', u'Safira Nurul Cahyani', u'\u3053\u3070\u3084\u3057\u3072\u308d\u3084', u'Eline Heijmans', u'Martilll\u2606', u'Kristi D. Kearney', u'\u308a\u3068\u3053', u'Princess Enjay', u'Diniokta DS', u'Lauraa&Emma:)']
>>> statuses = api.GetUserTimeline('karlpro')
>>> print [s.text for s in statuses]
[u'@davidbgk or if you do not allow remote working when possible, it means that you do not value the work that your company can propose ;)', u'"dites des choses simples et d\xe9taill\xe9es sur votre journ\xe9e" "de distinction, de spatialisation et d\u2019historicisation." http://t.co/71CCuHQT', u"@victorbritopro ah merci. Et sinon, pour ma part j'ai moins de plantage\u2026 mais c'est vrai que Flash n'est pas activ\xe9 dans mon navigateur", u"RT @chaals: Yes folks, I'm leaving Opera. No, it has nothing to do with facebook. I'd write more but I have work to finish...", u'What do they read in #NYC Subway http://t.co/uB9M8uY7 #photograph #subway #book', u'"contracts where one party isn\u2019t free to negotiate the terms." \u2014 http://t.co/kymAqgOP', u'Interested by #dnt and #privacy, read also http://t.co/kymAqgOP', u'@victorbritopro Tu sais avec juste ce tweet je ne peux pas y faire grand chose \u263a', u"Today's Aung San Suu Kyi (belated) speech http://t.co/N3lzDilm #society", u'Rencontre de tweets sur la prise de parole  http://t.co/AcBZdGKy  @lespacedunmatin @olivierthereaux', u'@Enwin merci :)', u"@Enwin l'ouverture du bug tracker est un projet en cours. Un peu long \xe0 faire, d\xe9pend de l'infrastructure et des ressources, mais en progr\xe8s", u'@ledahulevogyre je le vois :)', u'@ledahulevogyre corrig\xe9 merci!', u'"They\u2019re negotiating words rather than the issues behind the words." \u2014 http://t.co/KfSMi5mf #ecology #society', u'@SonyOnline Thanks I will check that on Monday.', u"The shell starts a social poster campaign http://t.co/0ny6YtkP It's very effective\u2026 in a way \u263a wonderful.", u'Thanks to @oraclemagazine for connecting with @Opera about http://t.co/YUFsaMNO #otw', u'added screenshots of the Kafka situation with @kobo https://t.co/V7tQs0wn', u'issue with @kobo http://t.co/OPAIfjkB name+string pattern for emails still not solve. :/']

This at least seems to be working.

karlcow commented 12 years ago

OK, this time it seems to be working, but we hit a rate limit.

→ cat credentials.txt | ./oldtweets.py >> mytweetsbackupfile.txt 
Traceback (most recent call last):
  File "./oldtweets.py", line 130, in <module>
    sys.exit(main())
  File "./oldtweets.py", line 121, in main
    print "Tweet id: ", tweet_id, " --  Date: ", api.GetStatus(tweet_id).created_at, " || ", api.GetStatus(tweet_id).text.encode('utf-8')
  File "/Library/Python/2.7/site-packages/twitter.py", line 2704, in GetStatus
    data = self._ParseAndCheckTwitter(json)
  File "/Library/Python/2.7/site-packages/twitter.py", line 3668, in _ParseAndCheckTwitter
    self._CheckForTwitterError(data)
  File "/Library/Python/2.7/site-packages/twitter.py", line 3691, in _CheckForTwitterError
    raise TwitterError(data['error'])
twitter.TwitterError: Rate limit exceeded. Clients may not make more than 350 requests per hour.

karlcow commented 12 years ago

3600 s / 350 requests ≈ 10.3 s per request, so at most 1 request every 11 s.

We need to put a timer on this.
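
A minimal sketch of such a timer, assuming the limit applies evenly to every API request; the `Throttle` class and its `clock` and `sleep` parameters are hypothetical and not part of oldtweets.py or python-twitter:

```python
import time


class Throttle(object):
    """Block so that successive wait() calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.time, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


# 350 requests per hour, as quoted in the error message above
throttle = Throttle(interval=3600.0 / 350)
```

The idea is to call `throttle.wait()` before every single API request rather than once per loop iteration, since one iteration may issue several requests.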

karlcow commented 12 years ago

Hmm, adding time.sleep(11) didn't solve it either. It ran for… 43 minutes and then stopped.

→ time cat credentials.txt | ./oldtweets.py --delete >> karlpro-tweets-bkp-20120617.txt
Traceback (most recent call last):
  File "./oldtweets.py", line 130, in <module>
    sys.exit(main())
  File "./oldtweets.py", line 121, in main
    print "Tweet id: ", tweet_id, " -- Date: ", api.GetStatus(tweet_id).created_at, " || ", api.GetStatus(tweet_id).text.encode('utf-8')
  File "/Library/Python/2.7/site-packages/twitter.py", line 2704, in GetStatus
    data = self._ParseAndCheckTwitter(json)
  File "/Library/Python/2.7/site-packages/twitter.py", line 3668, in _ParseAndCheckTwitter
    self._CheckForTwitterError(data)
  File "/Library/Python/2.7/site-packages/twitter.py", line 3691, in _CheckForTwitterError
    raise TwitterError(data['error'])
twitter.TwitterError: Rate limit exceeded. Clients may not make more than 350 requests per hour.

real    43m36.847s
user    0m2.724s
sys     0m0.735s

File size

→ wc -l  karlpro-tweets-bkp-20120617.txt 
174 karlpro-tweets-bkp-20120617.txt

karlcow commented 12 years ago

I'll try again tonight with 20 s between each request.

karlcow commented 12 years ago

30 s between requests seems to be a good value for avoiding the rate-limit error.
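
One possible explanation for why 11 s was not enough: line 121 in the traceback calls `api.GetStatus(tweet_id)` twice per tweet, so each loop iteration costs at least two requests. If each of the 174 lines in the backup file is one tweet, that is about 348 requests, right at the 350 limit. A rough sketch of the arithmetic, assuming one sleep per tweet and ignoring the timeline requests (`min_delay` is a hypothetical helper):

```python
def min_delay(requests_per_hour, calls_per_tweet):
    """Smallest per-tweet sleep (seconds) that stays under the hourly limit."""
    return 3600.0 * calls_per_tweet / requests_per_hour


print(round(min_delay(350, 1), 1))  # one call per tweet: 10.3
print(round(min_delay(350, 2), 1))  # two GetStatus calls per tweet: 20.6
print(174 * 2)                      # requests implied by 174 backed-up tweets: 348
```

Under these assumptions a 30 s pause leaves some margin, which is consistent with it working. Fetching the status once per tweet and reusing the result would halve the request count.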