gratipay / gratipay.com

Here lieth a pioneer in open source sustainability. RIP
https://gratipay.news/the-end-cbfba8f50981
MIT License

Twitter user avatars are broken #1936

Closed seanlinsley closed 10 years ago

seanlinsley commented 10 years ago

[Screenshot, 2014-01-22 4:48:16 PM]

This is currently being discussed in IRC.

chadwhitacre commented 10 years ago

Appears to be a Twitter issue: https://dev.twitter.com/discussions/25385.

chadwhitacre commented 10 years ago

Here's a script to convert si0 to pbs if we need to go that route (take out the rollback):

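BEGIN;

    -- Swap the si0 host for pbs, touching only rows that are still on si0.
    UPDATE elsewhere SET user_info = user_info || hstore(
        'profile_image_url_https',
        replace(user_info->'profile_image_url_https', '//si0.twimg.com/', '//pbs.twimg.com/')
    ) WHERE platform='twitter'
        AND (user_info->'profile_image_url_https') LIKE 'https://si0.twimg.com/%';

    -- Inspect the result before committing.
    SELECT user_info->'profile_image_url_https' FROM elsewhere WHERE platform='twitter';

    ROLLBACK;  -- dry run: take this line out so that END commits the change

END;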
chadwhitacre commented 10 years ago

Still borken.

seanlinsley commented 10 years ago

@whit537 do all URLs in the database use si0 currently?

clone1018 commented 10 years ago

@seanlinsley https://botbot.me/freenode/gittip/msg/10088765/

seanlinsley commented 10 years ago

Is there any correlation between the 4431 users with a pbs URL? Are they new?

Are there any other subdomains in use?

galuszkak commented 10 years ago

@seanlinsley I really don't know. But I do know that there are more subdomains on twimg (like a0).

I suggest turning @whit537's SQL into a script that does this (pseudocode):

# check whether the current image URL works
if get(profile_image_url_https).code in [403, 404]:
    # check whether the pbs version of the URL works
    if get(updated_to_pbs_profile_image_url_https).code in [200, 301]:
        # if it does, update the URL
        update_url()
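
A runnable take on the pseudocode above might look like the following. This is only a sketch: `to_pbs`, `status_of`, and `repaired_url` are hypothetical names, it assumes a broken avatar lives on `si0.twimg.com` and that the same path is served from `pbs.twimg.com`, and it leaves the database update to the caller. Redirects are followed, so only a final 200 counts as working.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def to_pbs(url):
    """Return url with an si0.twimg.com host swapped for pbs.twimg.com."""
    parts = urlsplit(url)
    if parts.netloc != "si0.twimg.com":
        return url  # not an si0 URL; leave it alone
    return urlunsplit(parts._replace(netloc="pbs.twimg.com"))


def status_of(url, timeout=10):
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses arrive as exceptions


def repaired_url(url, check=status_of):
    """Return a working pbs URL for a broken si0 avatar, else None."""
    if check(url) not in (403, 404):
        return None  # the current URL is not broken in the way we expect
    candidate = to_pbs(url)
    if candidate != url and check(candidate) == 200:
        return candidate
    return None
```

The `check` parameter exists so the logic can be exercised without hitting the network, for example with a dict of canned status codes.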
chadwhitacre commented 10 years ago

Some of these are coming back, others aren't. MaxCDN and Bountysource are back, UkuleleRod isn't. Could be because the first two have logged in since this started.

chadwhitacre commented 10 years ago

Confirmed: MaxCDN and Bountysource are now on pbs, while UkuleleRod is still on si0. I checked a backup from last week and all three were on si0 last week.

chadwhitacre commented 10 years ago

What's the harm in switching everyone who is si0 to pbs, per https://github.com/gittip/www.gittip.com/issues/1936#issuecomment-33078357? I suppose we're assuming that all si0s are busted and all pbss are good. We could/should verify that assumption before pulling the trigger.

chadwhitacre commented 10 years ago
#!/usr/bin/env python3
import sys

import requests

# Report every avatar URL in twimg.csv that does not return 200.
with open('twimg.csv') as urls:
    for line in urls:
        url = line.strip()
        response = requests.get(url)
        if response.status_code != 200:
            print(response.status_code, url)
        sys.stdout.flush()

I'm running that script against 18,960 URLs. Will report back ...

clone1018 commented 10 years ago

Just don't do it from production :D

chadwhitacre commented 10 years ago

:-)

[gittip] $ grep "403 " twimg.log | wc -l
   13123
[gittip] $ grep "404 " twimg.log | wc -l
     534
[gittip] $ grep "si0" twimg.log | wc -l
   13123
[gittip] $ grep "pbs" twimg.log | wc -l
     534
[gittip] $ wc -l twimg.log
   13657 twimg.log
[gittip] $ echo 13123 534 + p | dc
13657
[gittip] $

The script died before reaching 18,960, not sure why. Also, why are the pbs ones 404 instead of 200?
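
The matching grep counts suggest that every 403 is an si0 URL and every 404 is a pbs URL, but equal totals alone don't prove it. A cross-tabulation of status code against subdomain would confirm it directly. A minimal sketch, assuming each log line has the `status url` shape the script prints (`tally` is a hypothetical helper name):

```python
from collections import Counter
from urllib.parse import urlsplit


def tally(lines):
    """Count log lines by (status code, first label of the URL's host)."""
    counts = Counter()
    for line in lines:
        status, _, url = line.strip().partition(" ")
        if not url:
            continue  # skip blank or malformed lines
        subdomain = urlsplit(url).netloc.split(".")[0]
        counts[(status, subdomain)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py twimg.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            for (status, subdomain), count in sorted(tally(log).items()):
                print(status, subdomain, count)
```

If the assumption holds, the output should contain only two rows, one for `403 si0` and one for `404 pbs`.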

chadwhitacre commented 10 years ago

Blech. This sucks.

chadwhitacre commented 10 years ago

The right ways to fix this are:

Neither of those is trivial.

chadwhitacre commented 10 years ago

There's a script in #1989 to fix this as a one-off. Spinning up a DO VPS to run it (using the payday image) ...

chadwhitacre commented 10 years ago

The script died mysteriously (forgot to redirect stderr :/ ) after processing 4036 accounts. Before rerunning it's probably worth rewriting to use users/lookup (100 at a time) instead of users/show (one at a time), per https://github.com/gittip/www.gittip.com/pull/1989#issuecomment-34416629.
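
For illustration, the batching that a move from users/show to users/lookup calls for could be sketched like this. `batches` and `lookup_params` are hypothetical helpers, not code from #1989, and the comma-separated `user_id` parameter is stated here as an assumption about how the lookup call takes its IDs; authentication and the request itself are left out.

```python
def batches(items, size=100):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_params(user_ids):
    """Build query parameters for one lookup call (assumed: comma-separated IDs)."""
    return {"user_id": ",".join(str(user_id) for user_id in user_ids)}


if __name__ == "__main__":
    ids = list(range(1, 251))  # stand-in for the Twitter user IDs in the database
    for batch in batches(ids):
        params = lookup_params(batch)
        print(len(batch), params["user_id"][:20] + "...")
```

With 250 IDs this produces three batches of 100, 100, and 50, so the number of API hits drops by roughly a factor of 100 compared with one call per account.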

chadwhitacre commented 10 years ago

Rewrote the script to use lookup and rerunning it now. It still has a 5-second sleep between hits. If we were under 18,000 we could fit inside one 15 minute window, but we're at ~19,000.
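
The arithmetic behind the 18,000 figure: one hit every 5 seconds is 180 hits per 15-minute window, and 180 hits at 100 accounts each is 18,000 accounts. A small sketch of that calculation, assuming the sleep is the only thing pacing the script (`run_estimate` is a hypothetical name):

```python
import math


def run_estimate(accounts, per_request=100, sleep_seconds=5, window_seconds=15 * 60):
    """Return (requests needed, requests that fit in one window, minutes of sleeping)."""
    requests_needed = math.ceil(accounts / per_request)
    requests_per_window = window_seconds // sleep_seconds
    minutes = requests_needed * sleep_seconds / 60
    return requests_needed, requests_per_window, minutes


if __name__ == "__main__":
    print(run_estimate(18000))  # 180 requests, exactly one window
    print(run_estimate(19000))  # 190 requests, ten more than one window holds
```

For ~19,000 accounts that comes to 190 requests and a little under 16 minutes of sleeping, plus however long the requests themselves take.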

chadwhitacre commented 10 years ago

This should be done in 15-20 minutes.

chadwhitacre commented 10 years ago

Done! :dancer:

chadwhitacre commented 10 years ago

[Screenshot, 2014-02-07 1:35:37 PM]