sparkfish / shabby-pages

ShabbyPages is a state-of-the-art corpus of born-digital document images with both ground truth and distorted versions appropriate for use in training models to reverse distortions and recover to original denoised documents.
MIT License

Error 431 when trying to upload media to Twitter #9

Closed: proofconstruction closed this issue 2 years ago

proofconstruction commented 2 years ago

The tweet.py script currently doesn't work, failing with a 431 Request Header Fields Too Large response status code when trying to upload media.

# Upload the file
with open(filename, "rb") as upload_file:
    data = upload_file.read()
    resource_url = "https://upload.twitter.com/1.1/media/upload.json"

    upload_image = {
        "media": data,
        "media_category": "tweet_image"
    }

    image_headers = {
        "Authorization": "Bearer {}".format(bearer_token)
    }

    try:
        media_id = requests.post(resource_url, headers=image_headers, params=upload_image)
    except Exception as e:
        print(f"Failed: {e}")
        sys.exit()
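
One detail of the snippet above may be relevant to the 431: `requests` appends anything passed as `params` to the URL's query string, so the raw image bytes end up in the request line instead of the body. The sketch below uses only the standard library to show how long such a URL becomes. The helper `url_with_query` and the `fake_image` bytes are hypothetical names made up for this illustration, and this thread does not confirm that this was the actual cause.

```python
from urllib.parse import urlencode

UPLOAD_URL = "https://upload.twitter.com/1.1/media/upload.json"


def url_with_query(fields: dict) -> str:
    """Build the URL that results when `fields` are sent as query parameters."""
    return f"{UPLOAD_URL}?{urlencode(fields)}"


# Stand-in for image bytes (illustrative); a real Augraphy output would be far larger.
fake_image = bytes(range(256)) * 400

url = url_with_query({"media": fake_image, "media_category": "tweet_image"})

# Percent-encoding makes the URL even longer than the image itself.
print(f"image bytes: {len(fake_image)}, URL characters: {len(url)}")
```

If this is the problem, passing the bytes through the `files` argument of `requests.post` (for example `files={"media": data}`) would move them into a multipart request body. Whether the endpoint then accepts the request with the credentials used here is not something this sketch can show.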

The error might be caused by the image being too large: some Augraphy outputs are quite large and could exceed Twitter's 5 MB limit on image size. This should be checked.
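
A size check before uploading could look roughly like the sketch below. The function name `is_within_upload_limit` is hypothetical, and reading the 5 MB limit as 5 * 1024 * 1024 bytes is an assumption; the exact byte count should be confirmed against Twitter's documentation.

```python
import os

# Assumption: the 5 MB limit mentioned above, interpreted as 5 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def is_within_upload_limit(path: str, limit: int = MAX_IMAGE_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

Calling `is_within_upload_limit(filename)` before opening the file would let the script skip or resize oversized images instead of sending a request that is bound to fail.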

The desired response to that request should be a media ID that uniquely refers to the piece of media uploaded. This is added to the payload later when posting a tweet, attaching the media with that ID to that tweet:

# Build the tweet and send it
tweet = {"status": message, "media_ids": media_id}
post_url = "https://api.twitter.com/1.1/statuses/update.json"
post_resp = requests.post(post_url, params=tweet, headers=image_headers)
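
Note that in the snippets above `media_id` holds the whole response object returned by `requests.post`, not the ID itself. A minimal sketch of pulling the ID out of the response body follows. It assumes the upload response is JSON containing a `media_id_string` field; the `sample_body` value is made up for illustration and is not output from this project.

```python
import json


def extract_media_id(response_text: str) -> str:
    """Return the media ID from the JSON body of an upload response."""
    payload = json.loads(response_text)
    # Assumption: the ID is exposed as a string under "media_id_string".
    return payload["media_id_string"]


# Illustrative body only; real responses carry additional fields.
sample_body = '{"media_id": 710511363345354753, "media_id_string": "710511363345354753"}'
print(extract_media_id(sample_body))
```

With a real response, `extract_media_id(upload_response.text)` would produce the value to place in the tweet's `media_ids` field.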

Note: Twitter has two API versions, but we currently use (or try to use) only the v1.1 API. Another option, which the Tweepy library uses, is to upload the image through the v1.1 API (images cannot always be posted through the v2 API at present) and then post the tweet through the v2 API. I explored this previously but ran into other issues.

jboarman commented 2 years ago

@AlphaPL resolved the underlying issue in tweet.py as part of PR #11. Now @shotor is going to try to hook this up to the daily build pipeline.

shotor commented 2 years ago

@jboarman @AlphaPL I implemented the GitHub Action. I get a 200 on the auth request, but the upload is failing:

Failed: [Errno Expecting value] : 0

https://github.com/sparkfish/shabby-pages/runs/6406624856?check_suite_focus=true#step:6:135
https://github.com/sparkfish/shabby-pages/blob/dev/tweet.py#L60

What I did:

Is there a step I'm missing?

In case I'm not around, I also added a manual trigger for the daily build: GitHub Actions -> Tag Action -> Daily Tag -> Run workflow
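
The `Expecting value` message with position 0 reported above is consistent with a JSON decoder being handed a body that is empty or is not JSON at all (an HTML error page, for example). A defensive way to inspect such a response is sketched below. The helper name `parse_json_body` is hypothetical, and the sketch does not claim to reproduce the exact failure in the linked run.

```python
import json


def parse_json_body(body: str):
    """Return the decoded JSON, or None if the body is empty or not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Show the decoder's message and the start of the body to aid debugging.
        print(f"Response was not JSON ({exc.msg} at char {exc.pos}): {body[:200]!r}")
        return None
```

Logging the status code and the first part of the raw body alongside this would show what the upload endpoint actually returned.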

jboarman commented 2 years ago

Thanks @shotor!

Is there a simple way to let that script error bubble up and fail the overall build, so that we don't silently ignore this if the issue is fixed but later recurs?
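
One common way to do this is to have the script exit with a non-zero status, since a CI step is normally marked as failed when its command returns a non-zero exit code. Calling `sys.exit()` with no argument, as the snippet in the original report does, exits with status 0 and so looks like success. The sketch below is an assumption about how this could be structured, not the change that was actually made to tweet.py; `run` and `failing_upload` are hypothetical names.

```python
import sys


def run(upload) -> int:
    """Call `upload` and translate any exception into a non-zero exit code."""
    try:
        upload()
    except Exception as exc:
        print(f"Failed: {exc}", file=sys.stderr)
        return 1
    return 0


def failing_upload():
    """Stand-in for an upload step that goes wrong."""
    raise RuntimeError("upload rejected")


# In the real script, the last line would be something like:
#     sys.exit(run(upload_media))
```

An alternative is simply not to catch the exception, because an uncaught exception also makes the Python interpreter exit with a non-zero status.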

jboarman commented 2 years ago

After making a few changes to the tweet.py script, this is now running successfully (and flags the build as a failure when it does not).