jasonLaster / abuses


Pipeline: Extract embedded media from tweets, starting with images #34

Open ramezn opened 4 years ago

ramezn commented 4 years ago

We should process the spreadsheet or AirTable of incidents. For items that are tweets, we should look at the tweet (and any tweets it references), extract the embedded media, copy that media to a storage location (Google Drive?), and store the URLs of the copied media in AirTable.
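Here is a rough sketch of the first step, finding which incidents are tweets. It assumes records shaped like items in the AirTable REST API's `records` list (`{"id": ..., "fields": {...}}`) and a hypothetical `Link` field that holds the incident URL. It only recognizes `twitter.com` status URLs, so other link shapes would need their own patterns.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Matches status URLs such as https://twitter.com/<user>/status/<id>
# (also www./mobile. hosts and the older /statuses/ path).
TWEET_URL_RE = re.compile(
    r"https?://(?:www\.|mobile\.)?twitter\.com/[A-Za-z0-9_]+/status(?:es)?/(\d+)"
)


def tweet_id_from_url(url: str) -> Optional[str]:
    """Return the numeric tweet ID in a tweet URL, or None if it isn't one."""
    match = TWEET_URL_RE.search(url)
    return match.group(1) if match else None


def tweet_records(
    records: Iterable[dict], url_field: str = "Link"  # hypothetical field name
) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, tweet_id) for every incident whose URL field is a tweet."""
    for record in records:
        # AirTable omits empty fields, so a missing field is treated as "".
        url = record.get("fields", {}).get(url_field) or ""
        tweet_id = tweet_id_from_url(url)
        if tweet_id is not None:
            yield record["id"], tweet_id
```

Keeping the AirTable record ID next to the tweet ID makes it straightforward to write the media URLs back to the same row later.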

To start with, let's do this for images. There are numerous incidents where the tweet contains images rather than video, and today we aren't rendering these on the site. Let's extract the images and put them somewhere we can serve them from, in preparation for the site UX being able to display them.
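A minimal sketch of the image-extraction step follows. It assumes the tweet was fetched from the Twitter API v2 tweet lookup endpoint with `expansions=attachments.media_keys,referenced_tweets.id` and `media.fields=type,url`, so media details arrive under `includes.media` and expanded referenced tweets under `includes.tweets`. Whether media attached to referenced tweets is also included depends on the request. If it isn't, the referenced tweet IDs can be looked up in a second request and passed through the same function. The function name is hypothetical, and only `photo` media is kept for now.

```python
from typing import Dict, List


def photo_urls(response: dict) -> List[str]:
    """Collect photo URLs from a v2 tweet lookup response (assumed shape)."""
    includes = response.get("includes", {})
    # Index media objects by media_key so attachments can be resolved.
    media_by_key: Dict[str, dict] = {
        m["media_key"]: m for m in includes.get("media", [])
    }
    # The main tweet first, then any referenced tweets that were expanded.
    tweets = [response.get("data", {})] + includes.get("tweets", [])
    urls: List[str] = []
    for tweet in tweets:
        for key in tweet.get("attachments", {}).get("media_keys", []):
            media = media_by_key.get(key)
            # Videos and GIFs are skipped; only photos with a direct URL are kept.
            if media and media.get("type") == "photo" and media.get("url"):
                if media["url"] not in urls:  # avoid duplicates across tweets
                    urls.append(media["url"])
    return urls
```

Filtering on `type == "photo"` leaves room to add a separate branch for videos later, which is the longer-term pipeline mentioned below.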

Longer term, we need a pipeline to do this with videos as well, as currently Jason Miller is manually extracting videos from tweets and copying them into YouTube and Google Drive.

It's been suggested that this is a good task for @brwinkle.

brwinkle commented 4 years ago

So for this, do we have opinions about where we should host these images? Google Drive is mentioned above, but we could also do S3 or just like Imgur or something. Also, I'm presuming the source I should be processing is the AirTable. Is that right?

Also, @ramezn, mind removing my name from the issue there? 😄

dmnd commented 4 years ago

S3 seems like the path of least resistance to me
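If S3 is the host, the copy step could look like the sketch below. It derives a stable object key from a hash of the source URL, so rerunning the pipeline overwrites files instead of duplicating them. It also takes the S3 client as a parameter so the step can be tested with a fake. With boto3 you would pass `boto3.client("s3")`, whose `put_object` accepts the `Bucket`, `Key`, `Body` and `ContentType` keywords used here. The bucket name, the `tweet-media/` prefix and the region are placeholders. The returned URL also assumes the bucket allows public reads.

```python
import hashlib
import mimetypes
import urllib.request
from typing import Callable, Optional
from urllib.parse import urlparse


def _download(url: str) -> bytes:
    """Fetch the raw bytes of a media file."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def s3_key_for(media_url: str, prefix: str = "tweet-media/") -> str:
    """Derive a stable object key (hash of the URL plus the file extension)."""
    name = urlparse(media_url).path.rsplit("/", 1)[-1]
    ext = name.rsplit(".", 1)[1] if "." in name else "bin"
    digest = hashlib.sha256(media_url.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}{digest}.{ext}"


def copy_to_s3(
    media_url: str,
    s3_client,  # e.g. boto3.client("s3"); anything with put_object(**kwargs)
    bucket: str,  # placeholder bucket name
    region: str = "us-east-1",  # placeholder region
    fetch: Optional[Callable[[str], bytes]] = None,
) -> str:
    """Copy one media file into S3 and return a URL to store in AirTable."""
    body = (fetch or _download)(media_url)
    key = s3_key_for(media_url)
    content_type = mimetypes.guess_type(key)[0] or "application/octet-stream"
    s3_client.put_object(Bucket=bucket, Key=key, Body=body, ContentType=content_type)
    # Virtual-hosted-style URL; only readable if the bucket permits public reads.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
```

Setting `ContentType` lets browsers render the image inline when the site links to it, instead of downloading it as a generic file.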

brwinkle commented 4 years ago

Just waiting for Twitter to give me developer account access.

dmnd commented 4 years ago

> Just waiting for Twitter to give me developer account access.

@brwinkle I got this in my email today (for an unrelated project)

[image: screenshot of the email]

Did yours come through?

brwinkle commented 4 years ago

I never got an email, but I checked today and I seemed to be approved, so I got cracking on this. I think I should have a PR up tomorrow.