mholt / timeliner

All your digital life on a single timeline, stored locally -- DEPRECATED, SEE TIMELINIZE (link below)
https://timelinize.com
GNU Affero General Public License v3.0

Twitter: aborts if media download yields "403 Forbidden", e.g. removed by copyright claim #52

Open joonas-fi opened 4 years ago

joonas-fi commented 4 years ago

Here's the Tweet: https://twitter.com/janl/status/1113015555064201216

Error message:

2019/11/30 18:04:02 [ERROR][twitter/joonas_fi] Getting latest: getting items from service: processing tweet from API: processing tweet 1113180316510957568: making item from tweet that this tweet (1113180316510957568) is in reply to (1113015555064201216): making item from tweet that this tweet (1113015555064201216) embeds (1112473455650172929): media resource returned HTTP status 403 Forbidden: https://pbs.twimg.com/ext_tw_video_thumb/1112471832232259585/pu/img/ywWGTl09hsnLnMOY.jpg

That image URL redirects to this DMCA warning, at least in a browser (it may behave differently when requested through the API).

Timeliner cannot cope with this: every re-run hits the same error and aborts, so it can never get past this tweet.

mholt commented 4 years ago

Ah, oops. Not something I anticipated or encountered. How do you think we should handle this?

joonas-fi commented 4 years ago

I dunno, this is a pickle. The obvious error is not being able to continue after 403. My data retrieval process just aborts.

But what should we do about it? Sure, continue after the error. But personally, I am not a fan of losing any information. In this case the information is:

there once was an attachment, but we didn't manage to fetch it in time because it was later taken down because of a DMCA complaint

I'd prefer this to be stored in the data model. I haven't researched Timeliner's data model, but perhaps something like attachment: {id: '987654321', permanentFetchFailureReason: '403 Forbidden - Twitter or the author removed it?'}?

Things to think about:

Ruthalas commented 4 years ago

...if we're doing "full refresh" and we have a permanentFetchFailureReason, should we still re-try fetching it?

I concur with your conclusion that rechecking is inexpensive, and so worth trying.
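
To make the proposal concrete, here is a minimal Go sketch of an attachment record that remembers a permanent fetch failure and is still rechecked on a full refresh. The `Attachment` type, its fields, and `shouldFetch` are hypothetical names for illustration only; none of this is taken from Timeliner's actual data model.

```go
package main

import "fmt"

// Attachment is a hypothetical record, NOT Timeliner's real schema.
type Attachment struct {
	ID      string
	Fetched bool // true once the media was downloaded successfully
	// PermanentFetchFailureReason is non-empty when the last attempt hit an
	// error that is assumed to be permanent, such as a 403 after a takedown.
	PermanentFetchFailureReason string
}

// shouldFetch reports whether this run should try to download the attachment.
// Assumed policy: new attachments are always tried; attachments that were
// already fetched or already marked as failed are tried again only on a
// full refresh, because rechecking is cheap.
func shouldFetch(a Attachment, fullRefresh bool) bool {
	if !a.Fetched && a.PermanentFetchFailureReason == "" {
		return true
	}
	return fullRefresh
}

func main() {
	removed := Attachment{
		ID:                          "987654321",
		PermanentFetchFailureReason: "403 Forbidden",
	}
	fmt.Println(shouldFetch(removed, false)) // normal run: skip the known failure
	fmt.Println(shouldFetch(removed, true))  // full refresh: try again
}
```

Keeping the failure reason on the record preserves the fact that an attachment once existed, even though its content could not be retrieved.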

mholt commented 4 years ago

A 403 is usually permanent; at the very least, something has to change on the server before that error goes away.

Perhaps Timeliner should simply continue to the next item after seeing a 403. Log the 403, but continue on, since there's nothing we can do about it. This should probably be the behavior no matter what mode it's running in.

But I also agree that simply trying once or twice more before continuing on wouldn't be a bad idea, in case it was a fluke.
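
A rough Go sketch of that behavior follows: retry a few times, then log the 403 and move on to the next item, while still aborting on other errors. It is only an illustration under stated assumptions; `fetchFunc`, `httpFetch`, `fetchWithRetry`, `downloadAll`, and `errForbidden` are hypothetical names, not functions from Timeliner's code base.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"log"
	"net/http"
)

// errForbidden marks a media URL that answered 403 on every attempt.
var errForbidden = errors.New("media resource returned HTTP status 403 Forbidden")

// fetchFunc abstracts a single GET so the retry logic can run without a network.
type fetchFunc func(url string) (status int, body []byte, err error)

// httpFetch is a fetchFunc backed by net/http.
func httpFetch(url string) (int, []byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

// fetchWithRetry tries the URL up to attempts times (at least once) and
// returns the error from the last attempt if none succeeds.
func fetchWithRetry(fetch fetchFunc, url string, attempts int) ([]byte, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		status, body, err := fetch(url)
		switch {
		case err != nil:
			lastErr = err
		case status == http.StatusOK:
			return body, nil
		case status == http.StatusForbidden:
			lastErr = errForbidden
		default:
			lastErr = fmt.Errorf("unexpected HTTP status %d for %s", status, url)
		}
	}
	return nil, lastErr
}

// downloadAll logs and skips URLs that stay forbidden; any other error aborts.
func downloadAll(fetch fetchFunc, urls []string, attempts int) (map[string][]byte, []string, error) {
	media := make(map[string][]byte)
	var skipped []string
	for _, u := range urls {
		body, err := fetchWithRetry(fetch, u, attempts)
		if errors.Is(err, errForbidden) {
			log.Printf("[ERROR] skipping %s: %v", u, err)
			skipped = append(skipped, u)
			continue
		}
		if err != nil {
			return media, skipped, err
		}
		media[u] = body
	}
	return media, skipped, nil
}

func main() {
	// Fake fetcher with placeholder URLs, so the example needs no network.
	fake := func(url string) (int, []byte, error) {
		if url == "https://example.com/removed.jpg" {
			return http.StatusForbidden, nil, nil
		}
		return http.StatusOK, []byte("image bytes"), nil
	}
	urls := []string{"https://example.com/removed.jpg", "https://example.com/ok.jpg"}
	media, skipped, err := downloadAll(fake, urls, 3)
	fmt.Println(len(media), len(skipped), err)
}
```

For real downloads, `httpFetch` can be passed in place of the fake fetcher. Returning the skipped URLs alongside the downloaded media lets the caller record which attachments were lost instead of silently dropping them.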