tumblr / docs

Tumblr's public platform documentation.
Apache License 2.0
Fetch multiple posts by ID in a single request? #135

Closed: blackjackkent closed this issue 3 months ago

blackjackkent commented 3 months ago

Hello!

For a number of years now, I have run a third-party application for writers on Tumblr which makes use of the Tumblr API to fetch numerous different posts per user. One of the biggest challenges in running this application has been the fact that the API only has an endpoint to fetch individual posts by a single ID, which leads to rate limiting challenges, especially as some of my users have started to accumulate quite a lot of posts they're interested in!

I was wondering if there is any possibility of the API being updated to allow passing multiple post IDs to retrieve multiple posts from a blog in a single request? This would be a really fantastic improvement and make a lot of things easier.

Thanks much for all you do!

nightpool commented 3 months ago

Rate limits generally serve as a measure of the cost you impose on the backend server. For example, fetching a single post incurs some number X of database requests. My guess is that the vast majority of the cost of processing an individual request for Tumblr (especially when using HTTP/2 multiplexing or HTTP/1.1 keep-alive) is in processing the post itself, particularly fetching the other reblogs in the reblog tree and encoding the HTML content of the post or transforming it between NPF, HTML, etc. So, in general, fetching multiple posts in one request wouldn't necessarily gain you anything from a rate limit perspective: making 1 request to fetch 100 posts and making 100 requests to fetch 1 post each are going to be roughly the same amount of work for Tumblr's backend servers.

cyle commented 3 months ago

Unfortunately, @nightpool's comment is correct: our rate limits are primarily there to prevent infrastructure headaches, but also to prevent malicious behavior. Fetching many posts by specific IDs in one request would be very "expensive" for us, because it's an access pattern we haven't optimized for. We've optimized Tumblr to be able to fetch many posts across many blogs in a reverse-chronological list, but not to fetch a large number of arbitrary posts by specific IDs.

I'm curious, though, how often you're hitting the rate limits doing what you're doing, and whether you could adjust to absorb the rate limit responses and wait accordingly. I'm guessing this is for the RPThreadTracker project?
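For anyone reading along, here is a minimal sketch of what "absorb the rate limit responses and wait accordingly" could look like on the client side. It is not RPThreadTracker's code and not an official Tumblr client. The `send` callable, the `fetch_with_backoff` and `_retry_delay` names, and the reliance on a standard `Retry-After` header are all assumptions for illustration; check which rate limit headers the API actually returns before depending on any of them.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) -- a hypothetical shape for illustration.
Response = Tuple[int, Mapping[str, str], bytes]


def _retry_delay(headers: Mapping[str, str], attempt: int,
                 base_delay: float, max_delay: float) -> float:
    """Pick how long to wait before the next attempt."""
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            # Assumption: the server sends Retry-After as a number of seconds.
            return min(max(float(retry_after), 0.0), max_delay)
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to backoff below
    # Exponential backoff with jitter so many clients don't retry in lockstep.
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    return random.uniform(ceiling / 2, ceiling)


def fetch_with_backoff(send: Callable[[], Response],
                       max_attempts: int = 5,
                       base_delay: float = 1.0,
                       max_delay: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it returns something other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response[0] != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(_retry_delay(response[1], attempt, base_delay, max_delay))
    # Still rate limited after the last attempt; hand the 429 back to the caller.
    return response
```

Passing `send` and `sleep` in as parameters keeps the retry policy separate from the HTTP library, which also makes it easy to log how often 429s actually occur.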

blackjackkent commented 3 months ago

That makes total sense - thank you both for the prompt/cogent responses. :)

It is for RPThreadTracker, yeah. And I don't really have exact information on how often this is coming up (should probably add some more focused logging to get a more specific idea). Right now I just have anecdotal reports of people who have been using my app for a long time and have a lot of threads to fetch starting to run into the issue of my existing retry logic not sufficiently handling their fetching needs. (Downside of having run an app for ten years. O.O )

There's definitely further work I could do on my end; I was basically just curious if there was any chance of this being added as a feature of the API, which would simplify my code! XD But definitely makes sense why it hasn't been done this way. Thanks again for the response. :)

cyle commented 3 months ago

Sure thing! If you do debug anything more specific, and it seems like you're getting 429s more often than you should, please feel free to reopen this or file a Support ticket with more specifics and we can dig into it more directly. In the worst case, if some users are trying to track thousands of posts "simultaneously", maybe there's some better logic we could suggest, like checking the cache-control headers to see if the post content has even changed, before fetching the full post. I'm not sure if we have that available on a per-post basis but it'd be easier for us to implement than multiple-post-fetching.
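To make the "has the post even changed?" idea concrete, here is a rough sketch of a client-side cache built on standard HTTP conditional requests (`ETag` / `If-None-Match`). This is purely hypothetical: as noted above, it is not confirmed that the Tumblr API offers anything like this per post, and `ConditionalCache` and the `send` callable are made-up names for illustration only.

```python
from typing import Callable, Dict, Mapping, Tuple

# (status code, response headers, body) -- a hypothetical shape for illustration.
Response = Tuple[int, Mapping[str, str], bytes]


class ConditionalCache:
    """Remembers the last ETag and body seen for each post."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def fetch(self, post_id: str,
              send: Callable[[Mapping[str, str]], Response]) -> Tuple[bytes, bool]:
        """Return (body, changed). send() receives the extra request headers."""
        cached = self._entries.get(post_id)
        # Assumption: the server honors If-None-Match and answers 304 when unchanged.
        request_headers = {"If-None-Match": cached[0]} if cached else {}
        status, headers, body = send(request_headers)
        if status == 304 and cached is not None:
            return cached[1], False  # unchanged: reuse the stored body
        if status == 200:
            etag = {k.lower(): v for k, v in headers.items()}.get("etag")
            if etag:
                self._entries[post_id] = (etag, body)
            return body, True
        raise RuntimeError(f"unexpected status {status} for post {post_id}")
```

Whether an unchanged response would count against the rate limit is a separate question that only the API side can answer.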