tumblr / docs

Tumblr's public platform documentation.
Apache License 2.0

How to paginate /tagged responses? #131

Open moon6969 opened 5 months ago

moon6969 commented 5 months ago

How do I paginate the results returned from api.tumblr.com/v2/tagged?

The v2/tagged endpoint response is limited to 20 posts.

The v2/tagged endpoint does not support the 'offset' parameter and does not appear to support a sort parameter.

The responses returned from v2/tagged endpoint appear to be in 'id' order rather than 'timestamp' order, so the 'before' parameter is not useful.

None of the results returned (for the tag 'PhotoToaster' I am testing with) contain 'featured_timestamp' field.

Thanks.

marcustyphoon commented 5 months ago

I think you're supposed to use timestamp unless featured_timestamp exists (so, before=${lastPost.featured_timestamp || lastPost.timestamp}).

id order and timestamp order are effectively the same in this context, right? Only original posts show up in /tagged, so a higher post id should correspond with a higher post timestamp.

Edit: Yep, seems to work fine on both "PhotoToaster" and "gif":

(async () => {
  // The API key was removed from the original post; substitute your own.
  const baseUrl = `https://api.tumblr.com/v2/tagged?tag=PhotoToaster&api_key=[removed]`;
  let url = baseUrl;

  // Fetch three consecutive pages.
  for (let i = 0; i < 3; i++) {
    const { response: posts } = await fetch(url).then((res) => res.json());

    console.log(posts.map(({ id, timestamp, post_url }) => `${id} ${timestamp} ${post_url}`));

    // Stop on an empty page: there is no last post to paginate from.
    const lastPost = posts.at(-1);
    if (!lastPost) break;

    // Prefer featured_timestamp when present, otherwise fall back to timestamp.
    const lastTimestamp = lastPost.featured_timestamp || lastPost.timestamp;
    url = `${baseUrl}&before=${lastTimestamp}`;
  }
})();

nightpool commented 5 months ago

The docs say the following

before: The timestamp of when you'd like to see posts before. If the Tag is a "featured" tag, use the "featured_timestamp" on the post object for pagination.

moon6969 commented 5 months ago

I am getting this for the first 10 responses for "PhotoToaster":

"id": 654151143791984640,
"date": "2021-06-13 14:43:00 GMT",
"id": 634582681880068096,
"date": "2015-12-25 03:29:00 GMT",
"id": 633952026211057664,
"date": "2015-08-24 22:23:00 GMT",
"id": 615605114369130496,
"date": "2020-04-17 01:26:26 GMT",
"id": 614668644253827072,
"date": "2020-04-06 17:21:37 GMT",
"id": 614468007221198848,
"date": "2020-04-04 12:12:35 GMT",
"id": 188445960720,
"date": "2019-10-19 11:13:39 GMT",
"id": 182410255070,
"date": "2019-01-30 00:29:26 GMT",
"id": 180146408044,
"date": "2018-11-15 19:43:41 GMT",
"id": 178156961219,
"date": "2018-09-16 21:43:37 GMT",

The only obvious difference in my approach compared to yours is I'm using OAuth2 rather than an ApiKey. I will test further.

marcustyphoon commented 5 months ago

> The docs say the following
>
> before: The timestamp of when you'd like to see posts before. If the Tag is a "featured" tag, use the "featured_timestamp" on the post object for pagination.

Yeah, moon6969 mentioned featured_timestamp, so they definitely read this line. But the line should probably be clarified more, e.g. If the Tag is a "featured" tag, use the `featured_timestamp` property on the post object for pagination instead of the `timestamp` property.

marcustyphoon commented 5 months ago

> I am getting this for the first 10 responses for "PhotoToaster":
>
> "id": 654151143791984640,
> "date": "2021-06-13 14:43:00 GMT",

Yes, in addition to those fields there should be a timestamp field containing a unix timestamp integer.

moon6969 commented 5 months ago

> Yes, in addition to those fields there should be a timestamp field containing a unix timestamp integer.

Isn't the date field just the timestamp converted to GMT?

marcustyphoon commented 5 months ago

Sure, but the integer form is what you can use to paginate. Are you seeing timestamp as well as date? Are you using a library that doesn't give you the full post object?
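
To illustrate the relationship between the two fields, here is a minimal sketch that converts a `date` string of the kind quoted in this thread into the integer form that `before` expects. The helper name `dateToTimestamp` is hypothetical, and the sketch assumes the `date` field always has the `YYYY-MM-DD HH:MM:SS GMT` shape seen here. Reading the post's own `timestamp` field is simpler whenever it is available.

```javascript
// Hypothetical helper: converts a date string such as "2021-06-13 14:43:00 GMT"
// into a Unix timestamp in seconds. Assumes the format shown in this thread.
function dateToTimestamp(date) {
  const match = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) GMT$/.exec(date);
  if (!match) {
    throw new Error(`Unexpected date format: ${date}`);
  }
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  // Date.UTC expects a zero-based month and returns milliseconds.
  return Date.UTC(year, month - 1, day, hour, minute, second) / 1000;
}

console.log(dateToTimestamp("2021-06-13 14:43:00 GMT")); // 1623595380
```

The result matches the timestamp reported for post 654151143791984640 elsewhere in this thread.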

moon6969 commented 5 months ago

I'm getting the full post object and using the timestamp for 'before' parameter.

Are you getting the same 'PhotoToaster' post IDs as me (below)?

Further testing has revealed 3 confusing behaviours with the 'tagged' results...

1. The timestamps are not in order

So there is no guarantee that the timestamp of the last post in a batch is higher than the timestamps of all remaining posts in the query. (Note that the 2nd and 3rd posts are older than the posts that follow them.)

Processing Tag(s) 'PhotoToaster'
Requesting Tags
Received 20 posts
Processing post 654151143791984640 timestamp 1623595380
Processing post 634582681880068096 timestamp 1451014140
Processing post 633952026211057664 timestamp 1440454980
Processing post 615605114369130496 timestamp 1587086786
Processing post 614668644253827072 timestamp 1586193697
Processing post 614468007221198848 timestamp 1586002355
Processing post 188445960720 timestamp 1571483619
Processing post 182410255070 timestamp 1548808166
...
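
The ordering problem in the log above can be checked mechanically. The following is only a sketch: `firstOutOfOrderIndex` is a hypothetical helper, and it assumes that a well-ordered batch would list timestamps from newest to oldest.

```javascript
// Hypothetical helper: returns the index of the first post whose timestamp is
// newer than the post before it, or -1 if the batch is in descending order.
function firstOutOfOrderIndex(posts) {
  for (let i = 1; i < posts.length; i++) {
    if (posts[i].timestamp > posts[i - 1].timestamp) {
      return i;
    }
  }
  return -1;
}

// Timestamps copied from the log above.
const batch = [
  1623595380, 1451014140, 1440454980, 1587086786,
  1586193697, 1586002355, 1571483619, 1548808166,
].map((timestamp) => ({ timestamp }));

console.log(firstOutOfOrderIndex(batch)); // 3 (the fourth post is newer than the third)
```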

2. Using the 'limit' parameter returns a different set of results

Why does the same query as above with "limit=5" not return the same first 5 posts?

Received 5 posts
Processing post 654151143791984640 timestamp 1623595380
Processing post 634582681880068096 timestamp 1451014140
Processing post 633952026211057664 timestamp 1440454980
Processing post 633310376517353472 timestamp 1397757960
Processing post 632953911207133184 timestamp 1364580000
Requesting Tags before timestamp 1364580000 (2013-03-29_18-00-00)
Received 0 posts

This also shows the problem that arises when the timestamps are not in order.

3. Why does tagged not consistently return 20 records?

With no limit specified, the number of posts returned by each 'before' call trails off, e.g. 20, 20, 8, 2, 5, 1, 6, 5, 4 ...
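
Given the varying page sizes, one possible approach is to keep requesting pages until one comes back empty rather than stopping at the first page with fewer than 20 posts. The sketch below makes several assumptions: `collectTagged` and `fetchPage` are hypothetical names, `fetchPage` is supplied by the caller, and an empty page is treated as the end of the results, which this thread does not confirm. It does not address the out-of-order timestamps described above, so it may still miss posts.

```javascript
// Sketch: keeps requesting pages until one comes back empty, instead of
// assuming every page holds 20 posts. `fetchPage` is a hypothetical function
// supplied by the caller: it takes a `before` timestamp (undefined for the
// first page) and resolves to an array of post objects.
async function collectTagged(fetchPage, maxPages = 50) {
  const posts = [];
  let before;

  // maxPages is a safety cap in case the cursor never stops advancing.
  for (let page = 0; page < maxPages; page++) {
    const batch = await fetchPage(before);
    if (batch.length === 0) break; // assumed end-of-results signal

    posts.push(...batch);

    // Same cursor rule as suggested earlier in the thread.
    const lastPost = batch[batch.length - 1];
    before = lastPost.featured_timestamp || lastPost.timestamp;
  }

  return posts;
}
```

A `fetchPage` implementation could wrap a request to /v2/tagged, adding the `before` parameter only when a value is passed and returning the `response` array from the parsed JSON.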