ppy / osu-api

Public API for accessing osu! related data.

Unable to retrieve all ranked/loved maps using "standard" download flow #286

Open · Piotrekol opened this issue 4 years ago

Piotrekol commented 4 years ago

The cause is that loved maps in a single "loved batch" share exactly the same approved_date.

Steps to reproduce: call https://osu.ppy.sh/api/get_beatmaps?since=2018-06-11%2019:40:06&k=key and note that several loved maps are missing from the response (notice how all the maps at the end share the exact same approved_date), e.g. mapId: 958038.

The standard flow is to take the last approved_date and re-send the request with an updated since parameter until fewer than 500 maps are returned. I've already added an iffy workaround to osustats, but I thought I would report this anyway.
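A minimal sketch of that "standard" flow, run against a hypothetical in-memory stand-in for get_beatmaps rather than the real endpoint. It assumes `since` is exclusive and that results come back ordered by approved_date, which is what the missing maps in this report suggest. `fake_get_beatmaps`, `naive_fetch_all` and the sample data are made up for illustration, and a page size of 3 stands in for the real 500.

```python
from typing import List


def fake_get_beatmaps(maps: List[dict], since: str, limit: int = 500) -> List[dict]:
    """Hypothetical stand-in for get_beatmaps (NOT the real endpoint).

    Assumes `since` is exclusive and results are ordered by approved_date.
    MySQL-style "YYYY-MM-DD HH:MM:SS" strings compare correctly as text.
    """
    newer = [m for m in maps if m["approved_date"] > since]
    newer.sort(key=lambda m: (m["approved_date"], m["beatmap_id"]))
    return newer[:limit]


def naive_fetch_all(maps: List[dict], limit: int = 500) -> List[dict]:
    """The "standard" flow: re-request from the last approved_date until a short page."""
    since = "0000-00-00 00:00:00"
    collected: List[dict] = []
    while True:
        page = fake_get_beatmaps(maps, since, limit)
        collected.extend(page)
        if len(page) < limit:
            return collected
        # Any unreturned maps sharing this date are skipped by the next request.
        since = page[-1]["approved_date"]


# Made-up sample: maps 2-4 were loved in one batch and share an approved_date.
maps = [
    {"beatmap_id": 1, "approved_date": "2018-06-01 00:00:00"},
    {"beatmap_id": 2, "approved_date": "2018-06-25 02:05:26"},
    {"beatmap_id": 3, "approved_date": "2018-06-25 02:05:26"},
    {"beatmap_id": 4, "approved_date": "2018-06-25 02:05:26"},
    {"beatmap_id": 5, "approved_date": "2018-07-01 00:00:00"},
]
ids = [m["beatmap_id"] for m in naive_fetch_all(maps, limit=3)]
print(ids)  # [1, 2, 3, 5]: map 4 is never returned
```

The second request starts strictly after the shared date, so whatever part of the batch did not fit in the first page is lost.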

tybug commented 4 years ago

Retrieving that beatmap's information directly with https://osu.ppy.sh/api/get_beatmaps?k=KEY&b=958038 gives

...
"approved_date": "2018-06-25 02:05:26",
...

which differs from what the website shows (loved on 24 June 2018). This is possibly an osu-web issue rather than an osu-api issue.

tybug commented 4 years ago

Whoops, I see my mistake. The website shows times in UTC-4 (for me), while API responses are in UTC. Disregard my response.
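A quick check of that conversion with Python's standard `datetime` module. The fixed UTC-4 offset stands in for tybug's local timezone and deliberately ignores daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

# approved_date as returned by get_beatmaps in this thread (UTC, MySQL-style format).
api_value = "2018-06-25 02:05:26"
utc_time = datetime.strptime(api_value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

# Fixed UTC-4 offset, standing in for the browser's local time.
local_time = utc_time.astimezone(timezone(timedelta(hours=-4)))
print(local_time.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-06-24 22:05:26
```

Shifting 02:05 UTC back four hours crosses midnight, which is why the website reports 24 June.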

peppy commented 4 years ago

Are you saying that over 500 maps were "loved" in one go?

Piotrekol commented 4 years ago

No, not 500, but if you page through maps starting from the very first ranked map, you eventually hit a URL whose response contains only a fraction of a loved batch. When you then make the next get_beatmaps request after the one I mentioned (I guess that's missing from my steps), you end up skipping some of those maps because they share the same date.

peppy commented 4 years ago

If there aren't 500 beatmaps with the same date, isn't paginating with a page size of 500 going to work okay?

Piotrekol commented 4 years ago

Not necessarily. Let's use the link from my first comment as an example:

  1. Fetch that link's contents: 500 maps.
  2. Extract the last (or max) approved_date, in this case 2018-06-25 02:05:26.
  3. Create a new request with that date and get the results: 500 maps since that date, even though there are more maps with that exact date that weren't included in the previous request (because of the 500 limit).
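The three steps above can be traced on a small made-up dataset. This is a sketch against a hypothetical in-memory `fake_get_beatmaps` (exclusive `since` and date ordering are assumptions), with a page size of 3 standing in for 500 and a loved batch of four maps sharing one approved_date.

```python
def fake_get_beatmaps(maps, since, limit):
    """Hypothetical stand-in for get_beatmaps: exclusive `since`, ordered by date."""
    newer = [m for m in maps if m["approved_date"] > since]
    newer.sort(key=lambda m: (m["approved_date"], m["beatmap_id"]))
    return newer[:limit]


LIMIT = 3  # stands in for the real 500-row cap
BATCH_DATE = "2018-06-25 02:05:26"
# Made-up data: one earlier map, then a loved batch of four maps sharing BATCH_DATE.
maps = [{"beatmap_id": 10, "approved_date": "2018-06-11 19:40:06"}]
maps += [{"beatmap_id": i, "approved_date": BATCH_DATE} for i in range(11, 15)]

# Step 1: fetch a full page.
first = fake_get_beatmaps(maps, "2018-06-11 00:00:00", LIMIT)
# Step 2: extract the max approved_date (here, BATCH_DATE).
since = max(m["approved_date"] for m in first)
# Step 3: request again from that date; only strictly later maps come back.
second = fake_get_beatmaps(maps, since, LIMIT)

seen = {m["beatmap_id"] for m in first + second}
missed = sorted(m["beatmap_id"] for m in maps if m["beatmap_id"] not in seen)
print(missed)  # [13, 14]
```

The first page cuts the batch after two of its four maps, and the follow-up request never sees the other two.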

So yes, the API is working as intended, but I would strongly suggest retroactively updating the approved_date of loved maps so that maps in the same loved batch don't share an identical value.

And now that I think of it, the same could happen for ranked maps, since iirc those sets also share the same approved_date.

Magnus-Cosmos commented 4 years ago

I ran into this problem as well when trying to get ranked+loved maps, so I just subtracted 1 second from the last approved_date and then removed the duplicates. Not sure if there's a better way to do this.
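A sketch of that subtract-1-second workaround, again against a hypothetical in-memory stand-in for get_beatmaps (exclusive `since` and date ordering assumed; all names and data are made up). Deduplication is done by keying on beatmap_id. The guard covers the case the workaround cannot handle: if a full page contains nothing new, at least `limit` maps share one second and stepping back 1 second just returns the same page forever.

```python
from datetime import datetime, timedelta
from typing import Dict, List

FMT = "%Y-%m-%d %H:%M:%S"  # MySQL-style timestamps, as used by approved_date


def fake_get_beatmaps(maps: List[dict], since: str, limit: int) -> List[dict]:
    """Hypothetical stand-in for get_beatmaps: exclusive `since`, ordered by date."""
    newer = [m for m in maps if m["approved_date"] > since]
    newer.sort(key=lambda m: (m["approved_date"], m["beatmap_id"]))
    return newer[:limit]


def fetch_all_with_overlap(maps: List[dict], limit: int = 500) -> List[dict]:
    by_id: Dict[int, dict] = {}  # keyed by beatmap_id, so re-fetched maps collapse
    since = "0000-00-00 00:00:00"
    while True:
        page = fake_get_beatmaps(maps, since, limit)
        new_ids = [m["beatmap_id"] for m in page if m["beatmap_id"] not in by_id]
        for m in page:
            by_id[m["beatmap_id"]] = m
        if len(page) < limit:
            break
        if not new_ids:
            # A full page of already-seen maps means >= limit maps share one second.
            raise RuntimeError("too many maps share approved_date " + page[-1]["approved_date"])
        # Step back 1 second so the next page re-includes the boundary date.
        last = datetime.strptime(page[-1]["approved_date"], FMT)
        since = (last - timedelta(seconds=1)).strftime(FMT)
    return sorted(by_id.values(), key=lambda m: (m["approved_date"], m["beatmap_id"]))


maps = [
    {"beatmap_id": 1, "approved_date": "2018-06-01 00:00:00"},
    {"beatmap_id": 2, "approved_date": "2018-06-02 00:00:00"},
    {"beatmap_id": 3, "approved_date": "2018-06-25 02:05:26"},
    {"beatmap_id": 4, "approved_date": "2018-06-25 02:05:26"},
    {"beatmap_id": 5, "approved_date": "2018-07-01 00:00:00"},
]
ids = [m["beatmap_id"] for m in fetch_all_with_overlap(maps, limit=3)]
print(ids)  # [1, 2, 3, 4, 5]
```

With a page size of 3, the naive flow would lose map 4 here; the overlapping request picks it up, and map 3 is fetched twice but stored once.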

Piotrekol commented 4 years ago

Re "I just subtracted 1 second from last approved_date": that's exactly the same workaround I added to osustats before reporting this. Either this should get fixed one way or another, or the wiki should mention that "gotcha".

peppy commented 4 years ago

This will be fixed with proper pagination in api v2. Not sure if it will be addressed on v1.

Have you tried using v2 for your purpose?

Piotrekol commented 4 years ago

No, I have not. I remember mentions of a live feed of scores being added later on. Is that planned at some point, or am I just misinformed? (Like https://osu.ppy.sh/p/events, in API form.)

jxu commented 4 years ago

I also did the subtract-1-second thing about a year ago. A better fix is to paginate by beatmap id, which is guaranteed to be unique, instead of using the since parameter, whose values may be identical.
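A sketch of paginating by a unique key instead of a date (keyset pagination). It assumes a hypothetical endpoint that returns maps with beatmap_id greater than a given value, ordered by id. `fake_get_beatmaps_after_id` and its `after_id` parameter are made up, and the thread does not say which real request would provide this filter.

```python
from typing import List


def fake_get_beatmaps_after_id(maps: List[dict], after_id: int, limit: int) -> List[dict]:
    """Hypothetical endpoint returning maps with beatmap_id > after_id, ordered by id.

    Not a documented osu! API v1 parameter; it only models keyset pagination.
    """
    newer = sorted((m for m in maps if m["beatmap_id"] > after_id),
                   key=lambda m: m["beatmap_id"])
    return newer[:limit]


def fetch_all_by_id(maps: List[dict], limit: int = 500) -> List[dict]:
    collected: List[dict] = []
    after_id = 0
    while True:
        page = fake_get_beatmaps_after_id(maps, after_id, limit)
        collected.extend(page)
        if len(page) < limit:
            return collected
        # beatmap_id is unique, so no other map can share this boundary value.
        after_id = page[-1]["beatmap_id"]


# Made-up sample: seven maps all loved in the same second.
maps = [{"beatmap_id": i, "approved_date": "2018-06-25 02:05:26"} for i in range(1, 8)]
ids = [m["beatmap_id"] for m in fetch_all_by_id(maps, limit=3)]
print(ids)  # [1, 2, 3, 4, 5, 6, 7]
```

Because the boundary value can never be shared, there is nothing to skip and nothing to deduplicate, even when more maps than the page size share one approved_date.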

If you don't need maps added since the beginning of the month, you can use data.ppy.sh to access the whole table at once. #193