tweaselORG / parse-tunes

Library for fetching select data on iOS apps from the Apple App Store via undocumented internal iTunes APIs.
MIT License

App Store top chart data #2

Closed baltpeter closed 1 year ago

baltpeter commented 1 year ago

We need a reliable automated way to access the most popular apps on iOS. We want the list to be as long as possible, and we need to be able to fetch the top charts overall but also for individual categories.

There are many ways of getting to this data from various sources.

baltpeter commented 1 year ago

Apple offers an RSS feed generator for the top charts of various media types they sell (including apps). Using that, it is possible to obtain an XML or JSON file (despite the tool's name) of the top free or top paid apps per country on the App Store. The generator only returns up to 50 apps, but it is possible to retrieve up to 100 apps by manually adjusting the result limit parameter in the URL: https://rss.applemarketingtools.com/api/v2/de/apps/top-free/<limit>/apps.json
Requesting more than 100 apps will result in an internal server error.
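
As a rough sketch of how the limit parameter could be handled in code, the helper below builds the feed URL and caps the limit at 100, the highest value that worked in the test above. The function names `rssTopChartUrl` and `fetchRssTopChart` are made up for illustration, and a runtime with a global `fetch` (e.g. Node 18+) is assumed.

```javascript
// Highest limit that did not cause an internal server error in the test above.
const MAX_RSS_LIMIT = 100;

// Hypothetical helper: builds the URL for the current RSS feed generator.
// `chart` is the chart slug from the URL, e.g. 'top-free'.
function rssTopChartUrl(country, chart, limit = MAX_RSS_LIMIT) {
    if (!Number.isInteger(limit) || limit < 1) {
        throw new RangeError('limit must be a positive integer');
    }
    // Cap the limit instead of letting the server fail.
    const capped = Math.min(limit, MAX_RSS_LIMIT);
    return `https://rss.applemarketingtools.com/api/v2/${country}/apps/${chart}/${capped}/apps.json`;
}

// Hypothetical helper: fetches and parses the feed (assumes a global fetch).
async function fetchRssTopChart(country, chart, limit) {
    const res = await fetch(rssTopChartUrl(country, chart, limit));
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
    return res.json();
}

// A limit of 250 is capped to 100 in the resulting URL.
console.log(rssTopChartUrl('de', 'top-free', 250));
```

Capping rather than throwing is only one possible design choice; throwing on a limit above 100 would make the restriction more visible to the caller.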


An older version of the RSS feed generator used URLs like this, offering a few more options:

This version returned 200 apps. Unfortunately, it doesn't work anymore. The links just redirect to the new generator.


An even older version (https://itunes.apple.com/de/rss/topfreeapplications/limit=200/genre=36/json) still works.

This one allows fetching the top charts per category (see #3).

It used to return up to 200 apps, but doesn't anymore. Setting the limit parameter higher than 200 results in a 400 error (Invalid value for param 'limit'.). For values between 100 and 200, it always returns 100 apps.
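
To illustrate the URL scheme of this legacy feed, here is a minimal sketch that builds the URL for a given country and genre and rejects limits that the server would answer with a 400 error. The helper name `legacyTopFreeUrl` is hypothetical, and only the `topfreeapplications` chart from the URL above is covered.

```javascript
// Highest limit the legacy feed accepts; larger values produced a 400 error above.
const LEGACY_MAX_LIMIT = 200;

// Hypothetical helper: builds the URL of the legacy top free apps feed.
function legacyTopFreeUrl(country, genre, limit = 100) {
    if (!Number.isInteger(limit) || limit < 1 || limit > LEGACY_MAX_LIMIT) {
        throw new RangeError(`limit must be an integer between 1 and ${LEGACY_MAX_LIMIT}`);
    }
    return `https://itunes.apple.com/${country}/rss/topfreeapplications/limit=${limit}/genre=${genre}/json`;
}

console.log(legacyTopFreeUrl('de', 36, 100));
```

Since limits between 100 and 200 only yield 100 apps anyway, the default of 100 already requests everything the feed currently returns.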

baltpeter commented 1 year ago

It used to be possible to get more top apps through an endpoint that was used in old versions of the iOS App Store: http://itunes.apple.com/WebObjects/MZStore.woa/wa/topChartFragmentData?cc=de&genreId=36&pageSize=1500&popId=27&pageNumbers=0 (#).

Through that, one used to be able to get the top 1,500 apps per category. However, that endpoint now only returns the top 100 apps per category.

baltpeter commented 1 year ago

Today, I discovered that there is actually a page on the official App Store website that lists the top charts: https://apps.apple.com/us/charts/iphone.

It can also distinguish by category (e.g. https://apps.apple.com/us/charts/iphone/top-free-apps/36), but only lists 100 apps per category.

baltpeter commented 1 year ago

There are also various third parties that collect and offer (sell) this data, some even historically:

AppFigures is the most generous of those and shows the top 200 apps for free and without signing in.

baltpeter commented 1 year ago

iTunes on Windows can display top charts for each category with up to 200 results each. Newer versions of iTunes don't include support for the iOS App Store anymore, but Apple offers a special, unsupported version of iTunes (12.6.5.3) that still contains this feature, continues to work as of the time of writing, and doesn't prompt the user to update to newer versions: https://support.apple.com/HT208079

This is the endpoint it uses:

https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewTop?cc=<country code>&genreId=<genre ID>&l=<language code>&popId=<top list type>

Parameters

The cc and l GET parameters control the country and language, respectively. The popId parameter determines the type of top chart returned (more on that below).
The genreId parameter controls the category the returned top list is for (see #3). 36 is the first-level category for all apps on the App Store. The second level then has the actual app categories, e.g. 6000 for "Business". There are also third-level categories but only for "Games" and "Newsstand".

In addition to the GET parameters, the X-Apple-Store-Front header also needs to be set (see #1).
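
Put together, a request to this endpoint might be assembled as in the following sketch. `buildViewTopRequest` is a hypothetical helper, the store front value `143443-2,26` is the one used in the scripts in this thread, and `en` is only an assumed example value for the language code.

```javascript
// Hypothetical helper: assembles the URL and headers for the viewTop endpoint.
function buildViewTopRequest({ country, language, genreId, popId, storeFront }) {
    // URLSearchParams takes care of encoding; the order matches the URL above.
    const params = new URLSearchParams({
        cc: country,
        genreId: String(genreId),
        l: language,
        popId: String(popId),
    });
    return {
        url: `https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewTop?${params}`,
        headers: { 'X-Apple-Store-Front': storeFront },
    };
}

const request = buildViewTopRequest({
    country: 'de',
    language: 'en', // assumed example value
    genreId: 36, // first-level category for all apps
    popId: 27,
    storeFront: '143443-2,26',
});
console.log(request.url);
```

The returned object can be passed on to `fetch(request.url, { headers: request.headers })`.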

popId

The JSON response from this endpoint (see #1 for how to get it to return JSON) also contains a title for the chart ($.pageData.segmentedControl.segments[0].pageData.selectedChart.title).
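
For reference, that path can be read from a parsed response without throwing on unexpected shapes, for example with optional chaining. This is only a sketch: `selectedChartTitle` is a hypothetical helper, and the sample object contains nothing but the fields needed for the path.

```javascript
// Hypothetical helper: reads the chart title from a parsed viewTop response.
// Returns undefined if any part of the path is missing.
function selectedChartTitle(response) {
    return response?.pageData?.segmentedControl?.segments?.[0]?.pageData?.selectedChart?.title;
}

// Stripped-down sample with only the fields on the path.
const sample = {
    pageData: {
        segmentedControl: {
            segments: [{ pageData: { selectedChart: { title: 'Top Free iPhone Apps' } } }],
        },
    },
};

console.log(selectedChartTitle(sample));
console.log(selectedChartTitle({}));
```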

I wrote a quick script to extract the titles for all popIds between 0 and 200:

import fetch from 'cross-fetch';
import { writeFile } from 'fs/promises';

(async () => {
    // Chart metadata, keyed by the popId that the server reports for it.
    const pops = {};

    for (let i = 0; i <= 200; i++) {
        try {
            const res = await fetch(
                `https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewTop?genreId=36&popId=${i}`,
                {
                    method: 'GET',
                    headers: { 'X-Apple-Store-Front': '143443-2,26' },
                }
            ).then((response) => response.json());
            const data = res.pageData.segmentedControl.segments[0].pageData;

            // Skip popIds for which the server responds with a different chart.
            if (i !== +data.selectedChart.id) continue;

            pops[+data.selectedChart.id] = {
                pageType: data.metricsBase.pageType,
                pageDetails: data.metricsBase.pageDetails,
                page: data.metricsBase.page,
                kinds: data.selectedChart.kinds,
                shortTitle: data.selectedChart.shortTitle,
                title: data.selectedChart.title,
                pageTitle: data.pageTitle,
            };
            console.log(i, data.selectedChart.title);
        } catch {
            // Ignore popIds that fail (e.g. error responses that are not valid JSON).
        }
    }

    await writeFile('pops.json', JSON.stringify(pops, null, 4));
})();

This gives the following result:

27 Top Free iPhone Apps
30 Top Paid iPhone Apps
38 Top Grossing iPhone Apps
(The other fields don't have any interesting data.)

```json
{
    "27": {
        "pageType": "TopChartsPage",
        "pageDetails": "Top Free iPhone Apps_Mobile Software Applications",
        "page": "TopChartsPage_36",
        "kinds": {
            "iosSoftware": true
        },
        "shortTitle": "Free",
        "title": "Top Free iPhone Apps",
        "pageTitle": "Top Charts"
    },
    "30": {
        "pageType": "TopChartsPage",
        "pageDetails": "Top Paid iPhone Apps_Mobile Software Applications",
        "page": "TopChartsPage_36",
        "kinds": {
            "mobileSoftwareBundle": true,
            "iosSoftware": true
        },
        "shortTitle": "Paid",
        "title": "Top Paid iPhone Apps",
        "pageTitle": "Top Charts"
    },
    "38": {
        "pageType": "TopChartsPage",
        "pageDetails": "Top Grossing iPhone Apps_Mobile Software Applications",
        "page": "TopChartsPage_36",
        "kinds": {
            "iosSoftware": true
        },
        "shortTitle": "Top Grossing",
        "title": "Top Grossing iPhone Apps",
        "pageTitle": "Top Charts"
    }
}
```

EDIT: See below for a correction.

All values for popId between 0 and 156 other than the three above map to Top Paid iPhone Apps (and also return the same results, I've checked). Values greater than 156 result in an internal server error response.

From my older notes, there also used to be popIds for iPad apps (44 for top free iPad, 46 for top grossing iPad, 47 for top paid iPad). That does not appear to be the case anymore. I'm assuming that it is still possible to access this data through this endpoint but I haven't looked into that further.

Output

The JSON response from the endpoint contains multiple different lists of apps and it is important to know which one has the correct data:

baltpeter commented 1 year ago

> From my older notes, there also used to be popIds for iPad apps (44 for top free iPad, 46 for top grossing iPad, 47 for top paid iPad). That does not appear to be the case anymore. I'm assuming that it is still possible to access this data through this endpoint but I haven't looked into that further.

Since I had to fire up my mitmproxy-ed iTunes 12.6.5.3 anyway, I couldn't help but have a quick look at that after all. :D

Turns out: iTunes 12.6.5.3 can still fetch the top charts for iPad and it does use the popIds I mentioned!

The returned list changes depending on the platform in the X-Apple-Store-Front header (see #1). I was using 26 (which only returned iPhone top charts), iTunes is using 32 (which also returns iPad top charts). But unfortunately, the iTunes platform returns the data nested in a script in an HTML page, which is much more annoying, of course. So, I tested all possible platforms to check which ones return the iPad data:

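import fetch from 'cross-fetch';

(async () => {
    // Request the top free iPad chart (popId 44) once for every platform ID.
    for (let i = 0; i <= 200; i++) {
        try {
            const res = await fetch('https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewTop?genreId=36&popId=44', {
                method: 'GET',
                headers: { 'X-Apple-Store-Front': `143443-2,${i}` },
            }).then((response) => response.json());

            // Log which chart the server actually returned for this platform.
            const title = res.pageData.segmentedControl.segments[0].pageData.selectedChart.title;
            console.log(i, title);
        } catch {
            // Ignore platforms that fail or don't return JSON in the expected shape.
        }
    }
})();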

Here's the result:

25 Top Free iPad Apps
26 Top Paid iPhone Apps
29 Top Paid iPhone Apps
30 Top Free iPad Apps
31 Top Paid iPhone Apps
44 Top Paid iPhone Apps

So, only platforms 25 and 30 return top charts for iPad. Unfortunately, from some more testing, they only return iPad data but never return iPhone data. We'll need to change the platform depending on whether we want to fetch iPhone or iPad top chart data. *sigh*
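
One way to deal with that would be a small lookup table that picks both the popId and the platform for a given device. The following is only a sketch based on the observations in this thread: the names `CHARTS` and `chartRequestParams` are made up, `143443-2` is the store front prefix from the scripts above, and of the iPad popIds only 44 was actually checked against platform 25.

```javascript
// popIds and platforms as observed in this thread (hypothetical structure).
const CHARTS = {
    iphone: { platform: 26, popIds: { free: 27, paid: 30, grossing: 38 } },
    ipad: { platform: 25, popIds: { free: 44, paid: 47, grossing: 46 } },
};

// Hypothetical helper: returns the popId and X-Apple-Store-Front value
// needed to request the given chart type for the given device.
function chartRequestParams(device, type, storeFrontPrefix = '143443-2') {
    const entry = CHARTS[device];
    if (!entry) throw new Error(`Unknown device: ${device}`);
    const popId = entry.popIds[type];
    if (popId === undefined) throw new Error(`Unknown chart type: ${type}`);
    return { popId, storeFront: `${storeFrontPrefix},${entry.platform}` };
}

console.log(chartRequestParams('iphone', 'free'));
console.log(chartRequestParams('ipad', 'free'));
```

Platform 30 would presumably work just as well as 25 for the iPad entries, since both returned the iPad chart in the test above.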

baltpeter commented 1 year ago

Another observation worth noting: There are different versions of the same top chart for the same day, depending on the Apple endpoint or third party you're asking.

They differ slightly in the following ways:

I haven't looked into this further and I don't think it's too critical but it is good to be aware of that, I suppose.

baltpeter commented 1 year ago

There is yet another endpoint! I've discovered this one in the response of the genre endpoint (#3):

https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=FreeAppsV2

By default, it returns only 5 app IDs. By trial and error, I found that you can add a limit param and get up to 100 app IDs:

https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=FreeAppsV2&limit=200

Here's the full list of URLs included in the genre response:

{
  "appsByRevenue": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=AppsByRevenue",
  "freeApplications": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=FreeApplications",
  "freeAppleTVApps": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=FreeAppleTVApps",
  "paidAppleTVApps": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=PaidAppleTVApps",
  "freeAppsV2": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=FreeAppsV2",
  "paidIpadApplications": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=PaidIpadApplications",
  "ipadAppsByRevenue": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=IpadAppsByRevenue",
  "freeIpadApplications": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=FreeIpadApplications",
  "paidApplications": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=PaidApplications",
  "appleTVAppsByRevenue": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=AppleTVAppsByRevenue",
  "applications": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=Applications",
  "freeMacAppsV2": "https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?cc=us&g=36&name=FreeMacAppsV2"
}

This endpoint can also filter by genre. I haven't looked into what the different charts are. Maybe those are the different result sets I observed in https://github.com/tracking-weasel/parse-tunes/issues/2#issuecomment-1377306393?
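
For experimenting with the different chart names, genres and limits, the URL could be built along these lines. This is a sketch with a hypothetical helper `chartsUrl`; it only covers the parameters seen in the URLs above and makes no assumptions about the response format.

```javascript
// Hypothetical helper: builds a URL for the charts endpoint.
// `name` is one of the chart names from the genre response, e.g. 'FreeAppsV2'.
function chartsUrl({ country = 'us', genreId = 36, name, limit }) {
    if (!name) throw new Error('A chart name is required');
    const params = new URLSearchParams({ cc: country, g: String(genreId), name });
    // Without a limit, the endpoint falls back to its default of 5 app IDs.
    if (limit !== undefined) params.set('limit', String(limit));
    return `https://itunes.apple.com/WebObjects/MZStoreServices.woa/ws/charts?${params}`;
}

console.log(chartsUrl({ name: 'FreeAppsV2', limit: 200 }));
console.log(chartsUrl({ country: 'de', genreId: 6000, name: 'PaidApplications' }));
```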