otosky / medium_stats

Command Line and Python tool for Scraping Your Medium Stats
GNU General Public License v3.0

Fetch more than 50 stories (get_all_story_overview) #10

Closed: peterfriese closed this issue 2 years ago.

peterfriese commented 2 years ago

This is a fantastic library, thanks for building it!

I've got a publication with more than 50 articles and would like to fetch stats for all of them. However, it seems it isn't currently possible to fetch more than 50 because of Medium's pagination; see your comment here:

https://github.com/otosky/medium_stats/blob/ef96a205bf3b695e48550de8cb4fec74907cb073/medium_stats/scraper.py#L297

I am happy to test this using the publication I manage, if this helps.

otosky commented 2 years ago

Thanks @peterfriese! I admittedly haven't touched this project for a while and it's due for a big refactor that I started, but didn't finish. Happy that you're getting some use out of it in the meantime!

If you're able to set a breakpoint in a debugger and send me a sample response from the endpoint that's getting paginated here, I can probably figure out what's necessary to iterate through all pages:

https://github.com/otosky/medium_stats/blob/ef96a205bf3b695e48550de8cb4fec74907cb073/medium_stats/scraper.py#L295-L299

I have a dummy publication attached to my profile that I can probably also test with, but if you already have a sample of what the response looks like, that might be faster.
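When capturing a raw sample from that endpoint, note that Medium's JSON endpoints have historically prefixed the body with the anti-JSON-hijacking guard `])}while(1);</x>`, so the text must be trimmed before it can be decoded. The helper below is a minimal sketch under that assumption; `parse_medium_json` is a hypothetical name, not part of medium_stats.

```python
import json

# Assumption: Medium has historically prefixed its JSON endpoint responses
# with this anti-JSON-hijacking guard. Adjust if the endpoint differs.
JSON_HIJACK_PREFIX = "])}while(1);</x>"


def parse_medium_json(raw_text: str) -> dict:
    """Strip the optional hijacking guard, then decode the JSON body."""
    text = raw_text.strip()
    if text.startswith(JSON_HIJACK_PREFIX):
        text = text[len(JSON_HIJACK_PREFIX):]
    return json.loads(text)


if __name__ == "__main__":
    sample = '])}while(1);</x>{"success": true, "payload": {"value": []}}'
    data = parse_medium_json(sample)
    print(data["success"], len(data["payload"]["value"]))  # True 0
```

Because the prefix check is optional, the same helper also accepts a body that has already been cleaned up, such as the sample pasted below.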

peterfriese commented 2 years ago

Hi @otosky - here's a response from Firebase Developers on Medium:

{
  "success": true,
  "payload": {
    "value": [
      {
        "postId": "757e8207df54",
        "slug": "calling-asynchronous-firebase-apis-from-swift",
        "previewImage": {
          "id": "0*Um3VykF0-vfJulIu.png",
          "originalWidth": 1440,
          "originalHeight": 960,
          "isFeatured": true
        },
        "title": "Calling asynchronous Firebase APIs from Swift",
        "creatorId": "ea0b1eb1f5d2",
        "collectionId": "8e8b7dc6774d",
        "upvotes": 9,
        "views": 237,
        "reads": 63,
        "createdAt": 1643810141133,
        "firstPublishedAt": 1643811805581,
        "visibility": 0,
        "firstPublishedAtBucket": "February 2022",
        "readingTime": 7,
        "syndicatedViews": 6,
        "claps": 48,
        "updateNotificationSubscribers": 0,
        "isSeries": false,
        "internalReferrerViews": 146,
        "friendsLinkViews": 0,
        "primaryTopic": {
          "topicId": "ab3d8f7f8eb1",
          "slug": "ios-development",
          "createdAt": 1521651850182,
          "deletedAt": 0,
          "image": {
            "id": "1*g_B4JNulmfXSj0AyEjImyA@2x.jpeg",
            "originalWidth": 5184,
            "originalHeight": 3456
          },
          "name": "iOS Dev",
          "description": "Appy talk.",
          "relatedTopics": [],
          "visibility": 1,
          "relatedTags": [],
          "relatedTopicIds": [],
          "seoTitle": "iOS App Development: Articles and News — Medium",
          "type": "Topic"
        },
        "type": "PostStat"
      },
      {
        "postId": "e037a6654a93",
        "slug": "why-are-the-firebase-apis-asynchronous",
        "previewImage": {
          "id": "1*5-HHVCtNGRXTHmf4EAqMFg.png",
          "originalWidth": 2040,
          "originalHeight": 1124,
          "isFeatured": true
        },
        "title": "Why are the Firebase APIs asynchronous?",
        "creatorId": "6a53613f4e6d",
        "collectionId": "8e8b7dc6774d",
        "upvotes": 229,
        "views": 63776,
        "reads": 22963,
        "createdAt": 1518641035363,
        "firstPublishedAt": 1518664828157,
        "visibility": 0,
        "firstPublishedAtBucket": "February 2018",
        "readingTime": 7,
        "syndicatedViews": 2065,
        "claps": 1200,
        "updateNotificationSubscribers": 0,
        "isSeries": false,
        "internalReferrerViews": 1103,
        "friendsLinkViews": 0,
        "type": "PostStat"
      }
    ],
    "collection": {
      "id": "8e8b7dc6774d",
      "name": "Firebase Developers",
      "slug": "firebase-developers",
      "type": "Collection"
    },
    "paging": {
      "previous": {
        "limit": 1000,
        "from": "1644221018910"
      },
      "next": {
        "limit": 1000,
        "to": "1518664828157",
        "bucketType": "MONTH"
      }
    },
    "references": {}
  },
  "v": 3,
  "b": "20220211-1557-root"
}
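In this sample, `paging.next.to` (`"1518664828157"`) is identical to the `firstPublishedAt` of the oldest post returned, which suggests the cursor is a millisecond timestamp marking where the next page should begin. The sketch below pulls the `PostStat` records and that cursor out of a decoded response. `extract_page` is a hypothetical helper, and reading `to` as a timestamp cursor is an inference from this one sample, not documented Medium behavior.

```python
from typing import Dict, List, Optional, Tuple


def extract_page(response: dict) -> Tuple[List[Dict], Optional[Dict]]:
    """Return the PostStat records and the `paging.next` object (or None)."""
    payload = response.get("payload", {})
    posts = payload.get("value", [])
    next_page = payload.get("paging", {}).get("next")
    return posts, next_page


# Trimmed-down copy of the response above (only the fields used here).
sample = {
    "success": True,
    "payload": {
        "value": [
            {"postId": "757e8207df54", "views": 237, "firstPublishedAt": 1643811805581},
            {"postId": "e037a6654a93", "views": 63776, "firstPublishedAt": 1518664828157},
        ],
        "paging": {
            "previous": {"limit": 1000, "from": "1644221018910"},
            "next": {"limit": 1000, "to": "1518664828157", "bucketType": "MONTH"},
        },
    },
}

posts, next_page = extract_page(sample)
oldest = min(p["firstPublishedAt"] for p in posts)
# In this sample, the `to` cursor equals the oldest post's firstPublishedAt.
print(next_page["to"] == str(oldest))  # True
```

Note that the cursor is a string while `firstPublishedAt` is an integer, so any comparison needs an explicit conversion.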

I also found that changing the limit from 50 to 1000 did the trick for us (we've got about 170 articles at the moment). So either giving us a way to configure the limit or fetching paginated data would work for us.
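Raising the limit works until a publication outgrows it, whereas following the cursor removes that ceiling. Below is a minimal sketch of a cursor loop, assuming the endpoint accepts a `to` value copied from `paging.next.to` (inferred from the sample response, not confirmed) and that the boundary post may be returned again on the next page. `fetch_all_stories` and the injected `fetch` callable are hypothetical; this is not the medium_stats API.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical fetcher: takes (limit, to) and returns the decoded JSON response.
# `to` is None for the first page. How this maps onto a real HTTP request is an
# assumption; it is injected here so the loop can be exercised offline.
Fetcher = Callable[[int, Optional[str]], dict]


def fetch_all_stories(fetch: Fetcher, limit: int = 50, max_pages: int = 100) -> List[Dict]:
    """Follow `payload.paging.next.to` until a page adds no new posts."""
    if limit < 2:
        # With an inclusive cursor, a page of one post could never advance.
        raise ValueError("limit must be >= 2 so an inclusive cursor can advance")
    seen: Dict[str, Dict] = {}
    cursor: Optional[str] = None
    for _ in range(max_pages):
        payload = fetch(limit, cursor).get("payload", {})
        posts = payload.get("value", [])
        new_posts = [p for p in posts if p["postId"] not in seen]
        for post in new_posts:
            seen[post["postId"]] = post
        next_page = payload.get("paging", {}).get("next")
        # Stop when there is no next cursor or the page contributed nothing new
        # (this also guards against an inclusive `to` repeating the boundary post).
        if not new_posts or not next_page or "to" not in next_page:
            break
        cursor = next_page["to"]
    return list(seen.values())


# Usage with a fake, in-memory "endpoint" of 120 posts, newest first.
FAKE_POSTS = [{"postId": f"p{i}", "firstPublishedAt": 1_000_000 - i} for i in range(120)]


def fake_fetch(limit: int, to: Optional[str]) -> dict:
    eligible = [p for p in FAKE_POSTS if to is None or p["firstPublishedAt"] <= int(to)]
    page = eligible[:limit]
    paging = {"next": {"limit": limit, "to": str(page[-1]["firstPublishedAt"])}} if page else {}
    return {"payload": {"value": page, "paging": paging}}


print(len(fetch_all_stories(fake_fetch, limit=50)))  # 120
```

Deduplicating on `postId` keeps the loop correct whether the real `to` boundary turns out to be inclusive or exclusive. The `max_pages` cap is a safety net in case the cursor ever stops advancing.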

otosky commented 2 years ago

Hey @peterfriese, sorry it took me so long to get to this.

In #11 I added the ability both to configure the limit on get_all_story_overview and to handle pagination. The new release, version 2.2.0, is available on PyPI as the latest version.

This is something of a hotfix, since I want to refactor the whole lib, but I hope it covers your use case!

peterfriese commented 2 years ago

Awesome, just gave it a try, and it works smoothly for Firebase Developers on Medium. Thanks!