status-im / status-github-bot

A bot for github
https://status-github-bot.herokuapp.com/
ISC License
11 stars 12 forks source link

Improve script which initializes bot with GitHub->Slack name mappings for active users #12

Closed pedropombeiro closed 6 years ago

pedropombeiro commented 6 years ago

Description

The lib/retrieve-slack-github-users.js script retrieves all the Slack users and fetches the respective profile to retrieve the GitHub ID. Currently the initial list fetch operation is not paginated, which poses a risk once Slack will clamp down on non-paginated API calls. The goal is to adapt the script so that it paginates the call using cursors.

Implementation

The Slack API has a https://api.slack.com/methods/users.list method which will return all the members of the status-im Slack organization. That will return a complete list of users (which currently stands at around 2MB, so we should use pagination).

kpasokhi commented 6 years ago

Hi, testing this issue needs Slack standard subscription. Can you provide this?

pedropombeiro commented 6 years ago

Hi @kpasokhi. I'm afraid that would require me to share a full access token - Slack doesn't seem to provide a way to generate a test token with granular permissions. Do you think you could work on it if I provide you with sample results?

This is what I get if I ask for https://slack.com/api/users.list?limit=2&pretty=1:

{
    "ok": true,
    "offset": "U0PEES2EM",
    "members": [
        {
            "id": "USLACKBOT",
            "team_id": "T0PDCGZJQ",
            ...
        },
        {
            "id": "U0PDFC7EX",
            "team_id": "T0PDCGZJQ",
            ...
        }
    ],
    "cache_ts": 1519033293,
    "response_metadata": {
        "next_cursor": "dXNlcjpVMFBFRVMyRU0="
    }
}

Then if I take the value of response_metadata.next_cursor and pass it in the next request https://slack.com/api/users.list?cursor=dXNlcjpVMFBFRVMyRU0%3D&limit=2&pretty=1, I get:

{
    "ok": true,
    "offset": "U0PEGM8TA",
    "members": [
        {
            "id": "U0PEES2EM",
            "team_id": "T0PDCGZJQ",
            ...
        },
        {
            "id": "U0PEFLX8E",
            "team_id": "T0PDCGZJQ",
            ...
       }
    ],
    "cache_ts": 1519033721,
    "response_metadata": {
        "next_cursor": "dXNlcjpVMFBFR004VEE="
    }
}

Once you feel you have a working solution, I could validate it on the real data and help fix any possible issue that might prevent it from working correctly. What do you think?

kpasokhi commented 6 years ago

@PombeirP OK, I'll give it a try!

realphi commented 6 years ago

I have implemented a solution with recursive async function looping. Since there is no clue for number of pages it's not possible to go other way. Although I think the slack rate limiting may slow the startup of the bot when user base becomes bigger.

pedropombeiro commented 6 years ago

That's awesome @realphi! The startup speed is not really an issue, since we're caching the information and only periodically refreshing it.