supabase / repository.surf

🏄
https://repository.surf
MIT License

Discrepancy between total star count and latest point in star history chart #66

Open joshenlim opened 3 years ago

joshenlim commented 3 years ago

Bug report

This one's tricky - it might be that edge case issue that was mentioned. I noticed that the total star count of a repo does not match the latest node in the star history chart as such:

image

Total star count on the header is 4731 but the latest node in the chart is 4736. Checking the GitHub repo directly shows that the supabase repo has 4731 stars.

Another example under vercel:

image

Total star count on header is 59687 but latest node in the chart is 59721

One thing that's consistent is that our chart shows slightly higher numbers than the actual count.

joshenlim commented 3 years ago

@ykdojo feel free to clear the rows in our stars table if you need to for debugging

ykdojo commented 3 years ago

I'm trying to come up with a hypothesis for why this is happening.

Example: Repo X has 10 stars on Jan 3, 2021. We've already fetched them. On Jan 4, one of the stargazers un-stars the repo. Then the graph should show 9 stars, but when we call the GitHub API... Actually, I'm not sure what it would return exactly then. I'm going to look into a situation like this one - it's probably a good starting point.
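The hypothesis can be written down as a tiny simulation. This is only a sketch, assuming the chart is built from an append-only cache of starring events (new events get added, nothing is ever removed); `cached`, `chart_latest`, and the data are hypothetical and are not the project's actual code.

```python
from datetime import date


def chart_latest(cached_events):
    """Latest chart node: cumulative count of all cached starring events."""
    return len(cached_events)


# Jan 3: ten stars have been fetched and cached (hypothetical data).
cached = [date(2021, 1, 1)] * 10

# Jan 4: one stargazer un-stars, so GitHub's total drops to 9.
github_total = 9

# Assumption: an incremental fetch only returns events newer than the
# last cached one, so it brings back nothing and removes nothing.
new_events = []
cached.extend(new_events)

overcount = chart_latest(cached) - github_total
print(overcount)  # 1
```

If this is what happens, the chart would always drift upward, never downward, which would match the screenshots above.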

joshenlim commented 3 years ago

Let me know if you need help debugging - we can use our repo.surf repository to test your hypothesis too: unstar and re-star it ourselves.

ykdojo commented 3 years ago

Yup that's what I was thinking. I think I'll be able to look into this issue tomorrow, too.

ykdojo commented 3 years ago

I ran some experiments for this. Here are my notes.

I tested some queries with GraphiQL here.

Query:

query {
  repository(owner: "supabase", name: "repository.surf") {
    stargazers(first: 100, after: null) {
      totalCount
      edges {
        starredAt
        cursor
      }
    }
  }
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }
}

Result:

{
  "data": {
    "repository": {
      "stargazers": {
        "totalCount": 12,
        "edges": [
          {
            "starredAt": "2020-12-23T16:56:51Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8Rv4A="
          },
          {
            "starredAt": "2020-12-23T17:31:35Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8Ry6g="
          },
          {
            "starredAt": "2020-12-25T02:35:21Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8UlDI="
          },
          {
            "starredAt": "2020-12-25T13:30:35Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8VlIM="
          },
          {
            "starredAt": "2020-12-29T22:04:44Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8ev70="
          },
          {
            "starredAt": "2020-12-30T03:20:07Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8fP0A="
          },
          {
            "starredAt": "2020-12-30T09:38:56Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gAlw="
          },
          {
            "starredAt": "2020-12-30T14:37:31Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gkDw="
          },
          {
            "starredAt": "2020-12-30T14:39:00Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gkRs="
          },
          {
            "starredAt": "2020-12-30T16:24:50Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gwk8="
          },
          {
            "starredAt": "2021-01-02T03:51:33Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8lbY0="
          },
          {
            "starredAt": "2021-01-04T04:35:04Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8pk_o="
          }
        ]
      }
    },
    "rateLimit": {
      "limit": 5000,
      "cost": 1,
      "remaining": 4996,
      "resetAt": "2021-01-09T00:12:55Z"
    }
  }
}

I unstarred it and ran the same query, and here's the result:

{
  "data": {
    "repository": {
      "stargazers": {
        "totalCount": 11,
        "edges": [
          {
            "starredAt": "2020-12-23T16:56:51Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8Rv4A="
          },
          {
            "starredAt": "2020-12-23T17:31:35Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8Ry6g="
          },
          {
            "starredAt": "2020-12-25T13:30:35Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8VlIM="
          },
          {
            "starredAt": "2020-12-29T22:04:44Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8ev70="
          },
          {
            "starredAt": "2020-12-30T03:20:07Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8fP0A="
          },
          {
            "starredAt": "2020-12-30T09:38:56Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gAlw="
          },
          {
            "starredAt": "2020-12-30T14:37:31Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gkDw="
          },
          {
            "starredAt": "2020-12-30T14:39:00Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gkRs="
          },
          {
            "starredAt": "2020-12-30T16:24:50Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gwk8="
          },
          {
            "starredAt": "2021-01-02T03:51:33Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8lbY0="
          },
          {
            "starredAt": "2021-01-04T04:35:04Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8pk_o="
          }
        ]
      }
    },
    "rateLimit": {
      "limit": 5000,
      "cost": 1,
      "remaining": 4995,
      "resetAt": "2021-01-09T00:12:55Z"
    }
  }
}

Comparing these two, it looks like my starring event was removed (as expected):

{
  "starredAt": "2020-12-25T02:35:21Z",
  "cursor": "Y3Vyc29yOnYyOpIAzg8UlDI="
},
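The same comparison can be done programmatically instead of by eye. A minimal sketch, assuming each snapshot is the `edges` list from the query result; `removed_edges` is a hypothetical helper, and the sample data is a three-edge excerpt of the results above.

```python
def removed_edges(before, after):
    """Return edges present in `before` whose cursor is missing from `after`."""
    remaining = {edge["cursor"] for edge in after}
    return [edge for edge in before if edge["cursor"] not in remaining]


# Excerpt of the first result (before unstarring).
before = [
    {"starredAt": "2020-12-23T17:31:35Z", "cursor": "Y3Vyc29yOnYyOpIAzg8Ry6g="},
    {"starredAt": "2020-12-25T02:35:21Z", "cursor": "Y3Vyc29yOnYyOpIAzg8UlDI="},
    {"starredAt": "2020-12-25T13:30:35Z", "cursor": "Y3Vyc29yOnYyOpIAzg8VlIM="},
]
# The same excerpt after unstarring: the middle edge is gone.
after = [before[0], before[2]]

print(removed_edges(before, after))
```

Keying on `cursor` rather than `starredAt` avoids false matches if two people happen to star in the same second.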

Then, I tried running the same query with after: "Y3Vyc29yOnYyOpIAzg8UlDI=".

Result:

{
  "data": {
    "repository": {
      "stargazers": {
        "totalCount": 11,
        "edges": [
          {
            "starredAt": "2020-12-25T13:30:35Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8VlIM="
          },
          {
            "starredAt": "2020-12-29T22:04:44Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8ev70="
          },
          {
            "starredAt": "2020-12-30T03:20:07Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8fP0A="
          },
          {
            "starredAt": "2020-12-30T09:38:56Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gAlw="
          },
          {
            "starredAt": "2020-12-30T14:37:31Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gkDw="
          },
          {
            "starredAt": "2020-12-30T14:39:00Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gkRs="
          },
          {
            "starredAt": "2020-12-30T16:24:50Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gwk8="
          },
          {
            "starredAt": "2021-01-02T03:51:33Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8lbY0="
          },
          {
            "starredAt": "2021-01-04T04:35:04Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8pk_o="
          }
        ]
      }
    },
    "rateLimit": {
      "limit": 5000,
      "cost": 1,
      "remaining": 4994,
      "resetAt": "2021-01-09T00:12:55Z"
    }
  }
}

As you can see, it only has 9 stargazers - everything after my (deleted) starred event.

Then, I starred it again and ran the same query (with after: null again):

{
  "data": {
    "repository": {
      "stargazers": {
        "totalCount": 12,
        "edges": [
          {
            "starredAt": "2020-12-23T16:56:51Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8Rv4A="
          },
          {
            "starredAt": "2020-12-23T17:31:35Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8Ry6g="
          },
          {
            "starredAt": "2020-12-25T13:30:35Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8VlIM="
          },
          {
            "starredAt": "2020-12-29T22:04:44Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8ev70="
          },
          {
            "starredAt": "2020-12-30T03:20:07Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8fP0A="
          },
          {
            "starredAt": "2020-12-30T09:38:56Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gAlw="
          },
          {
            "starredAt": "2020-12-30T14:37:31Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gkDw="
          },
          {
            "starredAt": "2020-12-30T14:39:00Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gkRs="
          },
          {
            "starredAt": "2020-12-30T16:24:50Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8gwk8="
          },
          {
            "starredAt": "2021-01-02T03:51:33Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8lbY0="
          },
          {
            "starredAt": "2021-01-04T04:35:04Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg8pk_o="
          },
          {
            "starredAt": "2021-01-08T23:23:01Z",
            "cursor": "Y3Vyc29yOnYyOpIAzg82OZA="
          }
        ]
      }
    },
    "rateLimit": {
      "limit": 5000,
      "cost": 1,
      "remaining": 4993,
      "resetAt": "2021-01-09T00:12:55Z"
    }
  }
}

This time, you can see that my starred event moved to the end of this list - kind of as expected.

So, it seems like the only way to make sure that our data is correct is by going through the entire list of stargazers from time to time. We'll probably need some kind of queuing system for this.
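A full pass over the list could look roughly like the sketch below. It assumes the query also requests `pageInfo { hasNextPage endCursor }`, which GitHub's GraphQL connections provide but the query above does not ask for. `fetch_page` is a hypothetical callable that runs the query and returns the `stargazers` object; `make_fake_fetch` is an in-memory stand-in so the loop can be run without the API.

```python
def fetch_all_stargazers(fetch_page, page_size=100):
    """Walk every page from the start and return all starredAt values."""
    events = []
    cursor = None  # after: null means "start from the first stargazer"
    while True:
        page = fetch_page(first=page_size, after=cursor)
        events.extend(edge["starredAt"] for edge in page["edges"])
        if not page["pageInfo"]["hasNextPage"]:
            return events
        cursor = page["pageInfo"]["endCursor"]


def make_fake_fetch(all_edges):
    """Hypothetical stand-in for the real API call, backed by a list."""
    def fetch_page(first, after):
        start = 0
        if after is not None:
            start = next(
                i for i, e in enumerate(all_edges) if e["cursor"] == after
            ) + 1
        chunk = all_edges[start:start + first]
        return {
            "edges": chunk,
            "pageInfo": {
                "hasNextPage": start + first < len(all_edges),
                "endCursor": chunk[-1]["cursor"] if chunk else None,
            },
        }
    return fetch_page


edges = [{"starredAt": f"2021-01-0{i + 1}T00:00:00Z", "cursor": f"c{i}"}
         for i in range(5)]
all_events = fetch_all_stargazers(make_fake_fetch(edges), page_size=2)
print(len(all_events))  # 5
```

Replacing the cached rows for a repo with the result of such a pass would drop any un-starred events, at a cost of one API call per 100 stargazers.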

With a queuing system, it will look like this:

NOTE: I think we should reserve, say, at least 2000 API calls for on-demand calls.

So we'll reserve ~1000 calls for Supabase, and another ~1000 calls for on-demand calls for other repos. The rest (about 3000) will be available for anything - on-demand calls or queued repos.
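The budget above works out as follows. This is a sketch of one possible reading, in which queued refreshes may only spend the shared pool and must leave both reserves untouched; the constant names and `can_run_queued` are hypothetical.

```python
LIMIT = 5000               # "limit" from the rateLimit results above
RESERVED_SUPABASE = 1000   # ~1000 calls kept for Supabase
RESERVED_ON_DEMAND = 1000  # ~1000 calls kept for on-demand calls on other repos
RESERVED = RESERVED_SUPABASE + RESERVED_ON_DEMAND

# Whatever is left can be used by on-demand calls or queued repos.
shared = LIMIT - RESERVED
print(shared)  # 3000


def can_run_queued(remaining, cost=1):
    """Allow a queued call only if the reserves stay intact afterwards."""
    return remaining - cost >= RESERVED


print(can_run_queued(4993))  # True
print(can_run_queued(2000))  # False
```

Here `remaining` would come from the `rateLimit.remaining` field that the query already returns.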

ykdojo commented 3 years ago

I thought of a simpler, easier solution than making a queue system:

We can just allow people to "claim" an org (#33), and then click a button to "refresh" certain repos. I guess this button should be shown when a single repo is selected. We could also show it when multiple repos are selected, in case we want to allow people to refresh multiple repos at once.