Closed: olearycrew closed this issue 9 months ago
Interesting idea. I'm not familiar with GitLab. Do they provide equivalent or similar APIs for tracking public activity?
They (we) do: https://docs.gitlab.com/ee/api/
Ah, neat. So, one issue we have with the GH API is that we're hitting API limits and missing events. Ideally, instead of us polling, we'd be subscribing to a pubsub channel (e.g. GCP pubsub). Do you think this is something you guys would be willing to explore and support? I've been pushing GitHub folks to expose this as well.
Ping @brendano86, in case you missed the response from @igrigorik 🙂
Having public pubsub channels for both GitLab and GitHub would be very nice for data analysis purposes like this repo.
cc: @annafil anything we can do to support this?
@olearycrew The GitLab API currently does not allow bulk data collection.
1.3. When using, or attempting to use, the GitLab APIs, you agree: [...] 1.3.9. Not to use the GitLab APIs for the bulk collection or scraping of information.
And if you were to remove that clause, I have a feeling that there might be a bug report or two about the pagination API.
Since this issue was closed as "completed" rather than "not planned", I was curious whether GitLab support had been implemented. I downloaded a random dump and checked whether it contained any GitLab URLs. It did not, so either the traffic is very low or support has not been implemented.
wget https://data.gharchive.org/2024-01-01-15.json.gz
gunzip 2024-01-01-15.json.gz
grep -P 'url.{1,20}gitlab' 2024-01-01-15.json
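For a check that is a little stricter than the regex above, here is a rough Python sketch. It assumes the hourly dump is gzipped, newline-delimited JSON with one event object per line and a top-level "type" field. The function names `iter_strings` and `count_events_linking_to` are made up for illustration. Unlike the grep, it only looks at string values that are themselves URLs and matches on the hostname, so a mention of GitLab inside a comment body would not be counted.

```python
import gzip
import json
from collections import Counter
from urllib.parse import urlparse


def iter_strings(value):
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def count_events_linking_to(path, needle="gitlab"):
    """Count events, grouped by "type", holding a URL whose host contains `needle`."""
    counts = Counter()
    # gzip.open in text mode reads the compressed dump directly.
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            if not line.strip():
                continue
            event = json.loads(line)  # assumption: each line is one JSON object
            for text in iter_strings(event):
                if not text.startswith(("http://", "https://")):
                    continue
                try:
                    host = urlparse(text).hostname or ""
                except ValueError:
                    continue  # skip malformed URLs instead of aborting the scan
                if needle in host:
                    counts[event.get("type", "unknown")] += 1
                    break  # count each event at most once
    return counts
```

Calling `count_events_linking_to("2024-01-01-15.json.gz")` on the downloaded file works without the gunzip step. An empty counter would match what the grep showed.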
Apologies, this was marked incorrectly: it is a wontfix.
Understandable. Thank you for the clarification.
Ever thought about adding GitLab support to the project? Maybe forking the project/creating gitlabarchive.org?