igrigorik / gharchive.org

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
https://www.gharchive.org
MIT License
2.7k stars, 207 forks

GitLab support #203

Closed: olearycrew closed this 9 months ago

olearycrew commented 6 years ago

Ever thought about adding GitLab support to the project? Maybe forking the project/creating gitlabarchive.org?

igrigorik commented 6 years ago

Interesting idea. I'm not familiar with GitLab. Do they provide equivalent or similar APIs for tracking public activity?

olearycrew commented 6 years ago

They (we) do: https://docs.gitlab.com/ee/api/

igrigorik commented 5 years ago

Ah, neat. So, one issue we have with the GH API is that we're hitting API limits and missing events. Ideally, instead of us polling, we'd be subscribing to a pubsub channel (e.g. GCP pubsub). Do you think this is something you guys would be willing to explore and support? I've been pushing GitHub folks to expose this as well.

voxpelli commented 5 years ago

Ping @brendano86, in case you missed the response from @igrigorik 🙂

Having public pubsub channels for both GitLab and GitHub would be very nice for data analysis purposes like this repo.

hamelsmu commented 5 years ago

cc: @annafil anything we can do to support this?

983 commented 2 years ago

@olearycrew The GitLab API currently does not allow bulk data collection.

1.3. When using, or attempting to use, the GitLab APIs, you agree: [...] 1.3.9. Not to use the GitLab APIs for the bulk collection or scraping of information.

And if you were to remove that clause, I have a feeling that there might be a bug report or two about the pagination API.

983 commented 9 months ago

Since this issue was closed as "completed" instead of "not planned", I was curious whether GitLab support had been implemented. I downloaded a random dump and checked whether it contained any GitLab URLs, but it did not, so either the traffic is very low or it has not been implemented.

wget https://data.gharchive.org/2024-01-01-15.json.gz
gunzip 2024-01-01-15.json.gz
grep -P 'url.{1,20}gitlab' 2024-01-01-15.json
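
The same check can be sketched in Python without unpacking the dump to disk. This is a rough equivalent of the grep above, not the exact same match: it assumes the dump is gzipped, newline-delimited JSON, and it counts events where any string stored under a key containing "url" has "gitlab" in its hostname. The function names `iter_url_values` and `count_gitlab_events` are made up for illustration.

```python
import gzip
import json
from urllib.parse import urlparse


def iter_url_values(node, key=""):
    """Yield every string value stored under a key whose name contains 'url'."""
    if isinstance(node, dict):
        for child_key, child in node.items():
            yield from iter_url_values(child, child_key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_url_values(item, key)
    elif isinstance(node, str) and "url" in key.lower():
        yield node


def is_gitlab_url(value):
    """Return True if the URL's hostname mentions 'gitlab' (assumed heuristic)."""
    try:
        host = urlparse(value).hostname
    except ValueError:
        return False
    return host is not None and "gitlab" in host.lower()


def count_gitlab_events(source):
    """Count events in a gzipped JSON-lines dump that carry a GitLab URL.

    `source` may be a file path or a binary file object.
    """
    hits = 0
    with gzip.open(source, "rt", encoding="utf-8") as lines:
        for line in lines:
            if not line.strip():
                continue
            event = json.loads(line)
            if any(is_gitlab_url(url) for url in iter_url_values(event)):
                hits += 1
    return hits


# Example, assuming the dump was fetched with the wget command above:
# print(count_gitlab_events("2024-01-01-15.json.gz"))
```

Matching on the hostname rather than the raw text avoids counting GitHub repositories that merely have "gitlab" in their name.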

igrigorik commented 9 months ago

Apologies, no, this was marked incorrectly: it is a wontfix.

983 commented 9 months ago

Understandable. Thank you for the clarification.