igrigorik / gharchive.org

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
https://www.gharchive.org
MIT License

hourly archives returning ERROR 403:Forbidden since 2012-11-04 #15

Closed bschineller closed 12 years ago

bschineller commented 12 years ago

Is this a temporary issue, or has something changed? Is it a coincidence that it started on the date the clocks rolled back? Every archive up to 2012-11-03-23 responds normally.

wget http://data.githubarchive.org/2012-11-04-01.json.gz
--2012-11-05 14:13:55-- http://data.githubarchive.org/2012-11-04-01.json.gz
Resolving data.githubarchive.org... 205.251.242.164
Connecting to data.githubarchive.org|205.251.242.164|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-11-05 14:13:55 ERROR 403: Forbidden.

igrigorik commented 12 years ago

Hmm, just double checked.. all the data is good. I think the gotcha is in the URL: the hour is not zero padded.

Aka: http://data.githubarchive.org/2012-11-04-1.json.gz - that should do it!
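For anyone scripting the downloads, here is a minimal sketch of building the hourly URL with an unpadded hour, as described above. The function name `archive_url` and the constant `BASE_URL` are made up for illustration; only the host and the file naming pattern come from this thread.

```python
from datetime import datetime

# Host as used in the wget commands in this thread.
BASE_URL = "http://data.githubarchive.org"


def archive_url(ts: datetime) -> str:
    """Return the hourly archive URL for ts (hypothetical helper).

    Month and day keep their zero padding via strftime-style formatting;
    the hour is written as a plain integer, so 01 becomes 1.
    """
    return f"{BASE_URL}/{ts:%Y-%m-%d}-{ts.hour}.json.gz"


print(archive_url(datetime(2012, 11, 4, 1)))
# prints http://data.githubarchive.org/2012-11-04-1.json.gz
```

Formatting the hour with `%H` instead would produce `01` and, going by the comment above, request a file name that does not exist.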

bordercore commented 12 years ago

I'm also getting a 403 error for any hour after 2012-11-04-1, e.g.:

$ wget http://data.githubarchive.org/2012-11-05-10.json.gz
--2012-11-06 10:31:37-- http://data.githubarchive.org/2012-11-05-10.json.gz
Resolving data.githubarchive.org... 72.21.203.148
Connecting to data.githubarchive.org|72.21.203.148|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-11-06 10:31:37 ERROR 403: Forbidden.

bschineller commented 12 years ago

Thanks for looking into it. I misreported my example originally. Currently, without zero padding, it works at 2012-11-04-1, but fails any time after that:

wget http://data.githubarchive.org/2012-11-04-1.json.gz
--2012-11-06 13:46:29-- http://data.githubarchive.org/2012-11-04-1.json.gz
Resolving data.githubarchive.org... 72.21.215.75
Connecting to data.githubarchive.org|72.21.215.75|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 830280 (811K) [application/json]
Saving to: `2012-11-04-1.json.gz.3'

wget http://data.githubarchive.org/2012-11-04-2.json.gz
--2012-11-06 13:46:35-- http://data.githubarchive.org/2012-11-04-2.json.gz
Resolving data.githubarchive.org... 72.21.215.75
Connecting to data.githubarchive.org|72.21.215.75|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-11-06 13:46:35 ERROR 403: Forbidden.

wget http://data.githubarchive.org/2012-11-04-3.json.gz
--2012-11-06 13:46:46-- http://data.githubarchive.org/2012-11-04-3.json.gz
Resolving data.githubarchive.org... 72.21.215.75
Connecting to data.githubarchive.org|72.21.215.75|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-11-06 13:46:46 ERROR 403: Forbidden.

igrigorik commented 12 years ago

Hmm, indeed - thanks for the heads up. Just did a manual sync, should be up now.

bschineller commented 12 years ago

Thanks. Looks like your manual sync caught things up to 2012-11-06-14:

wget http://data.githubarchive.org/2012-11-06-14.json.gz
--2012-11-07 17:02:06-- http://data.githubarchive.org/2012-11-06-14.json.gz
Resolving data.githubarchive.org... 205.251.242.100
Connecting to data.githubarchive.org|205.251.242.100|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2405794 (2.3M) [application/json]
Saving to: `2012-11-06-14.json.gz'

... but after that point the data is again not available. Is the automation you used to have in place broken?

wget http://data.githubarchive.org/2012-11-06-15.json.gz
--2012-11-07 17:02:01-- http://data.githubarchive.org/2012-11-06-15.json.gz
Resolving data.githubarchive.org... 205.251.242.100
Connecting to data.githubarchive.org|205.251.242.100|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-11-07 17:02:02 ERROR 403: Forbidden.
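The manual checks above (try one hour, then the next, until a 403 shows up) could be scripted roughly as follows. This is only a sketch: `archive_url`, `http_status`, `last_available`, and the `max_hours` limit are hypothetical names, and using a HEAD request instead of a full download is an assumption about what the server accepts. The demo at the bottom uses a stub in place of the network so that it reproduces the 2012-11-06-14 / 2012-11-06-15 boundary reported here.

```python
import urllib.error
import urllib.request
from datetime import datetime, timedelta
from typing import Callable, Optional

BASE_URL = "http://data.githubarchive.org"  # host from the wget commands above


def archive_url(ts: datetime) -> str:
    """Hourly archive URL; the hour is not zero padded."""
    return f"{BASE_URL}/{ts:%Y-%m-%d}-{ts.hour}.json.gz"


def http_status(url: str) -> int:
    """Return the HTTP status for url using a HEAD request (assumed to be allowed)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the code (e.g. 403) is what we want.
        return err.code


def last_available(
    start: datetime,
    probe: Callable[[str], int] = http_status,
    max_hours: int = 72,
) -> Optional[datetime]:
    """Walk forward hour by hour from start.

    Returns the last hour whose archive answered 200 before the first
    non-200 response, or None if start itself is unavailable.
    """
    last = None
    ts = start
    for _ in range(max_hours):
        if probe(archive_url(ts)) != 200:
            break
        last = ts
        ts += timedelta(hours=1)
    return last


# Offline demo: a stub probe that mimics the state reported in this thread.
synced = {archive_url(datetime(2012, 11, 6, h)) for h in range(0, 15)}
stub = lambda url: 200 if url in synced else 403
print(last_available(datetime(2012, 11, 6, 0), probe=stub))
# prints 2012-11-06 14:00:00
```

Dropping the `probe=stub` argument makes the function hit the real server, one request per hour, so keep `max_hours` small.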