Open maxheld83 opened 3 years ago
so ... it appears that these snapshots may sometimes change after the fact.
This could really mess up our reproducibility.
for example:
wget --server-response --spider --verbose \
  https://api.crossref.org/snapshots/monthly/2018/04/all.json.tar.gz
> https://api.crossref.org/snapshots/monthly/2018/04/all.json.tar.gz
Spider mode enabled. Check if remote file exists.
--2021-04-12 21:55:40--  https://api.crossref.org/snapshots/monthly/2018/04/all.json.tar.gz
Resolving api.crossref.org (api.crossref.org)... 208.254.38.72
Connecting to api.crossref.org (api.crossref.org)|208.254.38.72|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 302 Found
  server: Apache-Coyote/1.1
  crossref-deployment-name: svc1b-1
  location: https://s3.amazonaws.com/org.crossref.snapshots/monthly/2018/04/all.json.tar.gz?Signature=M6eTWtV8BGlFQJUYsM%2BtiFS%2B57c%3D&AWSAccessKeyId=AKIAXKMFHONDMY2XFPDT&Expires=1618258241
  content-length: 0
  date: Mon, 12 Apr 2021 19:55:41 GMT
  connection: close
Location: https://s3.amazonaws.com/org.crossref.snapshots/monthly/2018/04/all.json.tar.gz?Signature=M6eTWtV8BGlFQJUYsM%2BtiFS%2B57c%3D&AWSAccessKeyId=AKIAXKMFHONDMY2XFPDT&Expires=1618258241 [following]
Spider mode enabled. Check if remote file exists.
--2021-04-12 21:55:41--  https://s3.amazonaws.com/org.crossref.snapshots/monthly/2018/04/all.json.tar.gz?Signature=M6eTWtV8BGlFQJUYsM%2BtiFS%2B57c%3D&AWSAccessKeyId=AKIAXKMFHONDMY2XFPDT&Expires=1618258241
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.26.78
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.26.78|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  x-amz-id-2: MFLya0RX2V5qwqRdOBIhwQtHJabaVOD9I+AXIZX5KYbkz6hyWJejqJgmz64GOJXY4VRJ1N1rw2E=
  x-amz-request-id: CMHYY55S0XR4ASZB
  Date: Mon, 12 Apr 2021 19:55:42 GMT
  Last-Modified: Thu, 17 May 2018 14:15:35 GMT
  ETag: "ed118e2ceb8d05d5bcf53f92fbef2511-2881"
  x-amz-tagging-count: 2
  x-amz-version-id: null
  Accept-Ranges: bytes
  Content-Type: application/x-tar
  Content-Length: 48333216290
  Server: AmazonS3
Length: 48333216290 (45G) [application/x-tar]
Remote file exists.
also recently:
wget --server-response --spider --verbose \
  https://api.crossref.org/snapshots/monthly/2020/09/all.json.tar.gz
I think that, as a (minimal) response, we should lock down the ETag value (a checksum?) reported for each snapshot.
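A rough sketch of what such a lock-down could look like, in Python and only as an illustration: it pulls the ETag out of saved `wget --server-response` text and compares it against a pinned value. The names `PINNED_ETAGS`, `extract_etag` and `snapshot_unchanged` are made up for this example, and the only pinned value is the one from the 2018/04 response above. One caveat: the `-2881` suffix looks like an S3 multipart-upload ETag, which would mean the value is not a plain MD5 of the file, so this would only detect that the object changed, not verify a local download.

```python
import re

# Hypothetical lock file: snapshot path -> ETag observed when it was pinned.
# The single entry is copied from the 2018/04 response quoted in this issue.
PINNED_ETAGS = {
    "monthly/2018/04": '"ed118e2ceb8d05d5bcf53f92fbef2511-2881"',
}


def normalize_etag(value):
    """Drop whitespace, an optional weak-validator prefix (W/) and quotes."""
    value = value.strip()
    if value.startswith("W/"):
        value = value[2:]
    return value.strip('"')


def extract_etag(wget_output):
    """Return the last ETag header value in the wget output, or None.

    The last match is used because earlier responses in the output may be
    redirects rather than the final object.
    """
    matches = re.findall(r"\bETag:\s*(\S+)", wget_output, flags=re.IGNORECASE)
    return matches[-1] if matches else None


def snapshot_unchanged(wget_output, pinned_etag):
    """True only if an ETag was found and it equals the pinned one."""
    actual = extract_etag(wget_output)
    if actual is None:
        return False  # no ETag reported: treat as "cannot confirm"
    return normalize_etag(actual) == normalize_etag(pinned_etag)
```

Usage would be something like saving the wget output to a file and calling `snapshot_unchanged(text, PINNED_ETAGS["monthly/2018/04"])` before any analysis runs, failing loudly if it returns `False`.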