oifj34f34f closed this 7 months ago
It seems to be a query issue:
curl 'https://web.archive.org/cdx/search/cdx?url=https://kisslinux.org/&output=json&from=202402&to=202402&collapse=digest'
[
[
"urlkey",
"timestamp",
"original",
"mimetype",
"statuscode",
"digest",
"length"
],
[
"org,kisslinux)/",
"20240205160757",
"https://kisslinux.org/",
"warc/revisit",
"-",
"AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD",
"848"
]
]
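For anyone reading this output programmatically, here is a minimal Python sketch. The first row of the JSON array is the header and each later row is one capture, so the two can be zipped into dictionaries. The helper names `build_cdx_url` and `rows_to_dicts` are made up for illustration and are not part of any library, and the response is pasted in rather than fetched over the network.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, **params):
    """Build a CDX query URL (hypothetical helper, for illustration only)."""
    query = {"url": url, "output": "json", **params}
    return f"{CDX_ENDPOINT}?{urlencode(query)}"


def rows_to_dicts(payload):
    """Turn the header-plus-rows JSON array into a list of dicts."""
    if not payload:
        return []
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


# The response shown above, pasted in as a string.
sample = json.loads("""
[["urlkey","timestamp","original","mimetype","statuscode","digest","length"],
 ["org,kisslinux)/","20240205160757","https://kisslinux.org/","warc/revisit","-",
  "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD","848"]]
""")

captures = rows_to_dicts(sample)

# "from" is a Python keyword, so the parameters are passed as an unpacked dict.
query_url = build_cdx_url(
    "https://kisslinux.org/",
    **{"from": "202402", "to": "202402", "collapse": "digest"},
)

for capture in captures:
    print(capture["timestamp"], capture["mimetype"], capture["digest"])
```

Note that `urlencode` percent-encodes the target URL, so the generated query string looks slightly different from the hand-written curl command while carrying the same parameters.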
Hi @oifj34f34f, thanks for digging into that! This stems from the use of the collapse=digest
option, which asks the CDX API to return only one result for each run of adjacent captures sharing the same digest.
If you request without that option, you get three entries for that period:
[["urlkey","timestamp","original","mimetype","statuscode","digest","length"],
["org,kisslinux)/", "20240205160757", "https://kisslinux.org/", "warc/revisit", "-", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "848"],
["org,kisslinux)/", "20240205160757", "https://kisslinux.org/", "warc/revisit", "-", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "848"],
["org,kisslinux)/", "20240224161022", "https://kisslinux.org/", "text/html", "200", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "5200"]]
Note that they all have the same digest, AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD, meaning their underlying page contents are the same.
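To illustrate why only the February 5 entry survives, here is a rough Python sketch of the collapsing behaviour. It assumes that collapsing keeps the first row of each run of adjacent rows with an identical digest, which is consistent with the two responses in this thread. It is not the CDX server's actual implementation, and `collapse_adjacent` is a hypothetical name.

```python
from itertools import groupby

HEADER = ["urlkey", "timestamp", "original", "mimetype",
          "statuscode", "digest", "length"]

# The three uncollapsed rows from the response above.
ROWS = [
    ["org,kisslinux)/", "20240205160757", "https://kisslinux.org/",
     "warc/revisit", "-", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "848"],
    ["org,kisslinux)/", "20240205160757", "https://kisslinux.org/",
     "warc/revisit", "-", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "848"],
    ["org,kisslinux)/", "20240224161022", "https://kisslinux.org/",
     "text/html", "200", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "5200"],
]


def collapse_adjacent(rows, field_index):
    """Keep the first row of each run of adjacent rows sharing a field value.

    Assumption: this mirrors what collapse=<field> does; it is a sketch,
    not the server's code.
    """
    return [next(group)
            for _, group in groupby(rows, key=lambda row: row[field_index])]


collapsed = collapse_adjacent(ROWS, HEADER.index("digest"))

for row in collapsed:
    print(row[HEADER.index("timestamp")], row[HEADER.index("mimetype")])
```

Because all three rows share one digest, they form a single run and only the first row (the February 5 revisit record) is kept, so the February 24 capture drops out of the collapsed result.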
I agree that this is inconsistent with web.archive.org's UI, though this was an intentional choice to avoid presenting redundant snapshots on memory- and bandwidth-restricted clients.
Let me know what you think; adding an option to show these snapshots would be feasible, but I am not sure if it would be particularly useful.
@ticky Thanks for the reply, it makes sense!
February 24, 2024 is missing from Wayback Classic, but available here