ticky / wayback-classic

🕸 A frontend for the Wayback Machine which works on old browsers
http://wayback-classic.net
GNU Affero General Public License v3.0

Missing date #29

Closed: oifj34f34f closed this issue 4 months ago

oifj34f34f commented 5 months ago

The snapshot from February 24, 2024 is missing from Wayback Classic, but it is available here

oifj34f34f commented 4 months ago

It seems to be a query issue:

curl 'https://web.archive.org/cdx/search/cdx?url=https://kisslinux.org/&output=json&from=202402&to=202402&collapse=digest'
[
    [
        "urlkey",
        "timestamp",
        "original",
        "mimetype",
        "statuscode",
        "digest",
        "length"
    ],
    [
        "org,kisslinux)/",
        "20240205160757",
        "https://kisslinux.org/",
        "warc/revisit",
        "-",
        "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD",
        "848"
    ]
]

ticky commented 4 months ago

Hi @oifj34f34f, thanks for digging into that! This stems from the use of the collapse=digest option, which asks the CDX API to return only results that are unique by digest.

If you request without that option, you get three entries for that period:

[["urlkey","timestamp","original","mimetype","statuscode","digest","length"],
["org,kisslinux)/", "20240205160757", "https://kisslinux.org/", "warc/revisit", "-", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "848"],
["org,kisslinux)/", "20240205160757", "https://kisslinux.org/", "warc/revisit", "-", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "848"],
["org,kisslinux)/", "20240224161022", "https://kisslinux.org/", "text/html", "200", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "5200"]]

Note that they all have the same digest of AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD, meaning their underlying page contents are the same.
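To illustrate why only the first entry survives, here is a minimal Python sketch of collapsing by digest. It is an approximation, not the CDX server's actual implementation: it assumes the collapse keeps the first row of each run of adjacent rows that share a digest. The names ROWS, DIGEST, and collapse_by_digest are hypothetical.

```python
from itertools import groupby

# Rows copied from the uncollapsed response above (header row omitted).
ROWS = [
    ["org,kisslinux)/", "20240205160757", "https://kisslinux.org/", "warc/revisit", "-", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "848"],
    ["org,kisslinux)/", "20240205160757", "https://kisslinux.org/", "warc/revisit", "-", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "848"],
    ["org,kisslinux)/", "20240224161022", "https://kisslinux.org/", "text/html", "200", "AQ23MLDHMGYAJZ2VEVXD2JIY2L6SHNUD", "5200"],
]

DIGEST = 5  # index of the "digest" column in each row


def collapse_by_digest(rows):
    """Keep the first row of each run of adjacent rows sharing a digest.

    Assumption: this mirrors collapse=digest only approximately.
    """
    return [next(group) for _, group in groupby(rows, key=lambda row: row[DIGEST])]


collapsed = collapse_by_digest(ROWS)
print([row[1] for row in collapsed])  # prints ['20240205160757']
```

Because all three rows carry the same digest, the February 24 capture is folded into the February 5 one and never reaches the date list.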

I agree that this is inconsistent with web.archive.org's UI, though this was an intentional choice to avoid presenting redundant snapshots on memory- and bandwidth-restricted clients.

Let me know what you think. Adding an option to show these snapshots would be feasible, but I am not sure whether it would be particularly useful.
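For reference, such an option could amount to little more than leaving the collapse parameter out of the query. The sketch below is in Python purely for illustration and is not Wayback Classic's actual code. build_cdx_url and its collapse_digest flag are hypothetical names; the endpoint and query parameters are the ones from the curl command earlier in this thread.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl command earlier in the thread.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, month, collapse_digest=True):
    """Build a CDX query URL for one month (YYYYMM).

    collapse_digest is a hypothetical toggle: when False, the
    collapse=digest parameter is omitted so every capture is listed.
    """
    params = {"url": url, "output": "json", "from": month, "to": month}
    if collapse_digest:
        params["collapse"] = "digest"
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


default_url = build_cdx_url("https://kisslinux.org/", "202402")
full_url = build_cdx_url("https://kisslinux.org/", "202402", collapse_digest=False)
print(default_url)
print(full_url)
```

Keeping the collapsed query as the default would preserve the current behaviour for memory- and bandwidth-restricted clients, with the full listing available only on request.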

oifj34f34f commented 4 months ago

@ticky Thanks for the reply, it makes sense!