webrecorder / public-web-archives

A listing of worldwide web archives, for humans and machines, using the Web Archive Manifest (WAM) YAML format
Creative Commons Zero v1.0 Universal

Public Web Archives

This repository is an experiment in creating a distributed listing of web archives.

To accomplish this, a new format, the Web Archive Manifest (WAM), is introduced to describe web archives and the properties and APIs they support. The format is designed to be readable by humans and processable by new and existing software tools.

The goal is to highlight and help promote the sizable (and growing!) list of publicly accessible web archives all over the world, in a distributed and democratic way.

Many people are familiar with the "Wayback Machine", but there are actually many wayback machines all over the world. Let's make them more widely known and accessible!

How are the web archives listed? How can new archives be included?

There is a YAML file following the WAM spec for each web archive in the webarchives directory.

YAML was chosen because it strikes a good balance between human readability and ease of processing by a wide variety of tools in many languages.

The intent is for the format to be a 'living standard' that may adapt as needed as web archives evolve.

There is also an index, which specifies that all files in the directory are to be included.

To add a new web archive, simply add a new .yaml file in this directory.
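As a rough illustration of how a tool might consume the directory, the sketch below collects every .yaml file in it. The directory name `webarchives` comes from the description above; the function name `list_manifests` is hypothetical. Parsing the WAM content itself is deliberately left out, since that depends on the WAM spec and on a YAML library.

```python
from pathlib import Path


def list_manifests(directory):
    """Return the sorted paths of all .yaml files directly inside `directory`.

    Hypothetical helper: it only discovers the files, it does not parse them.
    """
    root = Path(directory)
    # Each web archive is described by one .yaml file in the directory.
    return sorted(p for p in root.glob("*.yaml") if p.is_file())


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for path in list_manifests("webarchives"):
        print(path.name)
```

Under this approach, adding a new archive needs no code change: a new .yaml file dropped into the directory is picked up on the next run.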

What archives are included in the list?

This listing is specifically for web archives which preserve and provide web content and make it publicly accessible.

While there are many great archives out there, this format and directory are specifically limited to web archives.

What other properties must archives have to be included?

Any web archive can be included in the listing, even if it does not support any of the established APIs.

For a list of currently supported APIs, see the WAM Spec.

This directory should also not be seen as an exhaustive list of all web archive APIs, as many archives may support custom or archive-specific APIs.

If there is another API spec that should be included in this shared listing, feel free to submit a request and/or suggest how it might be included!

Why make a new web archive directory?

The intent of this directory is to be:

This directory and WAM format are intended to encourage interoperability and interconnectedness between different web archives.

Aren't there other archives lists out there already?

Yes! It is important to recognize that there are a few existing lists out there, mostly originating from the Memento project.

If there are other such lists, feel free to let us know or submit a pull request to include them here.

Who can contribute? What if I'd like to add/remove a web archive?

Anyone can contribute! We definitely encourage contributions to this repo to make it a truly distributed project:

Any plans for extending this format? How could it be made more distributed?

Yes! Currently, all the web archives are specified explicitly in this repository.

However, it would be great if web archives started to 'advertise' which APIs they support, along with the other information included in the WAM file.

For example, an archive could provide http://myarchive.example.com/wam.yaml. The file would then not need to be stored in this repository, and we would only need to add this URL to the index.
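One way a tool might handle such a mixed index is sketched below. The index format is not described here, so this assumes the index can be read as a plain list of strings, each either a path to a file in this repository or an http(s) URL of a self-hosted manifest. The function names `is_remote` and `split_index` and the path `webarchives/example.yaml` are hypothetical.

```python
from urllib.parse import urlparse


def is_remote(entry):
    """Return True if an index entry looks like an http(s) URL rather than a local path."""
    parsed = urlparse(entry)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_index(entries):
    """Split index entries into (local_paths, remote_urls), preserving their order."""
    local, remote = [], []
    for entry in entries:
        (remote if is_remote(entry) else local).append(entry)
    return local, remote


if __name__ == "__main__":
    # Hypothetical index contents: one file kept in this repository,
    # and one manifest hosted by the archive itself.
    entries = [
        "webarchives/example.yaml",
        "http://myarchive.example.com/wam.yaml",
    ]
    local, remote = split_index(entries)
    print("local:", local)
    print("remote:", remote)
```

A consumer could then read the local entries from disk and fetch the remote ones over HTTP, so an archive that hosts its own manifest stays the single source of truth for its description.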

If adding support for WAM to a web archive, please let us know or submit a PR to include this information.

What tools use this listing?

None yet!

But we hope that this will change, and we would be happy to list any tools that make use of this format or listing, directly or indirectly.

A future release of pywb will likely add support for reading WAM format files.

Webrecorder may also use this directory to provide users the ability to work with existing web archives.

Who created this listing?

This web archive listing and the WAM format originate with the Webrecorder project, which aims to promote distributed web archiving by encouraging anyone to create and run their own web archives. Having a formal Web Archive Manifest, as well as a public, distributed web archive directory, aligns perfectly with this mission.

License

CC0

This document, the WAM format and the accompanying web archive directory are released into the public domain under CC0.