Please note: this project is no longer being actively developed.
Please use the new Webrecorder Player app, available for download here. The Webrecorder Player will receive regular feature updates in sync with https://webrecorder.io/
WebArchivePlayer is a new desktop tool which provides a simple point-and-click wrapper for viewing any web archive file (in WARC and ARC format).
To create a web archive (WARC) file of your own, you can use the free https://webrecorder.io/ service to browse any page and then download the recorded WARC file.
The player allows users to pick one or more ARC/WARC from their local machine and browse the contents from any browser. No internet connection is necessary in order to browse the archive.
Double click to open. (For OS X, open the .dmg file to mount the volume and extract the player). You may have to agree to allow open files from the internet, and to allow making internet connections (windows only). This still new software and other distribution methods may be added in the future.
A file dialog will show up. Browse to an existing WARC or ARC file(s).
You can use https://webrecorder.io to record pages as you browse and then download the WARC file.
A browser will open to http://localhost:8090/ listing all the pages in the archive.
Click on any page listed to view the replay. Or, enter a url to search the full archive.
To exit, simply close the WebArchivePlayer window.
(Replaying screenshot from Wikipedia SOPA Blackout. You can download the WARC from GitHub.)
Currently, executable versions are available only for OS X and Windows.
However, the player should work on any system that has Python 2.7.x, but requires a little bit more setup.
On other systems (or to build from source):
Clone this repo: git clone https://github.com/ikreymer/webarchiveplayer.git; cd webarchiveplayer
Install by running python setup.py install
(optionally using a virtualenv)
Run webarchiveplayer [/path/to/warc_or_arc]
If a W/ARC file argument is omitted, the player will attempt to start in GUI mode and show a File Open dialog.
However, in order to run in GUI mode, the wxPython toolkit will also need to be installed seperately.
Refer to instructions at wxPython Download page for your platform.
wxPython does not by default work in virtualenv. The simplest way to make it work is to symlink the system wxredirect.pth
to the virtualenv site-packages directory. For example, on OS X, if you've installed `virtualenv [myenv]
ln -s /Library/Python/2.7/site-packages/wxredirect.pth [myenv]/lib/python2.7/site-packages/wxredirect.pth
If a W/ARC file argument is passed to the player, eg:
webarchiveplayer /path/to/warcfile.warc.gz
The player will select that file and skip the File Open dialog. Installation of wxPython is not required when specifiyng the WARC explicitly via command line.
The OS X and Windows applications also support specifying the file via command line.
In addition to opening files, WebArchivePlayer can now also be used to provide a point-and-click launcher for any pywb archive.
If a config.yaml
file is present in the working directory (same directory as WebArchivePlayer), the specified configuration will be loaded
instead of a file prompt.
This can be used to distribute specific archives together with WebArchivePlayer.
Certain aspects of the player can also be modified in the config.yaml
, including changing the contents
from 'Web Archive Player' to any custom title and HTML page.
webarchiveplayer:
# initial page to load on start-up
# eg: http://localhost:8090/my_coll/http://example.com/
start_url: my_coll/http://example.com/
# set initial width of player window
width: 400
# set initial height of player window
height: 250
# set window title
title: My Archive
# Load custom contents from local HTML
desc_html: ./desc.html
# Auto-load WARCs from specified directory (supported from 1.4.6)
auto_load_dir: ./warcs/
For example, one could distribute a WARC together with the player and provide a custom setup. This includes automatically indexing WARCs on load to allow quick drop in, or configuring a multi-collection archive.
With version 1.4.6, webarchiveplayer supports indexing WARCs automatically from a designated directory. Archive files are indexed on each load to allow for dropping or updating the files more easily.
To setup, all that's needed is a config.yaml
with the following:
webarchiveplayer:
auto_load_dir: ./warcs
title: 'My Archive'
desc_html: ./desc_page.html
If WebArchivePlayer is placed in the same directory as the config.yaml
and warcs
directory,
the player will automatically load and index all WARC/ARC files found in this directory.
Optionally, the config.yaml
and warcs
may also be placed in an archive
sub-directory.
This allows for an archive to be more easily transported (eg. as a tar-ball or zip file).
The last two params allow for customizing the WebArchivePlayer window.
The title
param specifies the window title, while the desc_html
param specifies
the contents of the WebArchivePlayer window.
The following steps describe creating static archive with preset collections and indexed archive files:
1) Create new directory my_archive
and switch to it.
2) Copy the WebArchivePlayer application to my_archive
2) In my_archive
, run wb-manager init my_coll
3) Run wb-manager add my_coll <path/to/warc>
4) Add config.yaml
in my_archive
, perhaps with
webarchiveplayer:
start_url: my_coll/http://example.com/
title: My Archive Demo
5) Now, when WebArchivePlayer is started in my_archive
, it will use the WARC in my_coll
and load http://localhost:8090/my_coll/http://example.com/
as the starting URL.
6) The my_archive
dir can be distributed as a standlone archive and player.
The binaries can be built by running the builds scripts from the app
directory:
Note: wxPython must be installed for this to work. If running in virtualenv, follow instructions above. The install script will not run if it can't find wxPython
OS X: (output written to osx/webarchiveplayer.dmg
)
cd app
./build-osx.sh
Windows: (output copied to windows\webarchiveplayer.exe
)
cd app
build-windows.bat
Ensure config file as desc HTML are read as utf-8
Update to pywb 0.33.1
Support for auto_load_dir
option in config.yaml
(or archive/config.yaml
) which specifies a directory
from which to automatically load WARCs on startup.
Update to pywb 0.32.1
Support Webrecorder collection WARCs, read pages/bookmarks from all warcinfo
records
Update to pywb 0.30.1 Support reading of WARC files with non-HTTP response records (which are skipped).
Build using Python 3 and pywb 0.30.0, using latest pyinstaller
page detect: re-enable reading pagelist from json-metadata
if present in WARC
Support multiple instances by picking a random port if 8090 is not available Ensure HTML 'resource' records are included in page list Display error dialog before quitting if unable to read and index WARC/ARCs. Switch to pywb 0.11.1, many improvements in indexing and replay
Custom preset archive support with custom config.yaml
Use HTML for main window rendering
Switch to pywb 0.10.9.1 for more rewriting improvements
Update to pywb 0.10.8, rewriting improvements, add pywb version display
Update to pywb 0.10.6, significant replay improvements
Fix issue where page listing only lists pages for one WARC/ARC when multiple are selected. Build scripts check for wxPython installation.
Update to use latest pywb release (0.8.3)
Support opening multiple WARC/ARC files at once. Also fix issue with opening files with spaces in filename.
Initial release.
WebArchivePlayer is a simple wrapper over the pywb web archiving tools using pyinstaller to create a standalone, GUI wrapper. The wxPython toolkit is used to provide the GUI. The wrapper starts a local server which serves content from the selected web archive, using pywb to handle the rest.
Consult the pywb documentation for more info on web archive replay.
Please feel free to open an issue on this page for any problems / questions / concerns regarding this tool. This is a brand new software, so feedback is encouraged.
Another project, which in part inspired WebArchivePlayer, is Mat Kelly's excellent WAIL project, which provides a GUI for different web crawling and replay systems.