webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

File formats #283

Closed (Minyar2004 closed this issue 6 years ago)

Minyar2004 commented 6 years ago

Hi,

I would like to use your module pywb.

Is it able to read formats other than ARC and WARC?

Thank you in advance

ikreymer commented 6 years ago

pywb is designed to work with web archive files in the ARC and WARC formats. If you want to use HAR (HTTP Archive), we have https://github.com/webrecorder/har2warc, which can be used separately to create WARCs from HAR files.

I'm curious: what format are you interested in using?

Minyar2004 commented 6 years ago

Thanks for your reply!

I would like to know whether pywb can be used so that it does not read the archive file directly, but instead talks to a web service that communicates with the archive (independently of the archive format). It would work like the pywb-ia module, which communicates with the Internet Archive's web archives.

Thanks!

ikreymer commented 6 years ago

Yes, this is mostly possible using a remote Memento index source. You can add the following to your config:

collections:
    web: memento+https://web.archive.org/web/

That will point /web to the remote collection.
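As a rough illustration of what requests to that collection look like, here is a minimal Python sketch that builds a replay URL of the form `<base>/<collection>/<14-digit timestamp>/<url>`. The helper name `replay_url` is hypothetical, and the base address assumes pywb is running locally on port 8080; adjust both to your setup.

```python
from datetime import datetime


def replay_url(base, collection, when, target_url):
    """Build a replay URL: <base>/<collection>/<YYYYMMDDhhmmss>/<target_url>.

    `replay_url` is an illustrative helper, not part of pywb itself.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit timestamp
    return f"{base.rstrip('/')}/{collection}/{timestamp}/{target_url}"


# Assumption: pywb is served at http://localhost:8080 with the "web" collection above.
example = replay_url(
    "http://localhost:8080",
    "web",
    datetime(2018, 1, 15, 12, 0, 0),
    "http://example.com/",
)
print(example)
```

Opening the resulting URL in a browser should ask pywb to replay the capture closest to that timestamp from the remote archive.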

See: http://pywb.readthedocs.io/en/latest/manual/configuring.html#remote-memento-collection for more info and other options.

This can work with any other archive that supports either the Memento or the CDX Server API.
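To check whether another archive exposes a usable CDX Server API, you could build a query URL by hand and inspect the response. The sketch below only constructs the URL; the helper name `cdx_query_url` is hypothetical, and the Internet Archive endpoint and the `output` and `limit` parameters are used here as an example, so other archives may use different paths or options.

```python
from urllib.parse import urlencode


def cdx_query_url(cdx_endpoint, target_url, **params):
    """Build a CDX Server query URL for `target_url`.

    `cdx_query_url` is an illustrative helper; extra keyword arguments
    are passed through as query parameters.
    """
    query = {"url": target_url, **params}
    return f"{cdx_endpoint}?{urlencode(query)}"


# Assumption: the archive serves its CDX API at this endpoint.
query = cdx_query_url(
    "https://web.archive.org/cdx/search/cdx",
    "example.com",
    output="json",
    limit=5,
)
print(query)
```

Fetching that URL (for example with a browser or curl) should return index records if the archive supports the API.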

Minyar2004 commented 6 years ago

Yes

That's exactly what I did

Thanks