Open machawk1 opened 4 years ago
Yes, since WASAPI is a data transfer API, while WACZ is designed to be a storage specification, there isn't any overlap, but they could definitely complement one another!
I think a main limitation of WASAPI is that it allows you to download a bunch of WARCs in bulk, but then what do you do with them? A tool could use WASAPI to download WARCs in bulk and then assemble them into a WACZ file, a stable format that would be instantly usable in replayweb.page or could be added to other storage. I believe WASAPI is also missing support for any metadata, such as page/seed lists, which would probably also need to be added.
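To make the idea concrete, here is a rough sketch of such a tool in Python. It assumes a WASAPI-style JSON response whose `files` entries carry `filename` and `locations`, and it packs the downloaded WARCs under `archive/` with a `datapackage.json` listing them. The `build_wacz` function and the `fetch` callable are hypothetical names for illustration. The sketch skips pagination, the index, the page list, and the other metadata a complete WACZ would need, so treat it as an outline and not a conforming implementation.

```python
import hashlib
import io
import json
import zipfile


def build_wacz(wasapi_response, fetch, out):
    """Pack WARCs listed in a WASAPI-style response into a WACZ-like ZIP.

    wasapi_response: parsed JSON with a "files" list (assumed shape).
    fetch: hypothetical callable(url) -> bytes, e.g. wrapping an HTTP client.
    out: output path or writable binary file object.
    """
    resources = []
    # Store WARCs uncompressed; they are usually gzip-compressed already.
    with zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as zf:
        for entry in wasapi_response["files"]:
            data = fetch(entry["locations"][0])  # first listed download URL
            path = "archive/" + entry["filename"]
            zf.writestr(path, data)
            resources.append({
                "name": entry["filename"],
                "path": path,
                "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
                "bytes": len(data),
            })
        # Minimal manifest only; indexes and pages are omitted in this sketch.
        manifest = {"profile": "data-package", "resources": resources}
        zf.writestr("datapackage.json", json.dumps(manifest, indent=2))
    return resources
```

In practice `fetch` would wrap an authenticated HTTP download, and the caller would follow the response's `next` link to collect every page of results before packaging.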
Some of the collection-based retrieval aspects of this specification are particularly interesting, like the ability to specify particular pageIDs of interest.
As you are very well aware, @ikreymer, WASAPI is an abstracted spec for WARC retrieval with a few specifications. I can imagine a WACZ layer to make WASAPI implementations a bit more usable from both a macro and collection-based querying standpoint, as it seems to provide some standard semantics.
Since you have solicited thoughts in this repo, I wondered whether you have considered interfacing with WASAPI and/or providing endpoints or routes that align with WACZ.
I am looking forward to further discussion.