webrecorder / specs

Specifications developed and maintained by the Webrecorder community.
https://specs.webrecorder.net
Should other directories besides archive/, indexes/ be part of the spec? #1

Closed by ikreymer 3 years ago

ikreymer commented 4 years ago

Or should the spec be kept small, and allow for extension as needed?

The initial focus is replay, but perhaps other use cases have other requirements that are shared and could use a standardized approach?

ato commented 4 years ago

Crawl artifacts are one such thing that I think would be very useful to standardize.

Obvious reasons for splitting the spec up:

I don't think any of those need to be worried about just yet. So my suggestion would be to keep it simple and start with one spec. You can always split it up later when things start becoming unwieldy and it'll be more obvious at that time how it should be split up.

ikreymer commented 4 years ago

Yeah, I've been leaning towards optimizing specifically for the functional use case of data that is needed for replay, or perhaps a text index, so that anything added to the spec has an implementation that uses that data. I think that will help keep the spec lean, and I agree that it should be kept in sync with the things that are using it.

E.g., if there is a tool that analyzes crawl artifacts in a specific way using WACZ, then it would make sense to add them, but not until then.

ikreymer commented 3 years ago

For now, we have archive/, indexes/ and pages/, all used for replay. Adopting the Frictionless Data Package spec also adds a datapackage.json.
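As a rough illustration of that layout, here is a minimal Python sketch that checks a WACZ file (a ZIP) for the top-level entries named in this thread. The directory name `archive/` follows the issue title. The function names and the placeholder file names inside the example ZIP are hypothetical, and the expected set is only what this thread lists, not the authoritative spec.

```python
import io
import zipfile
from typing import List

# Top-level entries mentioned in this thread (an assumption, not the full spec).
EXPECTED_DIRS = ("archive/", "indexes/", "pages/")
EXPECTED_FILES = ("datapackage.json",)


def missing_entries(wacz: zipfile.ZipFile) -> List[str]:
    """Return the expected top-level entries that are absent from the ZIP."""
    names = wacz.namelist()
    # A directory counts as present if any member path starts with its prefix.
    missing = [d for d in EXPECTED_DIRS if not any(n.startswith(d) for n in names)]
    missing += [f for f in EXPECTED_FILES if f not in names]
    return missing


def build_example() -> zipfile.ZipFile:
    """Build a tiny in-memory ZIP with placeholder members (hypothetical names)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("archive/data.warc.gz", b"")
        zf.writestr("indexes/index.cdx", b"")
        zf.writestr("pages/pages.jsonl", b"")
        zf.writestr("datapackage.json", "{}")
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    print(missing_entries(build_example()))
```

A tool that only needs replay data could run a check like this before opening the file, and ignore any extra directories it does not recognize.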

Closing this for now, will open additional issues with any other specific considerations.