Open gwiedeman opened 2 years ago
The Info tab is more for a fixed amount of metadata for an entire archive (e.g. data from a `warcinfo` record that's usually found at the beginning, although there's no guarantee that there is one, and there can be multiple). With WACZ, hopefully we'll have more defined file-level metadata that can be easily accessible in this way.
But `metadata` records are similar to `response` records in the sense that they have a URL and there can be a whole lot of them (some crawlers write a metadata record for every response record). I think it could make sense to make a category on the URL Search tab, which filters and lists metadata records. That would require indexing them, which currently isn't done, but is definitely doable. It would be a category, similar to HTML or 'Audio/Video' for example.
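To make the indexing idea concrete, here is a rough Python sketch of what "indexing metadata records" could mean. It is not how replayweb.page works internally. It assumes an uncompressed WARC held in memory, and the function names and the `SAMPLE` data are made up for illustration.

```python
from typing import Dict, Iterator, List, Tuple


def iter_warc_records(data: bytes) -> Iterator[Tuple[int, Dict[str, str], bytes]]:
    """Yield (offset, headers, payload) for each record in uncompressed WARC bytes."""
    pos = 0
    while pos < len(data):
        # Skip the CRLF padding that separates records.
        while data[pos:pos + 2] == b"\r\n":
            pos += 2
        if pos >= len(data):
            break
        start = pos
        header_end = data.index(b"\r\n\r\n", pos)
        lines = data[pos:header_end].decode("utf-8").split("\r\n")
        headers: Dict[str, str] = {}
        for line in lines[1:]:  # lines[0] is the version line, e.g. "WARC/1.0"
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers["content-length"])
        body_start = header_end + 4
        yield start, headers, data[body_start:body_start + length]
        pos = body_start + length


def index_metadata_records(data: bytes) -> List[dict]:
    """Collect one index entry per `metadata` record, keyed by its target URL."""
    index = []
    for offset, headers, payload in iter_warc_records(data):
        if headers.get("warc-type") == "metadata":
            index.append({
                "url": headers.get("warc-target-uri", ""),
                "mime": headers.get("content-type", ""),
                "offset": offset,
                "length": len(payload),
            })
    return index


def _record(rec_type: str, uri: str, content_type: str, payload: bytes) -> bytes:
    # Simplified test record: a real WARC record also needs WARC-Record-ID and WARC-Date.
    head = (f"WARC/1.0\r\nWARC-Type: {rec_type}\r\nWARC-Target-URI: {uri}\r\n"
            f"Content-Type: {content_type}\r\nContent-Length: {len(payload)}\r\n\r\n")
    return head.encode("utf-8") + payload + b"\r\n\r\n"


# Hypothetical two-record archive: one response, one metadata record for the same URL.
SAMPLE = (
    _record("response", "http://example.com/", "application/http; msgtype=response",
            b"HTTP/1.1 200 OK\r\n\r\nhi")
    + _record("metadata", "http://example.com/", "application/json",
              b'{"note": "hypothetical"}')
)

if __name__ == "__main__":
    for entry in index_metadata_records(SAMPLE):
        print(entry)
```

Each entry carries a URL and an offset, so in principle the records could be listed and filtered the same way response records are, with "metadata" as one more category.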
Sounds reasonable. Being able to facet using the search dropdown would fulfill our use case. Just getting `metadata` records to display like `response` records would help a lot, as they are currently not accessible from what I can tell.
WACZ is also interesting, as I suppose this info could go in `datapackage.json`. Definitely something we'll consider long term.
WARC files can have metadata records. It seems relatively common for these metadata records to be arbitrary JSON key-value pairs.
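As a rough illustration of what such a record looks like on disk, the Python sketch below serializes a `metadata` record with a JSON payload using only the standard library. The function name, target URI, and field names are hypothetical and not taken from any real archive or tool.

```python
import json
import uuid
from datetime import datetime, timezone


def build_metadata_record(target_uri: str, fields: dict) -> bytes:
    """Serialize one WARC `metadata` record whose payload is a JSON object."""
    payload = json.dumps(fields, indent=2).encode("utf-8")
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: metadata",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/json",
        f"Content-Length: {len(payload)}",
    ]
    # Headers end with a blank line; the record ends with two CRLFs.
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + payload + b"\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical key-value pairs, purely for illustration.
    record = build_metadata_record("urn:example:item-1", {"title": "Example", "source": "demo"})
    print(record.decode("utf-8"))
```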
As a consumer of WARC files, I would like to view metadata included within a WARC file. I would expect this information to be displayed in replayweb.page's "Info" tab.
Our current use case is to preserve email messages as WARC files for improved encoding support and the possible inclusion of externally hosted resources. We are writing email headers as WARC metadata records. While I wouldn't think this would be a canonical use case for replayweb.page, it may still serve as a helpful example. It seems common enough for key-value metadata to be stored this way, and adding replayweb.page support would go a long way toward making metadata records more transparent and useful.
Here is a sample metadata record: