webrecorder / warcio.js

JS Streaming WARC IO optimized for Browser and Node
MIT License

Multiple WARC-Concurrent-To fields #32

Open mattfysh opened 2 years ago

mattfysh commented 2 years ago

Hey Ilya - in the spec a record can have multiple WARC-Concurrent-To fields, e.g.

WARC-Record-ID: <urn:uuid:276ff7fe-efd8-4dfa-972e-606fee81feb7>
WARC-Concurrent-To: <urn:uuid:f22d1d8b-fcf6-4836-9959-7e91c8a2380d>
WARC-Concurrent-To: <urn:uuid:57b684e2-e813-437f-a99f-bf8c31cdb258>

As an exception to the general rule, several WARC-Concurrent-To fields may be repeated within the same WARC record.

https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#warc-concurrent-to

However, the parser only gives access to one of them. Would it be possible for record.warcHeader('WARC-Concurrent-To') to return an array of values?

mattfysh commented 2 years ago

As a workaround I'm using:

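class MultiValueMap extends Map {
  set(key, value) {
    let finalValue = value
    // Collect repeated WARC-Concurrent-To values into an array
    if (key === 'WARC-Concurrent-To') {
      const prev = this.get(key) || []
      finalValue = [...prev, value]
    }
    // Map.prototype.set returns the map itself, so keep that contract
    return super.set(key, finalValue)
  }
}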

const parser = new WARCParser(...)
parser._headersClass = MultiValueMap
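
For anyone wanting to check the behaviour without a WARC file, here is a minimal standalone sketch of the same idea. It does not use warcio.js at all: `RepeatableHeaderMap`, `REPEATABLE` and `collectHeaders` are hypothetical names made up for illustration, and the header lines are the ones from the example at the top of this issue.

```javascript
// Hypothetical standalone sketch; none of these names come from warcio.js.
// Header names that are allowed to repeat within one record (exact match).
const REPEATABLE = new Set(['WARC-Concurrent-To']);

class RepeatableHeaderMap extends Map {
  set(key, value) {
    if (REPEATABLE.has(key)) {
      // Append to any values already stored under this header name.
      const prev = this.get(key) || [];
      return super.set(key, [...prev, value]);
    }
    // Every other header keeps normal Map behaviour: last value wins.
    return super.set(key, value);
  }
}

// Parse "Name: value" lines into the map, splitting on the first colon only
// so that values containing colons (such as urn:uuid:...) stay intact.
function collectHeaders(lines) {
  const headers = new RepeatableHeaderMap();
  for (const line of lines) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip lines without a colon
    headers.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  return headers;
}

const headers = collectHeaders([
  'WARC-Record-ID: <urn:uuid:276ff7fe-efd8-4dfa-972e-606fee81feb7>',
  'WARC-Concurrent-To: <urn:uuid:f22d1d8b-fcf6-4836-9959-7e91c8a2380d>',
  'WARC-Concurrent-To: <urn:uuid:57b684e2-e813-437f-a99f-bf8c31cdb258>',
]);

// Repeatable headers come back as an array, everything else as a string.
console.log(headers.get('WARC-Concurrent-To'));
console.log(headers.get('WARC-Record-ID'));
```

One caveat with this approach: a repeatable header is always an array, even when it appears only once, so callers need to handle both shapes depending on the header name.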