internetarchive / warcprox

WARC writing MITM HTTP/S proxy

change trough dedup `date` type to varchar #144

Closed nlevitt closed 4 years ago

nlevitt commented 4 years ago

This is a backwards-compatible change whose purpose is to clarify the existing usage.

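In sqlite (and therefore trough), the declared datatype of a column is just a suggestion (a "type affinity"). In fact the values can have any type. See https://sqlite.org/datatype3.html. `datetime` isn't even a real sqlite type: a column declared that way gets NUMERIC affinity, and a string that doesn't look like a number is stored as text anyway.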
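To illustrate the point, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names are made up for the demonstration and are not the actual trough schema. It stores the same timestamp string in a column declared `datetime` and in one declared `varchar(100)`, then asks sqlite what it actually stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables for illustration only; they differ solely in the
# declared type of the `date` column.
conn.execute(
    "CREATE TABLE dedup_datetime (digest_key varchar(100) PRIMARY KEY, date datetime)"
)
conn.execute(
    "CREATE TABLE dedup_varchar (digest_key varchar(100) PRIMARY KEY, date varchar(100))"
)

value = "2019-11-19T01:23:45Z"
for table in ("dedup_datetime", "dedup_varchar"):
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", ("sha1:example", value))


def stored(table):
    """Return (value, storage class) for the single row in `table`."""
    return conn.execute(f"SELECT date, typeof(date) FROM {table}").fetchone()


for table in ("dedup_datetime", "dedup_varchar"):
    print(table, stored(table))
```

Both rows should come back as the unchanged string with storage class `text`, which is why changing the declared type does not affect existing data.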

Warcprox stores a string formatted like '2019-11-19T01:23:45Z' in that field. When it pulls it out of the database and writes a revisit record, it sticks the raw value in the WARC-Date header of that record. Warcprox never parses the string value.
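Roughly, the flow looks like the sketch below. This is not warcprox's actual code: the `dedup` table, the `digest_key` column, and the `revisit_date_header` helper are hypothetical names chosen for illustration. The point is only that the value read from the database is written into the header untouched, with no date parsing in between.

```python
import sqlite3


def revisit_date_header(conn, digest_key):
    """Build a WARC-Date header line from the stored string, without parsing it.

    Hypothetical helper: the table and column names are assumptions.
    """
    row = conn.execute(
        "SELECT date FROM dedup WHERE digest_key = ?", (digest_key,)
    ).fetchone()
    if row is None:
        return None
    # The raw text from the database goes straight into the header.
    return f"WARC-Date: {row[0]}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dedup (digest_key varchar(100) PRIMARY KEY, date varchar(100))")
conn.execute(
    "INSERT INTO dedup VALUES (?, ?)", ("sha1:example", "2019-11-19T01:23:45Z")
)

header = revisit_date_header(conn, "sha1:example")
missing = revisit_date_header(conn, "sha1:unknown")
print(header)
```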

Since we use the raw textual value of the field, it makes sense to use a textual datatype to store it.