webrecorder / warcio.js

JS Streaming WARC IO optimized for Browser and Node
MIT License

Make POST/non-GET URL canonicalization consistent with pywb #58

Open tw4l opened 1 year ago

tw4l commented 1 year ago

Related to https://github.com/webrecorder/specs/issues/141

warcio.js and pywb handle keys slightly differently when converting a JSON request body into a query string for POST/non-GET URL canonicalization.

For example, for the input {"a": [[], {}, true, false, null, "", " ", 1, 1.0, -0.0]}, we get the following results in warcio.js vs pywb:

warcio's jsonToQueryString

2=true&3=false&5=&6=+&7=1&8=1&9=0

pywb

a=True&a.2_=False&a.3_=None&a.4_=&a.5_=+&a.6_=1&a.7_=1.0&a.8_=-0.0

There are two key differences:

  1. warcio.js loses the a key because of our current implementation of the replacer function passed to JSON.stringify. The replacer only receives the immediate key of each value (here, the array index), so the implementation needs to be changed to consider full key paths.
  2. pywb returns Pythonic values (True, False, None, 1.0, -0.0), which will be addressed in https://github.com/webrecorder/pywb/issues/859
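
To illustrate the first point, here is a minimal sketch of why a JSON.stringify replacer alone loses the parent key, and how a manual walk can keep the full key path. This is not the warcio.js source. The helper names keysSeenByReplacer and leafPaths are hypothetical, and the dotted path format is an assumption for illustration only. It does not reproduce the exact key naming or value filtering of either warcio.js or pywb.

```javascript
// Hypothetical illustration, not taken from warcio.js or pywb.
const input = { a: [[], {}, true, false, null, "", " ", 1, 1.0, -0.0] };

// Record the key that a JSON.stringify replacer receives for each
// primitive (non-object) value. The replacer is only given the immediate
// property name or array index, never the path from the root.
function keysSeenByReplacer(obj) {
  const seen = [];
  JSON.stringify(obj, (key, value) => {
    if (value === null || typeof value !== "object") {
      seen.push(key);
    }
    return value;
  });
  return seen;
}

// Walk the structure manually instead, threading the path through the
// recursion so that every leaf keeps its full key path.
// Joining segments with "." is an assumption made for this sketch.
function leafPaths(value, path = "", out = []) {
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      leafPaths(child, path ? `${path}.${key}` : key, out);
    }
  } else {
    out.push(path);
  }
  return out;
}

const replacerKeys = keysSeenByReplacer(input);
const fullPaths = leafPaths(input);

console.log(replacerKeys); // array indices only; the "a" prefix is gone
console.log(fullPaths);    // every entry starts with "a."
```

Whichever approach is chosen, the point is that the path has to be carried explicitly, since the key argument of the replacer does not include it.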