webrecorder / warcio.js

JS Streaming WARC IO optimized for Browser and Node
MIT License

URL Parameter Key with Empty Value Not Escaped Due to Bug in getSurt #70

Open ARiedijk opened 3 weeks ago

ARiedijk commented 3 weeks ago

Given: const testUrl = "https://test.org/s?test^2=";
When: const surt = getSurt(testUrl);
Then: surt should be:

org,test)/s?test%5E2

But the current result is: org,test)/s?test%5E2=

The trailing '=' should be removed. I believe the following part of the code is meant to remove the '=' when a URL parameter key has an empty value:

        let rx = new RegExp(`(?<=[&?])${rxEscape(key)}=(?=&|$)`);
        if (!rx.exec(urlLower)) {              
          surt = surt.replace(rx, key);
        }

To make it work, it needs to be changed to:

        let rx = new RegExp(`(?<=[&?])${rxEscape(encodeURIComponent(key))}=(?=&|$)`);
        if (rx.exec(surt)) {              
          surt = surt.replace(rx, encodeURIComponent(key));
        }


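  1. Make sure the key is encoded before it goes into the regex: rxEscape(encodeURIComponent(key)). The surt contains "test%5E2", so a regex built from the raw key "test^2" never matches it.
  2. Test the regex against surt, not against urlLower.
  3. Change if (!rx.exec(urlLower)) to if (rx.exec(surt)), i.e. remove the "!".
  4. The replacement must use the encoded key as well, so that the encoded key followed by '=' is replaced by encodeURIComponent(key).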
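
The four changes above can be tried in isolation with a minimal sketch like the one below. It is not the warcio.js source: queryWithoutEmptyEquals is a hypothetical helper that only rebuilds the query part, and rxEscape here is a stand-in for the library's helper of the same name. It also assumes that encodeURIComponent(key) matches what URLSearchParams.toString() emits for the key, which holds for '^' but may not hold for characters such as spaces or '!'.

```javascript
// Stand-in for the library's rxEscape (assumption): escape regex metacharacters.
function rxEscape(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the sorted query string and drops the '='
// after any key whose value is empty, following the change proposed above.
function queryWithoutEmptyEquals(url) {
  const urlObj = new URL(url.toLowerCase());
  urlObj.searchParams.sort();
  let query = "?" + urlObj.searchParams.toString();

  for (const [key, value] of urlObj.searchParams.entries()) {
    if (value) {
      continue;
    }
    // Encode the key so it matches the already-encoded query string.
    const encodedKey = encodeURIComponent(key);
    const rx = new RegExp(`(?<=[&?])${rxEscape(encodedKey)}=(?=&|$)`);
    // Check the encoded query (the surt), not the original URL.
    if (rx.test(query)) {
      query = query.replace(rx, () => encodedKey);
    }
  }
  return query;
}

console.log(queryWithoutEmptyEquals("https://test.org/s?test^2="));
```

With the URL from this issue, the raw key is "test^2" while the serialized query contains "test%5E2=", which is why the regex has to be built from the encoded key.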

surt-example.zip