xwmx / nb

CLI and local web plain text note‑taking, bookmarking, and archiving with linking, tagging, filtering, search, Git versioning & syncing, Pandoc conversion, + more, in a single portable script.
https://xwmx.github.io/nb
GNU Affero General Public License v3.0

`Accept-Encoding: *` can suppress bookmark content downloads #264

Closed by davideaster 1 year ago

davideaster commented 1 year ago

After #246, servers may send a response compressed with an algorithm that curl doesn't understand. In that case curl saves the compressed blob unchanged, and nb discards it as unrecognized.

Using --compressed in place of --header "Accept-Encoding: *" may be a better option. With --compressed, curl still sends Accept-Encoding, but it lists only the encodings it can decode, and it decompresses the response before writing it out.

For wget, --compression=auto is a similar option.
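
As a rough illustration of the suggestion above, the two options could be wrapped in one helper that uses whichever tool is installed. This is only a sketch: `fetch_page` is a hypothetical name, not a function from nb, and it assumes a wget build that supports --compression.

```shell
# fetch_page URL: write the decoded response body to stdout.
# Hypothetical helper for illustration only; not nb's actual implementation.
fetch_page() {
  if command -v curl >/dev/null 2>&1; then
    # --compressed advertises only the encodings this curl build can decode
    # and decompresses the response before writing it out.
    curl -sSL --compressed "$1"
  elif command -v wget >/dev/null 2>&1; then
    # --compression=auto is the wget counterpart (assumes a wget build
    # with compression support).
    wget -q --compression=auto -O - "$1"
  else
    printf 'fetch_page: neither curl nor wget found\n' >&2
    return 1
  fi
}
```

Running `fetch_page https://hyperscript.org/ | file -` should then classify the body the same way as the --compressed and --compression=auto runs below.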

curl

Without Accept-Encoding

$ curl -sSLv https://hyperscript.org/ |& egrep -i 'encoding:|content-length:'

< content-length: 9287

$ curl -sSL https://hyperscript.org/ | wc -c

    9287

$ curl -sSL https://hyperscript.org/ | file -

/dev/stdin: HTML document text, UTF-8 Unicode text

With --header 'Accept-Encoding: *'

$ curl -sSLv --header 'Accept-Encoding: *' https://hyperscript.org/ |& egrep -i 'encoding:|content-length:'

* h2 [accept-encoding: *]
> Accept-Encoding: *
< content-encoding: br
< content-length: 3133

$ curl -sSL --header 'Accept-Encoding: *' https://hyperscript.org/ | wc -c

    3133

$ curl -sSL --header 'Accept-Encoding: *' https://hyperscript.org/ | file -

/dev/stdin: data

With --compressed

$ curl -sSLv --compressed https://hyperscript.org/ |& egrep -i 'encoding:|content-length:'

* h2 [accept-encoding: deflate, gzip]
> Accept-Encoding: deflate, gzip
< content-encoding: gzip
< content-length: 3446

$ curl -sSL --compressed https://hyperscript.org/ | wc -c

    9287

$ curl -sSL --compressed https://hyperscript.org/ | file -

/dev/stdin: HTML document text, UTF-8 Unicode text

wget

Without Accept-Encoding

$ wget -d https://hyperscript.org/ -O - |& egrep -i 'begin---|encoding:|content-length:'

---request begin---
Accept-Encoding: identity
---response begin---
Content-Length: 9287

$ wget -q https://hyperscript.org/ -O - | wc -c

    9287

$ wget -q https://hyperscript.org/ -O - | file -

/dev/stdin: HTML document text, UTF-8 Unicode text

With --header='Accept-Encoding: *'

$ wget -d --header='Accept-Encoding: *' https://hyperscript.org/ -O - |& egrep -i 'begin---|encoding:|content-length:'

Setting --header (header) to Accept-Encoding: *
---request begin---
Accept-Encoding: *
---response begin---
Content-Encoding: br
Content-Length: 3127

$ wget -q --header='Accept-Encoding: *' https://hyperscript.org/ -O - | wc -c

    3127

$ wget -q --header='Accept-Encoding: *' https://hyperscript.org/ -O - | file -

/dev/stdin: data

With --compression=auto

$ wget -d --compression=auto https://hyperscript.org/ -O - |& egrep -i 'begin---|encoding:|content-length:'

---request begin---
Accept-Encoding: gzip
---response begin---
Content-Encoding: gzip
Content-Length: 3446

$ wget -q --compression=auto https://hyperscript.org/ -O - | wc -c

    9287

$ wget -q --compression=auto https://hyperscript.org/ -O - | file -

/dev/stdin: HTML document text, UTF-8 Unicode text
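
The `file -` checks above can be approximated without `file` by counting control bytes in a saved body. The following is a heuristic sketch only: `looks_like_text` is a hypothetical name, this is not how nb decides what to discard, and it assumes that a still-compressed body contains at least one control byte other than tab, LF, or CR.

```shell
# looks_like_text FILE: succeed when FILE contains no control bytes other
# than tab, LF, and CR. Hypothetical heuristic; not nb's detection logic.
looks_like_text() {
  # Delete tab, LF, CR, printable ASCII, and high (UTF-8) bytes, then count
  # whatever is left. Anything remaining is a control byte.
  control_bytes=$(LC_ALL=C tr -d '\11\12\15\40-\176\200-\377' < "$1" | wc -c)
  [ "$((control_bytes))" -eq 0 ]
}
```

A page saved with --compressed or --compression=auto would be expected to pass this check, while an undecoded body such as the `Accept-Encoding: *` responses above would very likely fail it.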

xwmx commented 1 year ago

Thanks for the detailed information. I've updated the commands to use --compressed / --compression=auto. This is available as of version 7.5.6. Let me know if you run into any issues with it. Thanks again!

davideaster commented 1 year ago

I'm glad I could help. I'm enjoying learning to use nb.