zestedesavoir / zmarkdown

Live demo: https://zestedesavoir.github.io/zmarkdown/
MIT License
224 stars 52 forks source link

[remark-images-download] add a user-agent to requests #507

Closed philippemilink closed 4 months ago

philippemilink commented 4 months ago

Hello,

This PR solves a problem observed on zestedesavoir.com: downloading images from upload.wikimedia.org failed with a HTTP 403 error, but not on beta.zestedesavoir.com...

After reading this article and live-patching this package on zestedesavoir.com, I can confirm the lack of user-agent was responsible for these HTTP 403 errors.

It will be hard to test, just believe me that this patch is now used in production. And speaking of test, I don't think we can add a test to cover this case...

I used this article to debug it (the HomeAssistant logo is an image from Wikimedia).

StaloneLab commented 4 months ago

Hi :wave: ,

before merging, you need to build the package using npm run build -- --scope=remark-images-download, as mentioned in the README.md. Before doing that, since you linked the Wikimedia page, could you follow their convention on how to choose the User-Agent:

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.

If you run an automated agent, please consider following the Internet-wide convention of including the string "bot" in the User-Agent string, in any combination of lowercase or uppercase letters. This is recognized by Wikimedia's systems, and used to classify traffic and provide more accurate statistics.

So I guess something like remark-images-download bot/3.0.3 (https://github.com/zestedesavoir/zmarkdown) would do it (adding bot and version). The version might be parsed directly from the package.json file.

Regarding tests, I am a bit surprised we do not tests for headers received by the mock server used for testing. I think it would be nice to have at least something checking for headers, if not on the mock server, maybe on the request sent. This will avoid unexpected header changes when updating our dependencies, or forgetting some of them for the micromark migration.

philippemilink commented 4 months ago

before merging, you need to build the package using npm run build -- --scope=remark-images-download, as mentioned in the README.md.

Well, it is not clear we have to run this command after every code modification in packages. And it feels a bit odd to me to generate a dist file and commit it, IMO this should be done only when preparing the release. Anyway, this PR is not about this, I did it.

So I guess something like remark-images-download bot/3.0.3 (https://github.com/zestedesavoir/zmarkdown) would do it (adding bot and version). The version might be parsed directly from the package.json file.

Done.

Regarding tests, I am a bit surprised we do not tests for headers received by the mock server used for testing. I think it would be nice to have at least something checking for headers, if not on the mock server, maybe on the request sent. This will avoid unexpected header changes when updating our dependencies, or forgetting some of them for the micromark migration.

I changed the mock server in a way it rejects requests without User-Agent header.

StaloneLab commented 4 months ago

it feels a bit odd to me to generate a dist file and commit it, IMO this should be done only when preparing the release.

Maybe it is, but it would be easy to forget some packages when preparing the release, and we cannot simply build everything on every release because sometimes it creates a lot of garbage on other packages.

Anyway, thanks for the PR, I merged it, will release a new version with the changes next week most likely.

StaloneLab commented 4 months ago

Just released a new version with your changes, see remark-images-download@3.0.4 and zmarkdown@11.4.2.