meilisearch / scrapix

MIT License

`user_agents` in configuration file doesn't change HTTP User-Agent header #94

Open TonyRL opened 6 months ago

TonyRL commented 6 months ago

Steps to reproduce

  1. Run the latest meilisearch image from docker.
  2. Place a reverse proxy, such as Caddy, in front of the meilisearch container with the following configuration:
    :7701 {
        reverse_proxy localhost:7700
        log {
            output stdout
        }
    }
  3. Update the scrapix configuration file misc/config_examples/docusaurus-docsearch.json in meilisearch/scrapix to include a custom user agent: "user_agents": ["foo"]
  4. Run yarn playground:docsearch from meilisearch/scrapix.
  5. Observe the log output of Caddy.
  6. Observe the log output of meilisearch: docker logs -f meilisearch

Expected behavior

  1. The HTTP "User-Agent" header in Caddy's log should be similar to what the docs describe:

    user_agents An array of user agents that are append at the end of the current user agents. In this case, if your user_agents value is ['My Thing (vx.x.x)'] the final user_agent becomes

    Meilisearch JS (vx.x.x); Meilisearch Crawler (vx.x.x); My Thing (vx.x.x)
  2. The HTTP "User-Agent" header in meilisearch's log should be similar to the value above.
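
To make the expectation concrete, here is a minimal sketch of the composition rule the docs describe: default agents first, then the entries from `user_agents`, joined with `; `. The function name `buildUserAgent` and its signature are hypothetical and are not taken from the scrapix source. The version strings are the placeholders from the docs.

```typescript
// Hypothetical helper illustrating the documented behavior; not scrapix code.
// Appends the custom agents after the default ones and joins them with "; ".
function buildUserAgent(defaults: string[], custom: string[] = []): string {
  return [...defaults, ...custom]
    .map((agent) => agent.trim())
    .filter((agent) => agent.length > 0) // drop empty entries
    .join("; ");
}

// Placeholder versions copied from the docs excerpt, plus the "foo" agent
// from the reproduction steps.
const expectedHeader = buildUserAgent(
  ["Meilisearch JS (vx.x.x)", "Meilisearch Crawler (vx.x.x)"],
  ["foo"],
);
console.log(expectedHeader);
```

With the configuration from step 3, this yields `Meilisearch JS (vx.x.x); Meilisearch Crawler (vx.x.x); foo`, which is the kind of value expected in the "User-Agent" header.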

Actual behavior

Caddy's log shows node as the HTTP User-Agent. The custom value foo appears only in the X-Meilisearch-Client header.

INFO    http.log.access.log0    handled request {"request": {"remote_ip": "10.0.5.2", "remote_port": "33130", "client_ip": "10.0.5.2", "proto": "HTTP/1.1", "method": "POST", "host": "10.0.5.2:7701", "uri": "/indexes", "headers": {"Accept-Language": ["*"], "Sec-Fetch-Mode": ["cors"], "User-Agent": ["node"], "Accept-Encoding": ["gzip, deflate"], "Authorization": [], "Content-Type": ["application/json"], "X-Meilisearch-Client": ["Meilisearch Crawler (v0.1.7) ; foo ; Meilisearch JavaScript (v0.31.1)"], "Accept": ["*/*"], "Content-Length": ["51"], ...}}

meilisearch's log also shows node as the HTTP User-Agent.

INFO  actix_web::middleware::logger] 172.17.0.1 "PATCH /indexes/docusaurus-docsearch_crawler_tmp/settings HTTP/1.1" 202 140 "-" "node" 0.001615
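
The header check can also be done programmatically rather than by reading the logs. The sketch below parses the JSON payload of a Caddy access log entry and reports which header carries the custom agent. The helper names `extractAgentHeaders` and `headerContains` are hypothetical, and the sample entry is trimmed down to the two relevant headers from the Caddy log above.

```typescript
interface AgentHeaders {
  userAgent: string[];
  clientHeader: string[];
}

// Hypothetical helper: pulls the two headers of interest out of the JSON
// part of a Caddy access log entry.
function extractAgentHeaders(entryJson: string): AgentHeaders {
  const entry = JSON.parse(entryJson) as {
    request?: { headers?: Record<string, string[]> };
  };
  const headers = entry.request?.headers ?? {};
  return {
    userAgent: headers["User-Agent"] ?? [],
    clientHeader: headers["X-Meilisearch-Client"] ?? [],
  };
}

// Returns true if any header value lists `needle` as a ";"-separated part.
function headerContains(values: string[], needle: string): boolean {
  return values.some((value) =>
    value.split(";").map((part) => part.trim()).includes(needle),
  );
}

// Sample reduced to the relevant fields of the log entry in this report.
const sampleEntry = JSON.stringify({
  request: {
    headers: {
      "User-Agent": ["node"],
      "X-Meilisearch-Client": [
        "Meilisearch Crawler (v0.1.7) ; foo ; Meilisearch JavaScript (v0.31.1)",
      ],
    },
  },
});

const found = extractAgentHeaders(sampleEntry);
console.log("User-Agent has foo:", headerContains(found.userAgent, "foo"));
console.log("X-Meilisearch-Client has foo:", headerContains(found.clientHeader, "foo"));
```

For the sample entry, the first check is false and the second is true, matching the behavior reported here.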