typesense / typesense-docsearch-scraper

A fork of Algolia's awesome DocSearch Scraper, customized to index data in Typesense (an open source alternative to Algolia)
https://typesense.org/docs/guide/docsearch.html

[Feature request] Allow support for a less verbose option #52

Open imballinst opened 8 months ago

imballinst commented 8 months ago

Description

Hello! I would like a feature that makes the scraper's output less verbose. According to https://typesense.org/docs/guide/docsearch.html#run-the-scraper:

If needed, you can send the output to both stdout and a file at the same time by adding | tee scraper-output.txt to the end of the command. This is helpful because the output can be very verbose.

I could instead redirect the output to a file with, say, > output.txt, but that would also silence any errors. Would it be possible to have a LOG_LEVEL environment variable (or something similar) when running the scraper's Docker image?

Steps to reproduce

  1. Run the scraper. The output is very verbose, as noted in the docs.

Expected Behavior

There is something like a LOG_LEVEL setting that lets us filter the logs, so that in CI we get only, say, error logs and not the "debug" logs.
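A minimal sketch of what the requested option could look like, assuming the scraper logs through Python's standard logging module. The LOG_LEVEL variable name comes from this request and is hypothetical, as are the helper names level_from_env and configure_logging; none of them are part of the scraper today.

```python
import logging
import os

# Hypothetical: LOG_LEVEL is the variable name proposed in this feature request.
VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def level_from_env(env=None, default="INFO"):
    """Return the numeric logging level named by LOG_LEVEL, or the default level."""
    env = os.environ if env is None else env
    name = env.get("LOG_LEVEL", default).strip().upper()
    if name not in VALID_LEVELS:
        # Unknown or empty values fall back to the default instead of failing.
        name = default.upper()
    return getattr(logging, name)


def configure_logging(env=None):
    """Make the root logger drop every record below the configured level."""
    level = level_from_env(env)
    logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")
    # basicConfig is a no-op if handlers already exist, so set the level explicitly too.
    logging.getLogger().setLevel(level)
    return level


if __name__ == "__main__":
    configure_logging({"LOG_LEVEL": "error"})
    logging.getLogger("scraper").debug("fetched page")     # dropped: below ERROR
    logging.getLogger("scraper").error("failed to index")  # emitted to stderr
```

With support like this, CI could pass the variable through Docker's -e flag, for example docker run -e LOG_LEVEL=error <your usual options> typesense/docsearch-scraper:0.9.1. This is only illustrative, since the scraper is not known to read LOG_LEVEL.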

Actual Behavior

As far as I can tell, log levels can't be configured, so the logs are rather overwhelming.

Metadata

Typesense Version: (not related to Typesense version)
OS: (not related to OS version)

imballinst commented 7 months ago

Never mind, I'm dumb 🤦 I didn't realize there's a --log-level option when running the Docker container:

docker run --log-level error typesense/docsearch-scraper:0.9.1

Now it only logs the "> DocSearch:" lines, which is much less verbose. Could this be added to the docs as well? I think this part could be improved:

If needed, you can send the output to both stdout and a file at the same time by adding | tee scraper-output.txt to the end of the command. This is helpful because the output can be very verbose.

We can add something like:

You can reduce the verbosity of the output by setting the --log-level option of docker run to "error", for example: docker run --log-level error typesense/docsearch-scraper:0.9.1.

EDIT: 🤦 (2nd time) I think I was confusing the dockerd log level with a docker run option 😬, so this post is possibly invalid.
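Docker's --log-level is a global option of the docker CLI (and of dockerd) that controls Docker's own logging, not the output of the container. A workaround that needs no change to the scraper is to keep the full log with tee and filter what is shown. The sketch below is hypothetical: the file name filter_log.py is made up, and treating lines containing "ERROR" or "Traceback" as errors is an assumption about the scraper's log format; only the "> DocSearch:" prefix comes from this thread.

```python
import sys

# "> DocSearch:" is the summary prefix mentioned in this thread.
KEEP_PREFIXES = ("> DocSearch:",)
# Assumption: error lines contain one of these markers.
ERROR_MARKERS = ("ERROR", "Traceback")


def is_relevant(line):
    """Return True for DocSearch summary lines and lines that look like errors."""
    stripped = line.lstrip()
    return stripped.startswith(KEEP_PREFIXES) or any(m in line for m in ERROR_MARKERS)


def filter_lines(lines):
    """Yield only the relevant lines, preserving their order."""
    for line in lines:
        if is_relevant(line):
            yield line


if __name__ == "__main__":
    # Read the scraper's combined output on stdin and echo only the relevant lines.
    for line in filter_lines(sys.stdin):
        sys.stdout.write(line)
```

Usage: docker run <your usual options> typesense/docsearch-scraper:0.9.1 2>&1 | tee scraper-output.txt | python3 filter_log.py. The complete output still lands in scraper-output.txt. Note that only the first line of a multi-line traceback matches, so the file remains the place to read full errors.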