openzim / gutenberg

Scraper for downloading the entire ebook repository of Project Gutenberg
https://download.kiwix.org/zim/gutenberg
GNU General Public License v3.0

Gutenberg logs are all under the same `name` #206

Open benoit74 opened 10 months ago

benoit74 commented 10 months ago

In Gutenberg logs, only one logger name (`gutenberg2zim.constants`) is used, which makes it pretty useless.

```
[gutenberg2zim.constants::2023-08-19 11:40:30,563] INFO:    Parsing file cache/epub/99/pg99.rdf for book id 99
[gutenberg2zim.constants::2023-08-19 11:40:31,442] INFO:    Parsing file cache/epub/9/pg9.rdf for book id 9
[gutenberg2zim.constants::2023-08-19 11:40:32,515] INFO:Add possible url to db
[gutenberg2zim.constants::2023-08-19 11:40:32,517] DEBUG:bash -c rsync -a --list-only rsync://aleph.pglaf.org/gutenberg/ > tmp/file_on_aleph_pglaf_org
```

We should stop logging the name and instead log the filename with `%(filename)s` or the module with `%(module)s`.
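For reference, here is a minimal sketch of this proposal using Python's standard `logging` module. It assumes the existing `[name::time] LEVEL:message` prefix is kept and only `%(name)s` is swapped for `%(module)s` (or `%(filename)s`). The `format_from` helper and the `rdf.py` path are hypothetical. They only serve to build a record as if it had been emitted from that file.

```python
import logging

# Prefix taken from the emitting module instead of the logger name.
MODULE_FORMAT = "[%(module)s::%(asctime)s] %(levelname)s:%(message)s"
# Variant keeping the file extension, e.g. "rdf.py" instead of "rdf".
FILENAME_FORMAT = "[%(filename)s::%(asctime)s] %(levelname)s:%(message)s"


def format_from(pathname: str, fmt: str = MODULE_FORMAT) -> str:
    """Format a synthetic INFO record as if it had been logged from `pathname`."""
    record = logging.LogRecord(
        name="gutenberg2zim.constants",  # the single name reported in the issue
        level=logging.INFO,
        pathname=pathname,  # %(module)s and %(filename)s are derived from this
        lineno=1,
        msg="Parsing file %s for book id %d",
        args=("cache/epub/9/pg9.rdf", 9),
        exc_info=None,
    )
    return logging.Formatter(fmt).format(record)


if __name__ == "__main__":
    # Hypothetical module path, for illustration only.
    print(format_from("/src/gutenberg2zim/rdf.py"))
    print(format_from("/src/gutenberg2zim/rdf.py", FILENAME_FORMAT))
```

With this format, the width of the bracketed prefix varies with the module or file name.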

rgaudin commented 10 months ago

We've found that a single name is enough in most scrapers, so we use the name to distinguish our logs from those of other dependencies. Here it should use `gutenberg2zim` instead of the module name. We could use different names based on the file or module, but that brings little value and makes the logs very difficult to read because lines are no longer aligned (the prefix size changes).
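Here is a minimal sketch of the single-name approach, using only Python's standard `logging` module. It assumes the format `[%(name)s::%(asctime)s] %(levelname)s:%(message)s`, which reproduces the prefix seen in the log excerpt above. The `setup`, `parse_rdf` and `add_urls` names are hypothetical stand-ins for code that would live in different modules. The sketch does not reproduce scraperlib's own logger setup.

```python
import io
import logging

LOG_FORMAT = "[%(name)s::%(asctime)s] %(levelname)s:%(message)s"

# One package-wide logger. Other modules would import this object
# rather than calling logging.getLogger(__name__) themselves.
logger = logging.getLogger("gutenberg2zim")


def setup(stream=None, level: int = logging.DEBUG) -> logging.Logger:
    """Attach a single console handler (stderr when stream is None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.handlers = [handler]  # replace, so repeated setup() calls don't duplicate lines
    logger.setLevel(level)
    logger.propagate = False  # don't also pass records to the root logger's handlers
    return logger


def parse_rdf(book_id: int) -> None:
    """Hypothetical stand-in for code living in an RDF-parsing module."""
    path = f"cache/epub/{book_id}/pg{book_id}.rdf"
    logger.info("Parsing file %s for book id %d", path, book_id)


def add_urls() -> None:
    """Hypothetical stand-in for code living in a URL-handling module."""
    logger.info("Add possible url to db")


if __name__ == "__main__":
    buffer = io.StringIO()
    setup(buffer)
    parse_rdf(99)
    add_urls()
    # Both lines share the same fixed-width "[gutenberg2zim::<timestamp>]" prefix.
    print(buffer.getvalue(), end="")
```

The logger name never changes, so the prefix width depends only on the timestamp, which has a fixed width. This keeps successive lines aligned.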

elfkuzco commented 3 months ago

@benoit74, I would like to implement this. Should I keep the module names or just use `gutenberg2zim` as @rgaudin suggested?

benoit74 commented 3 months ago

Just use one name, `gutenberg2zim`, as suggested by @rgaudin.

Please also adapt the code to create the logger with scraperlib's `getLogger` function, as we are trying to harmonize this across our codebases.

One good example of this approach is in offspot/demo:

Thank you!