openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0
17 stars 16 forks source link

Log validated metadata. #154

Closed richterdavid closed 1 month ago

richterdavid commented 2 months ago

Add logging for validated metadata per discussion on https://github.com/openzim/warc2zim/issues/123 and https://github.com/openzim/warc2zim/pull/233

richterdavid commented 2 months ago

This is probably fine but I haven't run local tests yet. The instructions I found for testing are problematic. https://github.com/openzim/python-scraperlib/issues/153

benoit74 commented 2 months ago

See #155 ; I don't think you've implemented this in the right place, but let's wait a bit for discussions to settle

rgaudin commented 2 months ago
richterdavid commented 2 months ago

won't work. this custom logger is not configured to display debug why creating a logger from scratch?

With respect to accessing and configuring the logger, should everything in scraperlib be using this? https://github.com/openzim/python-scraperlib/blob/7d498319baadba715316c15cf9857ff2f6974a00/src/zimscraperlib/__init__.py#L11

It's evident that folks want unvalidated metadata logged, and presumably referenced by validation failures later. Consider: "Validation of Scrape URL failed" vs "Validation of something failed". Even better f"Validation of Scrape URL failed: {value}" so you don't need to look back to the earlier logging of the relevant metadata.

how will this be toggled?

Ideas?

Do we want this key-value log without context?

Context makes sense in the new proposal to log all metadata early in start(). Given how metadata is added incrementally in this class it didn't make much sense here.

Do we want to print validated metadata only?

Current proposals say no.

Okay, I'm not going to work on this branch further; once there's a plan I'll send a new PR for issue #155