openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0
20 stars 18 forks source link

Significantly enhance the safety of metadata manipulation #217

Closed benoit74 closed 5 days ago

benoit74 commented 1 week ago

Fix #205

Changes

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (7d1b958) to head (0e764ed).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## data_src #217 +/- ## ========================================== Coverage 100.00% 100.00% ========================================== Files 38 38 Lines 2224 2278 +54 Branches 426 435 +9 ========================================== + Hits 2224 2278 +54 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

benoit74 commented 1 week ago

I like the approach but I'm not a fan of extending the API with config_any_metadata(). I dont have much argument beside it being another piece of API to maintain and the relative confusion it introduces: one has to dig into docstring to understand what to use.

Indeed, it is "a shame" to have two APIs where one with two arguments would suffice. I already simplified a lot compared to original code I wrote but I have to admit I could go even one step further.

I also wonder if we need all this flexibility: Metadata, list of metadata, dict of key/value pairs ?

I'm a bit uncertain on this. dict of key/value pairs is nice for its brevity ... but it's poorly typed, and the past has shown that typed objects might help. At the same time, here we really have no idea of what these metadata can be, so a dict is probably sufficient and the Metadata object does not bring much clarity.

rgaudin commented 1 week ago

I'm a bit uncertain on this. dict of key/value pairs is nice for its brevity ... but it's poorly typed, and the past has shown that typed objects might help. At the same time, here we really have no idea of what these metadata can be, so a dict is probably sufficient and the Metadata object does not bring much clarity.

Yes, I want to say it's easier to extend later when there's a need than to maintain potential use cases but it's really up to you. It's not not a big deal either way

benoit74 commented 5 days ago

Superseeded by https://github.com/openzim/python-scraperlib/pull/221