Open rgaudin opened 1 month ago
I would consider splitting it in two: config_std_metadata (to be used by default) and config_extra_metadata (for scrapers like warc2zim that want to add custom metadata). This seems important to me so that both methods can still benefit from the same logic (currently we remove control characters, for instance, but we might add more logic in the future). I would even recommend that config_extra_metadata enforce the X- prefix we used in warc2zim for X-ContentDate, so that we further limit the risk of strange metadata. WDYT?
And obviously we need to keep config_indexing until the next major.
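To make the proposed split concrete, here is a minimal standalone sketch. Only the names config_std_metadata, config_extra_metadata and the X- prefix come from the discussion above; the signatures, the _clean helper, the STD_NAMES subset and the plain dict used as storage are assumptions for illustration, not the library's actual API.

```python
import unicodedata
from collections.abc import Mapping

EXTRA_PREFIX = "X-"
# Illustrative subset only; the real list of standard metadata is longer.
STD_NAMES = {"Name", "Title", "Creator", "Publisher", "Date", "Description", "Language", "Tags", "Scraper"}


def _clean(value: str) -> str:
    """Hypothetical shared logic: drop control characters (Unicode category Cc).

    Note that this also drops newlines and tabs, which are in category Cc.
    """
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cc")


def config_std_metadata(store: dict[str, str], values: Mapping[str, str]) -> None:
    """Accept only known standard names, so a misspelled name fails loudly."""
    for name, value in values.items():
        if name not in STD_NAMES:
            raise ValueError(f"{name!r} is not a standard metadata name")
        store[name] = _clean(value)


def config_extra_metadata(store: dict[str, str], values: Mapping[str, str]) -> None:
    """Accept custom metadata, but only when the name carries the X- prefix."""
    for name, value in values.items():
        if not name.startswith(EXTRA_PREFIX):
            raise ValueError(f"extra metadata {name!r} must start with {EXTRA_PREFIX!r}")
        store[name] = _clean(value)
```

Both functions go through the same _clean step, which is the point of keeping them side by side: any logic added later applies to standard and extra metadata alike.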
> I would even recommend that config_extra_metadata enforce the X- prefix we used in warc2zim for X-ContentDate, so that we further limit the risk of strange metadata. WDYT?
Works for me, as long as there's still the possibility to add non-prefixed metadata (via add_metadata()).
In https://github.com/kiwix/operations/issues/286 we had two misspelled yet undetected metadata: tags and scraper. I think accepting extra metadata in this method defeats the purpose of having them all exposed. I also think its use is marginal and that additional metadata can still be added by other means.
@benoit74 Can we get rid of this?
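As a rough illustration of the argument about exposing every metadata name explicitly: the sketch below contrasts a function with explicit keyword-only parameters against one that accepts arbitrary keywords. Both function names and their parameter lists are hypothetical, and the lowercase tags spelling is only an example of the kind of typo that could slip through.

```python
def config_metadata_explicit(
    *, Name: str, Title: str, Tags: str = "", Scraper: str = ""
) -> dict[str, str]:
    """Hypothetical: every accepted metadata name is a declared parameter."""
    return {"Name": Name, "Title": Title, "Tags": Tags, "Scraper": Scraper}


def config_metadata_open(**metadata: str) -> dict[str, str]:
    """Hypothetical: any keyword is accepted, so typos are stored as-is."""
    return dict(metadata)


# The open variant silently stores the misspelled name.
accepted = config_metadata_open(Name="demo", Title="Demo", tags="a;b")

# The explicit variant rejects it at call time with a TypeError.
try:
    config_metadata_explicit(Name="demo", Title="Demo", tags="a;b")
    rejected = False
except TypeError:
    rejected = True
```

With the explicit signature the typo surfaces immediately at the call site, which is the benefit lost once the same method also takes free-form extra metadata.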