Open veloman-yunkan opened 1 year ago
@mgautierfr Any opinion? I have nothing against.
I'm against that.
libzim doesn't know about metadata. All the api is agnostic to the metadata (you have a getMetadata(std::string name)
but no getTitle()
).
This is something already discussed, but libzim is content agnostic. I consider the metadata described in libzim format more as kiwix specification than zim ones. Simply because zim files without these metadata can correctly being read by libzim (but maybe not by kiwix readers). Checking the metadata is at same level than checking the url inside the articles. It may break how we use zim files but not how we parse them.
I consider the metadata described in libzim format more as kiwix specification than zim ones.
@mgautierfr I understand the way how you make the difference, but this is pretty much a personal POV. Metadata specs are part of the ZIM specification like the rest.
Beside the question of the principle, do you see any bad effect that such a move would generate?
We already had this discussion in https://github.com/openzim/zim-tools/issues/336 which is exactly the root issue leading to zim::Metadata
.
I let you read again my comment (https://github.com/openzim/zim-tools/issues/336#issuecomment-1474875186) you were agree with (to the point you moved the issue from libzim to zim-tools)
@mgautierfr I understand your opinion (about the separation of duties between the libzim and the zim-tools) which is the one we have pursued since a long time. At this stage, I either fully disagree or agree with it. Actually, so far I have been supportive of your POV like you have mentioned (but I disagree - and always have AFAIK - with the assertion that metadata size limits are not part of the ZIM specification).
But I see that this metadata checking is currently under implementation at multiple levels in scrapers and binding and zim-tools. I'm concerned about this redundancy. This new reality is why I have asked "Beside the question of the principle, do you see any bad effect that such a move would generate?" (which is not a question you have already answered in the past AFAIK).
Just to be clear, redundancy-wise (already wrote that somewhere else), the scrapers needs to validate Metadata early to provide direct feedback as most of those important metadata are user-provided.
This means having direct access to checks (could be possible via libzim) but also complete checks.
We are quite happy with scraperlib and I think that in this case it's more practical to have redundancy in scraperlib and JS-binding rather than bringing ISO-639-3 and PNG library to libzim.
The point is sort of the same: if the spec allows setting invalid metadata, it's hard to justify such requirements just to check them. Similar to the separate writer/reader libraries topic.
Just to be clear, redundancy-wise (already wrote that somewhere else), the scrapers needs to validate Metadata early to provide direct feedback as most of those important metadata are user-provided.
This is a good point I have forgotten to say. We might be more interested in the definition of the Metadata constraints in the libzim than in the check themselves. This is at least one of the altnernative I consider.
openzim/zim-tools#339 introduced
zim::Metadata
in zim-tools source tree so that it could be immediately used as a shared utility betweenzimwriterfs
andzimcheck
. Howeverlibzim
is a more logical home forzim::Metadata
. Then (after an easy enhancement ofzim::Metadata
) simple constraints (involving a single metadata item) can be checked inzim::writer::Creator::addMetadata()
and full checks can be run inzim::writer::Creator::finishZimCreation()
.