openzim / overview

:balloon: Start here for current projects, how to get involved, and joining community calls. A resource for new and veteran members of the offline community.

Consider saving content publication date in the ZIM #9

Open kelson42 opened 2 years ago

kelson42 commented 2 years ago

And specify it in the ZIM specification.

For the moment, we have only the ZIM creation date, but this might be really different from the content publication date, in particular if the content is really old.

Following a comment from https://github.com/veloman-yunkan at https://github.com/kiwix/libkiwix/issues/702#issuecomment-1030916357

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

kelson42 commented 2 years ago

@Popolechien @rgaudin @mgautierfr @veloman-yunkan I would like to move forward with this. We need a way to store the original content date as ZIM metadata. As a reminder, we already have the « Date » metadata, which is set automatically and represents the ZIM creation date, see https://wiki.openzim.org/wiki/Metadata

To me, the only open questions are about the naming:

IMO, we would benefit from using:

What do you think?

rgaudin commented 2 years ago

I agree with the proposal, although I think an explicit suffix would be better: PublishedOn, IssuedOn for instance.

I am not sure about the implications of renaming Date (to PublishedOn if I understood correctly) given it's a mandatory metadata but if it's not an issue, I'd say yes.

mgautierfr commented 2 years ago

Why not Created instead of Date? To me, it better conveys that the zim file was created at this time. I'm not sure about Published and Issued. There are two different dates of "publication":

It would be nice to define all those terms now, even if we don't implement them. Apart from that, I don't have a strong opinion on which term to use for what. Other potential candidates could be distributed, released, or reissued/republished.

I am not sure about the implications of renaming Date (to PublishedOn if I understood correctly) given it's a mandatory metadata but if it's not an issue, I'd say yes.

The creator side should be updated, but there is no urgency, as readers will have a lot of "old" zim files to handle anyway. On the reader side, Date is handled as optional metadata: if the date is missing, we use an empty string value. I don't know how it behaves when we try to sort books by date... We would have to adapt the reader to look for <newName> and otherwise fall back to Date, but it seems a small change.
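To make the reader-side fallback concrete, here is a minimal Python sketch. It is not libzim or libkiwix code: metadata is modelled as a plain mapping, `Created` is only a placeholder for the not-yet-decided <newName>, and the sort helper assumes dates are ISO-style `YYYY-MM-DD` strings so that string order matches date order.

```python
from typing import Iterable, List, Mapping

# Placeholder for the not-yet-decided new key (<newName> in the thread).
NEW_DATE_KEY = "Created"
# Existing key, kept as a fallback for "old" zim files.
LEGACY_DATE_KEY = "Date"


def read_creation_date(metadata: Mapping[str, str]) -> str:
    """Return the creation date, preferring the new key over the legacy one.

    Returns an empty string when neither key holds a value, which mirrors
    the reader behaviour described above.
    """
    for key in (NEW_DATE_KEY, LEGACY_DATE_KEY):
        value = metadata.get(key, "")
        if value:
            return value
    return ""


def sort_by_creation_date(
    books: Iterable[Mapping[str, str]],
) -> List[Mapping[str, str]]:
    """Sort books by ascending date, placing books without a date last."""

    def sort_key(book: Mapping[str, str]):
        date = read_creation_date(book)
        # False sorts before True, so dated books come first.
        return (date == "", date)

    return sorted(books, key=sort_key)
```

Sorting on a `(missing, date)` pair is one possible answer to the sorting question: undated books get a defined position instead of depending on how an empty string compares.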

kelson42 commented 2 years ago

"Date" is not optional according to https://wiki.openzim.org/wiki/Metadata, @mgautierfr please open the necessary issues.

I have no strong opinion on "Created" versus "Published". @rgaudin Any opinion?

@mgautierfr Honestly, I believe I have defined "Published" and "Issued" clearly. What exactly is unclear?

mgautierfr commented 2 years ago

"Date" is not optional according to https://wiki.openzim.org/wiki/Metadata, @mgautierfr please open the necessary issues.

libzim never enforces the "Mandatory" property on metadata (except for Counter, as it is created by libzim). The mandatory property is enforced at the scraper level (zimwriterfs does it; python-libzim and python-scraperlib don't, but maybe it is enforced in other tools). And I'm not sure we need to enforce it. What do we (libzim) do if the user doesn't provide it? We may refuse to create the zim file, but we can detect this only at the end of the process. Do we really want to wait until the end of the creation to fail because Date is not provided, when "everything" would work if we accepted to create it?
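For illustration, a scraper-level check could run before any content is written, which avoids the "fail at the end" problem when the metadata is known up front. The Python sketch below assumes exactly that; the function names are hypothetical, and the key set lists only Date because that is the key discussed here (the authoritative list is on the Metadata wiki page).

```python
from typing import Iterable, List, Mapping

# Illustrative only: the full mandatory list lives on the Metadata wiki page.
MANDATORY_KEYS = ("Date",)


class MissingMetadataError(ValueError):
    """Raised when mandatory metadata is absent before creation starts."""


def missing_mandatory(
    metadata: Mapping[str, str],
    mandatory: Iterable[str] = MANDATORY_KEYS,
) -> List[str]:
    """Return the mandatory keys that are absent or empty."""
    return [key for key in mandatory if not metadata.get(key)]


def check_before_creation(metadata: Mapping[str, str]) -> None:
    """Fail fast, before any content is written, if metadata is incomplete."""
    missing = missing_mandatory(metadata)
    if missing:
        raise MissingMetadataError(
            "missing mandatory metadata: " + ", ".join(missing)
        )
```

A tool that prefers to warn rather than refuse could call `missing_mandatory` directly and log the result, which leaves the enforcement question raised above open.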

On the reader side, we don't treat Date as mandatory. But the same question applies here: do we refuse to read a working zim only because Date (or any other metadata) is missing?

@mgautierfr Honestly, I believe I have defined "Published" and "Issued" clearly. What exactly is unclear?

I agree with your definitions. My point is that there is another notion of date (when the zim itself is published). This third notion cannot (and will not) be added to the zim file, but we should integrate it into our nomenclature, as we may add it to the catalog (where it will have to be consistent with the metadata pushed in the zim file). This is why I propose Created for the creation date of the zim file and Published for when the zim file has been added to the catalog.

kelson42 commented 2 years ago

I agree with your definitions. My point is that there is another notion of date (when the zim itself is published). This third notion cannot (and will not) be added to the zim file, but we should integrate it into our nomenclature, as we may add it to the catalog (where it will have to be consistent with the metadata pushed in the zim file). This is why I propose Created for the creation date of the zim file and Published for when the zim file has been added to the catalog.

I don't think we need an additional one. "Published" is there and can be (and will be) superseded in the CMS. Once the CMS takes the lead on metadata, the time of the technical creation of the ZIM does not really matter anymore.

kelson42 commented 2 years ago

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.