openzim / libzim

Reference implementation of the ZIM specification
https://download.openzim.org/release/libzim/
GNU General Public License v2.0

Libzim should support alias creation. #824

Closed by mgautierfr 9 months ago

mgautierfr commented 11 months ago

For https://github.com/kiwix/overview/issues/95

We need to be able to create entries pointing to the same data.
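
To illustrate what "entries pointing to the same data" means, here is a minimal toy model. It is not the libzim API: `ToyArchive`, `add_item`, `add_alias`, and `read` are hypothetical names invented for this sketch, and the real on-disk layout is not modelled.

```python
# Toy model only: this is NOT the libzim API. All names are hypothetical.
class ToyArchive:
    def __init__(self):
        self.blobs = []    # stored content, one item per stored payload
        self.entries = {}  # path -> index into self.blobs

    def add_item(self, path, data):
        # A regular entry stores its own payload.
        self.blobs.append(data)
        self.entries[path] = len(self.blobs) - 1

    def add_alias(self, path, target_path):
        # The new entry reuses the target's blob index; no data is copied.
        self.entries[path] = self.entries[target_path]

    def read(self, path):
        return self.blobs[self.entries[path]]


archive = ToyArchive()
archive.add_item("A/index.html", b"<html>home</html>")
archive.add_alias("A/home.html", "A/index.html")
```

In this model both paths are ordinary entries of equal standing; they simply resolve to the same stored blob.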

kelson42 commented 10 months ago

@rgaudin @mgautierfr We agree on the principle of implementing this. The sooner it's done, the better. It seems to me the fix in the creator should be pretty simple.

Still, I have doubts about the name "alias", because "alias" implies a name with less value than the original one. That would not be the case here. If we can avoid this terminology, it would be better. Otherwise "hardlink", as on a filesystem, seems to me to be a better name.

The specification probably needs to be updated as well, to state clearly that this is possible (even though nothing written so far said it wasn't).

rgaudin commented 10 months ago

I think hardlink carries more unwanted expectations with regard to modifications and de-duplication. Anyway, it's a libzim feature that will be documented in the spec and in the docstrings of the lib(s), so alias, hardlink, or whatever @mgautierfr finds more appropriate while implementing should be fine.

rgaudin commented 10 months ago

Should we already open a ticket on zimcheck to remind us that zimcheck will need to be updated? Actually, it opens some questions like: will libzim automatically remap entries with duplicate content as aliases?

mgautierfr commented 10 months ago

I agree with the naming question. I was asking myself if alias was a good name. I was thinking of hardlink too. I have no final answer for now.

> Should we already open a ticket on zimcheck to remind us that zimcheck will need to be updated?

Done: https://github.com/openzim/zim-tools/issues/377

> Actually, it opens some questions like: will libzim automatically remap entries with duplicate content as aliases?

It is not planned. It would mean storing (at least) the hash of all added content, then computing a hash and checking it before adding new content. We could discuss that, but I don't think we should do it at the same time as zimit 2.0.
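
The hash-and-check step described above could look roughly like the following sketch. It is an illustration under assumptions, not libzim code: `DedupStore` and its attributes are hypothetical, and SHA-256 is an arbitrary choice of hash for the example.

```python
import hashlib


# Hypothetical sketch of automatic de-duplication; not part of libzim.
class DedupStore:
    def __init__(self):
        self.blobs = []          # unique payloads only
        self.index_by_hash = {}  # content digest -> index into self.blobs
        self.entries = {}        # path -> index into self.blobs

    def add(self, path, data):
        # Hash every incoming payload before deciding whether to store it.
        digest = hashlib.sha256(data).digest()
        blob_index = self.index_by_hash.get(digest)
        if blob_index is None:
            # First time this content is seen: store it and remember its hash.
            self.blobs.append(data)
            blob_index = len(self.blobs) - 1
            self.index_by_hash[digest] = blob_index
        # Duplicate content ends up pointing at the already stored blob.
        self.entries[path] = blob_index
        return blob_index


store = DedupStore()
first = store.add("a.css", b"body{}")
second = store.add("b.css", b"body{}")  # same bytes as a.css
third = store.add("c.css", b"p{}")
```

The cost is one hash computation per added item plus one digest kept in memory per unique blob, which is the overhead the comment refers to.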

rgaudin commented 10 months ago

> We could discuss that, but I don't think we should do it at the same time as zimit 2.0.

I agree, not important for now.