openzim / python-libzim

Libzim binding for Python: read/write ZIM files in Python
https://pypi.org/project/libzim/
GNU General Public License v3.0

Is there any way (efficient or not) to iterate over all entries? #168

Closed: etotheipi closed this issue 1 year ago

etotheipi commented 1 year ago

I have been using my IDE to look for any relevant methods, and I see nothing. https://github.com/openzim/python-libzim/issues/94 mentions an Archive.get_item_by_index(), but I don't see it, and I don't see any other way to go through the articles. For reference, I have a ZIM archive containing a few thousand articles and no information about what's in them. I am planning to use AI to process the articles to create a separate index and knowledge tree of what's in the ZIM file (that's the plan; who knows how well it will work).

kelson42 commented 1 year ago

@rgaudin Iterating through all articles, or all front articles, is a recurring question. Maybe providing an example in the README would really help.

rgaudin commented 1 year ago

> @rgaudin Iterating through all articles, or all front articles, is a recurring question. Maybe providing an example in the README would really help.

@kelson42 #94 is an open ticket. You may want to prioritize it 😉

rgaudin commented 1 year ago

Workaround (relies on a private method, so it may break with any future release): loop over ids from 0 up to, but not including, Archive().all_entry_count, retrieving each entry with Archive()._get_entry_by_id().
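
For anyone landing here later, the workaround might look roughly like the sketch below. It is only an illustration under assumptions, not an official API: `iter_entries` and `iter_zim_file` are hypothetical helper names, `_get_entry_by_id()` is private and may change or disappear, and the `libzim` import is deferred so the generic helper can be tried against any object exposing the same two members.

```python
from pathlib import Path
from typing import Any, Iterator


def iter_entries(archive: Any) -> Iterator[Any]:
    """Yield every entry of an already-opened archive, by numeric id.

    Hypothetical helper: it only assumes the object exposes
    `all_entry_count` and the private `_get_entry_by_id()` method
    mentioned in the workaround.
    """
    # Ids are zero-based, so the last valid id is all_entry_count - 1.
    for entry_id in range(archive.all_entry_count):
        yield archive._get_entry_by_id(entry_id)


def iter_zim_file(path: str) -> Iterator[Any]:
    """Open a ZIM file and yield all of its entries (hypothetical helper)."""
    # Imported lazily so iter_entries() stays usable without libzim installed.
    from libzim.reader import Archive

    yield from iter_entries(Archive(Path(path)))
```

Something like `for entry in iter_zim_file("my.zim"): print(entry.path, entry.title)` would then list everything in the file. Note that this walks all entries, including redirects and non-article resources, so some filtering (for example on `entry.is_redirect`) is probably needed before feeding articles to further processing.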