openzim / python-libzim

Libzim binding for Python: read/write ZIM files in Python
https://pypi.org/project/libzim/
GNU General Public License v3.0

wildcard search or an iterator over all pages #81

Closed amirouche closed 3 years ago

amirouche commented 3 years ago

More and more people are asking how to process Wikipedia. It seems to me that .zim files are the best way to go.

What is missing is a way to go through the whole ZIM file. What I would expect is something along these lines:

for page in File("my.zim"):
    print(page.title)

Is it possible?

kelson42 commented 3 years ago

Not with python-libzim, but you should be able to do that with the zimdump tool.

rgaudin commented 3 years ago

It's there and easily guessable from the docstring.

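from libzim.reader import File  # import path as of the release discussed here

with File("my.zim") as reader:
    # article ids run from 0 to article_count - 1
    for article_id in range(reader.article_count):
        page = reader.get_article_by_id(article_id)
        print(page.title)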
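
To get closer to the `for page in ...` shape asked for in the question, the index loop can be wrapped in a small generator. This is only a sketch: `iter_articles` is a hypothetical helper, not part of python-libzim, and it assumes nothing beyond the two members used in the reply above (`article_count` and `get_article_by_id`).

```python
from typing import Any, Iterator


def iter_articles(reader: Any) -> Iterator[Any]:
    """Yield every article of an already-open reader, in index order.

    Hypothetical helper: it relies only on `reader.article_count` and
    `reader.get_article_by_id`, as used in the reply above.
    """
    for article_id in range(reader.article_count):
        yield reader.get_article_by_id(article_id)


# Possible usage, assuming `File` is the reader class from the thread:
#
#     with File("my.zim") as reader:
#         for page in iter_articles(reader):
#             print(page.title)
```

Because it is a generator, articles are fetched one at a time rather than loaded into a list up front, which matters for a file the size of a full Wikipedia dump.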