openzim / python-libzim

Libzim binding for Python: read/write ZIM files in Python
https://pypi.org/project/libzim/
GNU General Public License v3.0

Duplicate article's title is indexed #73

Closed by rgaudin 2 years ago

rgaudin commented 4 years ago

When trying to add an article with the same URL as an existing article, libzim issues a warning and doesn't add the article. However, the duplicate article's title still appears to be added to the suggestion index.

Might be a libzim issue though…

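import pathlib

import libzim.reader
# Assumed import path; the original snippet did not show where Creator comes from.
from libzim.writer import Creator


def test_title(reader, title):
    # Compare the reported suggestion count with the suggestions actually returned.
    nb = reader.get_suggestions_results_count(title)
    res = list(reader.suggest(title))
    print(title, "--", nb, len(res), res)


fpath = pathlib.Path("test.zim")

with Creator(fpath, "welcome", "fra") as creator:
    creator.add_article("welcome", title="Home", content="hello")
    # Same URL, different title: libzim rejects this entry with a warning.
    creator.add_article("welcome", title="Maison", content="bonjour")

with libzim.reader.File(fpath) as reader:
    print("nb article", reader.article_count)
    test_title(reader, "Home")
    test_title(reader, "Maison")
    print(reader.get_article("A/welcome"))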

Output:

Impossible to add A/welcome
  dirent's title to add is : Maison
  existing dirent's title is : Home
T:0; A:5; RA:0; CA:5; UA:0; FA:0; IA:2; C:0; CC:0; UC:0; WC:2
T:0; Waiting for workers
T:0; ResolveRedirectIndexes
Resolve redirect
T:0; Set article indexes
set index
T:0; Resolve mimetype
T:0; create title index
T:0; 6 title index created
T:0; 2 clusters created
T:0; write zimfile :
T:0;  write mimetype list
T:0;  write directory entries
T:0;  write url prt list
T:0;  write title index
T:0;  write cluster offset list
T:0;  write header
T:0;  write checksum
T:0; rename tmpfile to final one.
T:0; finish
nb article 6
Home -- 2 1 ['A/welcome']
Maison -- 2 1 ['A/welcome']
ReadArticle(url=A/welcome, title=Home)
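
The suspected behaviour can be illustrated with a toy model in plain Python. This is only a sketch of one possible cause (the title being indexed before the duplicate-URL check), not libzim's actual implementation. `ToyCreator`, `index_before_check` and `run` are hypothetical names invented for this illustration.

```python
class ToyCreator:
    """Hypothetical stand-in for a ZIM creator; NOT the libzim API."""

    def __init__(self, index_before_check):
        self.index_before_check = index_before_check
        self.dirents = {}      # url -> title of the stored entry
        self.title_index = []  # (title, url) pairs used for suggestions

    def add_article(self, url, title):
        if self.index_before_check:
            # Assumed buggy order: the title is indexed first...
            self.title_index.append((title, url))
        if url in self.dirents:
            # ...so a rejected duplicate still leaves its title behind.
            return False
        self.dirents[url] = title
        if not self.index_before_check:
            # Safer order: index only entries that were actually stored.
            self.title_index.append((title, url))
        return True

    def suggest(self, query):
        return [url for title, url in self.title_index if title == query]


def run(index_before_check):
    creator = ToyCreator(index_before_check)
    creator.add_article("A/welcome", "Home")
    creator.add_article("A/welcome", "Maison")  # duplicate URL
    return creator


buggy = run(index_before_check=True)
fixed = run(index_before_check=False)
print(buggy.suggest("Maison"), buggy.dirents["A/welcome"])  # ['A/welcome'] Home
print(fixed.suggest("Maison"), fixed.dirents["A/welcome"])  # [] Home
```

In the first case "Maison" is suggested even though the stored entry's title is "Home", which matches the symptom in the output above. The model does not attempt to explain the reported count of 2.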

mgautierfr commented 4 years ago

Somewhat a duplicate of https://github.com/openzim/libzim/issues/240, but maybe easier to fix.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

kelson42 commented 2 years ago

@rgaudin @mgautierfr Could it be that fixing https://github.com/openzim/libzim/issues/688 has fixed this one as well?

mgautierfr commented 2 years ago

Yes, libzim should now raise an exception in this case (and python-libzim should (to do?) catch it correctly and report it in the Python world).
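
As a rough sketch of what "raise an exception and report it in Python" could look like from the caller's side, here is a self-contained toy model. It does not use the libzim API: `StrictToyCreator` and `DuplicateEntryError` are hypothetical names, and the exception type that python-libzim actually surfaces is an assumption left open here.

```python
class DuplicateEntryError(RuntimeError):
    """Hypothetical exception type, used only in this sketch."""


class StrictToyCreator:
    """Hypothetical stand-in for a creator that rejects duplicate URLs loudly."""

    def __init__(self):
        self.dirents = {}      # url -> title of the stored entry
        self.title_index = []  # (title, url) pairs used for suggestions

    def add_article(self, url, title):
        if url in self.dirents:
            # Fail before touching the title index, so nothing is left behind.
            raise DuplicateEntryError(
                f"Impossible to add {url}: existing title is "
                f"{self.dirents[url]!r}, new title is {title!r}"
            )
        self.dirents[url] = title
        self.title_index.append((title, url))


creator = StrictToyCreator()
creator.add_article("A/welcome", "Home")
try:
    creator.add_article("A/welcome", "Maison")  # duplicate URL
except DuplicateEntryError as exc:
    duplicate_error = str(exc)
    print("duplicate rejected:", duplicate_error)
```

Because the check happens before any indexing, the title index in this model only ever contains entries that were actually stored.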