kiwix / kiwix-apple

Kiwix for offline access on iOS and macOS
https://apple.kiwix.org
GNU Lesser General Public License v3.0

Feed fix for language changes #663

Closed BPerlakiH closed 3 months ago

BPerlakiH commented 3 months ago

I did test the following scenario:

We have the following problem: the user's default/selected language gets updated properly once the catalog is fetched, yet the user still faces an empty list of categories. The reason is the following:

kelson42 commented 3 months ago

the listing of the categories is based on what is stored in the local DB

Is it hardcoded? Where does the information come from, as this DB has to be filled at some point?

BPerlakiH commented 3 months ago

It is not hardcoded. We are talking about a user who upgrades the app. Yes, the DB is filled in with the entries from the current App Store version of the app. The problem is that we only check whether the ZIM file ID already exists, and if it already exists in the DB, we don't update the entry. In this case we should probably update its language value.

kelson42 commented 3 months ago

Here there is an architectural problem:

I don't know how to solve the problem here, but ultimately things should be data driven, see https://libkiwix.readthedocs.io/en/latest/search.html?q=Categories&check_keywords=yes&area=default

For the rest I don't understand your issue:

BPerlakiH commented 3 months ago

We have the full metadata of each ZIM file in a local SQLite database stored together with the application. The interesting parts for this ticket are the fileID and the languageCode. An upgrading user already has this, e.g.:

quote(ZFILEID)                       ZNAME       ZCATEGORY   ZFLAVOR  ZLANGUAGECODE
X'746C75B67DAC87D962DBAB5542F6FE7B'  Wikivoyage  wikivoyage  nopic    en

Now the feed has the same file, with the same ID, but the language is "eng":

<entry>
    <id>urn:uuid:746c75b6-7dac-87d9-62db-ab5542f6fe7b</id>
    <title>Wikivoyage</title>
    <updated>2024-02-13T00:00:00Z</updated>
    <summary>The collaborative travel guide</summary>
    <language>eng</language>
    <name>wikivoyage_en_all</name>
    <flavour>nopic</flavour>
    <category>wikivoyage</category>
    <tags>wikivoyage;_category:wikivoyage;_pictures:no;_videos:no;_details:yes;_ftindex:yes</tags>
    <articleCount>32417</articleCount>
    <mediaCount>34</mediaCount>
    <link rel="http://opds-spec.org/image/thumbnail"
          href="/catalog/v2/illustration/746c75b6-7dac-87d9-62db-ab5542f6fe7b/?size=48"
          type="image/png;width=48;height=48;scale=1"/>
    <link type="text/html" href="/content/wikivoyage_en_all_nopic_2024-02" />
    <author>
      <name>Wikivoyage</name>
    </author>
    <publisher>
      <name>openZIM</name>
    </publisher>
    <dc:issued>2024-02-13T00:00:00Z</dc:issued>
    <link rel="http://opds-spec.org/acquisition/open-access" type="application/x-zim" href="https://download.kiwix.org/zim/wikivoyage/wikivoyage_en_all_nopic_2024-02.zim.meta4" length="254736384" />
  </entry>

The problem is that after downloading and parsing the feed, the app finds the entry by ID and skips it, since we already have it in the DB, so the language field never gets updated.
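To make the ID match concrete, here is a minimal Python sketch (not the app's Swift code) showing that the feed's `urn:uuid:` value and the blob stored in ZFILEID are the same 16 bytes. The trimmed entry, the omitted XML namespaces, and the helper name `parse_entry` are assumptions made for illustration.

```python
import uuid
import xml.etree.ElementTree as ET

# Trimmed copy of the feed entry above (namespaces omitted for brevity).
ENTRY_XML = """
<entry>
    <id>urn:uuid:746c75b6-7dac-87d9-62db-ab5542f6fe7b</id>
    <title>Wikivoyage</title>
    <language>eng</language>
</entry>
"""

def parse_entry(xml_text):
    """Return (file_id_bytes, language) for one feed entry (hypothetical helper)."""
    entry = ET.fromstring(xml_text)
    # uuid.UUID accepts the "urn:uuid:" prefix directly.
    file_id = uuid.UUID(entry.findtext("id")).bytes
    return file_id, entry.findtext("language")

file_id, language = parse_entry(ENTRY_XML)
# Same 16 bytes as the X'...' blob in the DB row, but a different language code.
print(file_id.hex().upper(), language)
```

The hex string printed here is what `quote(ZFILEID)` shows in the DB dump, which is why the lookup by ID succeeds even though the language codes differ.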

When we want to display the categories, the app searches by the current language, using a query that is more or less:

SELECT ... WHERE ZLANGUAGECODE IN ('eng')

and nothing is found.
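The behaviour described above can be reproduced with a small in-memory SQLite sketch in Python. The table name `ZZIMFILE` and the helper names are hypothetical; only the column names are taken from the DB dump, and the real app uses Core Data rather than raw SQL.

```python
import sqlite3

# ZZIMFILE is a hypothetical table name; the column names come from the dump above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ZZIMFILE ("
    "ZFILEID BLOB PRIMARY KEY, ZNAME TEXT, ZCATEGORY TEXT, ZLANGUAGECODE TEXT)"
)

FILE_ID = bytes.fromhex("746C75B67DAC87D962DBAB5542F6FE7B")

def insert_if_missing(conn, file_id, name, category, language):
    """Mimic the described sync: an entry whose ID is already stored is skipped."""
    found = conn.execute(
        "SELECT 1 FROM ZZIMFILE WHERE ZFILEID = ?", (file_id,)
    ).fetchone()
    if found:
        return False
    conn.execute(
        "INSERT INTO ZZIMFILE VALUES (?, ?, ?, ?)",
        (file_id, name, category, language),
    )
    return True

def categories_for(conn, languages):
    """List the categories that have at least one entry in the given languages."""
    placeholders = ", ".join("?" for _ in languages)
    rows = conn.execute(
        "SELECT DISTINCT ZCATEGORY FROM ZZIMFILE "
        f"WHERE ZLANGUAGECODE IN ({placeholders})",
        list(languages),
    ).fetchall()
    return [row[0] for row in rows]

# Row written by the older app version, with the two-letter code.
insert_if_missing(conn, FILE_ID, "Wikivoyage", "wikivoyage", "en")
# The same file arrives from the new feed with "eng" and is skipped.
inserted = insert_if_missing(conn, FILE_ID, "Wikivoyage", "wikivoyage", "eng")

print(inserted)                       # the feed entry was not stored
print(categories_for(conn, ["eng"]))  # empty: the stored row still says "en"
print(categories_for(conn, ["en"]))   # the old code still matches
```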

kelson42 commented 3 months ago

So this is the category listing applying to the local library?

In general, the problem is worse than it looks. The underlying question is: once we have downloaded a ZIM, should we display the metadata linked to the ZIM coming from the feed, or the metadata saved at the time the book was introduced into the local library?

But I guess ultimately this is more a question for libkiwix. @mgautierfr How does libkiwix behave on this?

@BPerlakiH For the moment, I recommend introducing a temporary fix which updates the language in the local library when necessary. This fix is to be removed (put a comment in the code and create a dedicated issue) in a few releases.
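One way such a temporary fix could look, sketched in Python against an in-memory SQLite table. This is not the actual change made in the app: the table name `ZZIMFILE` and the function `sync_entry` are hypothetical, and only the column names come from the thread.

```python
import sqlite3

def sync_entry(conn, file_id, name, category, language):
    """Insert a feed entry, or refresh the stored language if it differs.

    TEMPORARY: the language refresh is meant to be removed in a later release.
    """
    cur = conn.execute(
        "UPDATE ZZIMFILE SET ZLANGUAGECODE = ? "
        "WHERE ZFILEID = ? AND ZLANGUAGECODE IS NOT ?",
        (language, file_id, language),
    )
    if cur.rowcount > 0:
        return "language updated"
    found = conn.execute(
        "SELECT 1 FROM ZZIMFILE WHERE ZFILEID = ?", (file_id,)
    ).fetchone()
    if found:
        return "unchanged"
    conn.execute(
        "INSERT INTO ZZIMFILE VALUES (?, ?, ?, ?)",
        (file_id, name, category, language),
    )
    return "inserted"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ZZIMFILE ("
    "ZFILEID BLOB PRIMARY KEY, ZNAME TEXT, ZCATEGORY TEXT, ZLANGUAGECODE TEXT)"
)
file_id = bytes.fromhex("746C75B67DAC87D962DBAB5542F6FE7B")
# State of an upgrading user's DB: same file, old two-letter code.
conn.execute(
    "INSERT INTO ZZIMFILE VALUES (?, ?, ?, ?)",
    (file_id, "Wikivoyage", "wikivoyage", "en"),
)

first = sync_entry(conn, file_id, "Wikivoyage", "wikivoyage", "eng")
second = sync_entry(conn, file_id, "Wikivoyage", "wikivoyage", "eng")
stored = conn.execute("SELECT ZLANGUAGECODE FROM ZZIMFILE").fetchone()[0]
print(first, second, stored)
```

The update only touches rows whose language actually differs, so a normal sync of an unchanged catalog writes nothing.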

BPerlakiH commented 3 months ago

I have done a fix for this, which works both for new users (fresh installs of the latest version of the app) and for updating users (those who already have the App Store version).

So the problem also occurred because the application was relying on the ZIM fileID to fully identify the file, whereas the feed content changed at some point and the feed now uses alpha-3 language codes. The other question is: do we know why/when the feed was updated? We need to be more cautious about similar changes, as they can break the apps that are already in the App Store. Another thing: if we change the feed in a similar fashion for other fields, we might bump into a similar problem again. One possible solution for avoiding these cases is a versioned API, e.g. ...content/v1/feed and ...content/v2/feed, but that also comes with additional maintenance overhead on the back-end side.

BPerlakiH commented 3 months ago

We have about 25 fields for each ZIM file in the DB, e.g.:

Z_PK                     1
Z_ENT                    4
Z_OPT                    1
ZARTICLECOUNT            8
ZHASDETAILS              1
ZHASPICTURES             1
ZHASVIDEOS               1
ZINCLUDEDINSEARCH        1
ZISMISSING               0
ZMEDIACOUNT              24
ZREQUIRESSERVICEWORKERS  0
ZSIZE                    313540608
ZDOWNLOADTASK            0
ZCREATED                 2021-01-19 00:00:00 +0000
ZCATEGORY                ted
ZFILEDESCRIPTION         Designer Isaac Mizrahi dashed off these stylish, breezy notes on why these 6 TEDTalks are a source of inspiration for him.
ZFLAVOR                  (empty)
ZLANGUAGECODE            eng
ZNAME                    Isaac Mizrahi: Talks that are in fashion
ZPERSISTENTID            ted_en_playlist-isaac-mizrahi-talks-that-are
ZDOWNLOADURL             https://download.kiwix.org/zim/ted/ted_en_playlist-isaac-mizrahi-talks-that-are_2021-01.zim.meta4
ZFAVICONURL              https://library.kiwix.org/catalog/v2/illustration/80c9f981-09ea-69a9-a87e-10aa8283ba05/?size=48
ZFILEID                  (not shown)
ZFAVICONDATA             (not shown)
ZFILEURLBOOKMARK         (not shown)

kelson42 commented 3 months ago

@BPerlakiH What is the ZIM FileID? The only ZIM ID I know is the uuid metadata, and only this should be used: https://wiki.openzim.org/wiki/ZIM_file_format#Header. It never changes and should be used to identify a specific published ZIM file.

kelson42 commented 3 months ago

After discussion, the solution here should be to run once (only after the update) a special function that removes old online (not local) entries from the DB, so that at the next sync everything gets repopulated. Of course, once the DB is cleared of online entries, something should be done to re-download the online feed ASAP.
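A rough Python sketch of such a run-once cleanup, again using SQLite directly rather than the app's Core Data stack. It assumes that a row with no ZFILEURLBOOKMARK is an online-only entry; that reading of the column, the table names `ZZIMFILE` and `app_flags`, and the flag name are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical flag recording that the cleanup has already run.
MIGRATION_KEY = "removed_stale_online_entries"

def purge_online_entries_once(conn):
    """Delete online-only rows a single time; return how many were removed."""
    conn.execute("CREATE TABLE IF NOT EXISTS app_flags (name TEXT PRIMARY KEY)")
    done = conn.execute(
        "SELECT 1 FROM app_flags WHERE name = ?", (MIGRATION_KEY,)
    ).fetchone()
    if done:
        return 0
    # Assumption: entries without a file bookmark were never downloaded.
    cur = conn.execute("DELETE FROM ZZIMFILE WHERE ZFILEURLBOOKMARK IS NULL")
    conn.execute("INSERT INTO app_flags (name) VALUES (?)", (MIGRATION_KEY,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ZZIMFILE ("
    "ZFILEID BLOB PRIMARY KEY, ZLANGUAGECODE TEXT, ZFILEURLBOOKMARK BLOB)"
)
conn.execute("INSERT INTO ZZIMFILE VALUES (?, ?, ?)", (b"\x01" * 16, "en", None))
conn.execute("INSERT INTO ZZIMFILE VALUES (?, ?, ?)", (b"\x02" * 16, "en", b"bookmark"))

removed_first = purge_online_entries_once(conn)
removed_second = purge_online_entries_once(conn)
remaining = conn.execute("SELECT COUNT(*) FROM ZZIMFILE").fetchone()[0]
print(removed_first, removed_second, remaining)
```

A non-zero return value on the first run is the natural trigger for refreshing the online feed right away, so the catalog does not stay empty until the next scheduled sync.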

mgautierfr commented 3 months ago

In general, the problem is worse than it looks. The underlying question is: once we have downloaded a ZIM, should we display the metadata linked to the ZIM coming from the feed, or the metadata saved at the time the book was introduced into the local library?

But I guess ultimately this is more a question for libkiwix. @mgautierfr How does libkiwix behave on this?

I would say that libkiwix doesn't behave at all on this. The kiwix::Library is either fed with an OPDS stream or with library.xml[*] (a saved library), so the metadata come from the input. On an existing (potentially empty) Library you can also add a book (and then the metadata come from the book).

At the end, libkiwix doesn't choose. It is an application decision.


[*] When loading a library.xml, we have a boolean trustLibrary (true by default). When it is true, we trust the library metadata. If false, we open the books to get the metadata from them. (This can be pretty slow, so it is true by default to allow quick startup of kiwix-serve, without opening hundreds of ZIM files for "nothing".)

BPerlakiH commented 3 months ago

Fix is included in: https://github.com/kiwix/apple/issues/668