Open kelson42 opened 1 year ago
@mgautierfr @veloman-yunkan @juuz0 Does that make sense? Is the libkiwix ready for that?
We don't download more book details than what we can display on the screen (but we need to know how many results we have in total). We should use result pagination.
One interesting problem here is to handle the sorting of books. Current kiwix-desktop applies sort on all of the list. But obviously, if we use pagination, it's only going to apply on the list available to it and give false positives (eg: sorting in descending order by size will not give the book with biggest size in Global catalog). libkiwix should probably support a parameter to give out sorted data, if we need this.
libkiwix should probably support a parameter to give out sorted data, if we need this.
Yes, we have multiple tickets regarding sorting of results. But we could handle this with https://github.com/kiwix/libkiwix/issues/702. Please put the requirements you have in a comment so you can secure Kiwix-Desktop features can continue to work.
does the cache here have to be implemented in libkiwix? QT has QNetworkDiskCache which can help in this
@mgautierfr We urgently need your feedback here
Free text search is handled like any other filter and generates a new API request if changing
I am somehow against that. By definition a free text filtering will only reduce the number of books and so we could do filtering locally. And so avoid data transfer between the client and the server. That why we implement it this way.
BUUUTTT if we go in the direction of using partial entries and pagination, we cannot do the filtering locally as we don't have the list of all books (and their details). So ... yes, by implementation need, all search/filtering/sorting have to be done on the server.
We don't download more book details than what we can display on the screen (but we need to know how many results we have in total). We should use result pagination.
Agree
Libkiwix should cache properly things We should have a cache of requests/response that if we have twice the same requests we identify and avoid downloading twice the results
Not so easy. And maybe not so useful. First of all, catalog content may changed. So we have to invalidate cache somehow. We should measure real use case. How much (duplicated) requests is made by real user (not us testing the catalog) ? And if we go using partial entries, even if we have duplicated requests, we may not need to spend our time on caching ALL entries.
Results list should be handled differently from books. Results reonses should only have the list of book id and book details should be requested (and cached) separatly. That way we can cache book details independently from the API request/response (and save bandwidth).
That's the purpose of partial_entries (to get list of books only) and entry (to get information about a specific book)
As suggested somewhere (I cannot find where), I think we should introduce a "RemoteLibrary" in libkiwix.
This remote library should share the same (subset) api of Library
:
However, the RemoteLibrary would do request to the server for each methods (we almost already have a specific endpoint for each method)
Request and caching strategy:
We will use partial_entries
endpoint to get list of books (getBooksIds
and filter_sorted_paginated
).
Theses methods already return bookId only (as partial_entries
endpoint, so we are good).
When user call getBookById
with a bookId, we do a request to entry
endpoint and get the book information and create a Book
. The Book
itself is cached and so, no duplicate request to entry
is made.
About Illustration:
OPDS already give a url to the illustration instead of the data (already to reduce the size of the opds stream).
Illustration
class itself is downloading the data when we call Illustration::getData()
. This is somehow duplicated with ThumbnailDownloader
implemented in kiwix-desktop by @juuz0 (sorry, I missed that)
This lead to download management:
The problem here is that don't want the download to freeze the ui.
url
being the url to request (by threaded requests). callback
being a function to call with the response content and which actually return the "useful results".
A blocking application would use :
auto tuple = library.getBooksIds();
auto bookIds = tuple.1(download(tuple.0));
do_stuff_with_bookids(bookIds);
A event driven application would use:
auto tuple = library.getBooksIds();
auto request = Request(tuple.0);
request.on_completion([](response) {
auto bookIds = tuple.1(response.content);
do_stuff_with_bookids(bookIds);
});
request.do();
do_other_things_without_waiting_request_to_finish();
I would go with the easy solution and if not enough, move to the difficult solution. (The "Easy remote library" becoming a wrapper around the "difficult remote library" with the "blocking application" code)
I have played a bit with the api:
(All done without filter)
For the moment, a new request of type "https://library.kiwix.org:443/catalog/search?lang=amh&count=-1&tag=_category:gutenberg¬ag=_sw:yes" is done each time a filter is changed in the sidebar. Then everything (book details, no pagination) is downloaded from https://library.kiwix.org. The "search" (currently library top bar) is then applied a posteriori on the local result list to have a match.
I'm not satified because that generates a lot of data transfered (slow and expensive) between local computer and library.kiwix.org.
I woud like to propose: