kiwix / libkiwix

Common code base for all Kiwix ports
https://download.kiwix.org/release/libkiwix/
GNU General Public License v3.0

Develop snippet API and use AJAX in fulltext search result page #395

Open kelson42 opened 4 years ago

kelson42 commented 4 years ago

With kiwix/kiwix-tools#345, we have seen that a fulltext search can be really slow on a Raspberry Pi. In an attempt to split such a big request, which takes 40s, into smaller ones, we would like the snippet retrieval in the fulltext result page to be done asynchronously via AJAX.

If we implement kiwix/kiwix-tools#97, this should be easier.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

automactic commented 3 years ago

It is not really about AJAX, which I believe requires loading the article into a webview first. It is about being able to have a C++ function to retrieve the Xapian snippet, so that it can be done concurrently on multiple cores.

It's not super important, feel free to close it as won't do.

kelson42 commented 3 years ago

@automactic Agree. There is:

@maneeshpm Does that make sense to you?

maneeshpm commented 3 years ago

@kelson42 Yes, this sounds good, and I agree that retrieving the snippets concurrently will definitely speed things up. But we will still be limited by the actual bottleneck, which is snippet generation by Xapian::MSet::snippet(). For this, we currently have to read the data of the entry from the archive, parse it, and then pass it to the above method, which finally generates a snippet. All of this happens in our SearchIterator::getSnippet().

To address this, I remember @mgautierfr mentioning in some comment that we could build the snippets (so that a snippet is just the first few lines of the article) at index time and store them as a blob in our compressed ZIM format. This way we can read them directly without incurring any additional I/O or parsing cost. I was more inclined towards this. What do you think?
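
To make the chain described above concrete, here is a minimal, self-contained sketch of the three steps (entry data already read, parse, generate snippet). It is not the libkiwix code: `stripTags`, `snippetAround` and `getSnippetSketch` are hypothetical names, the tag stripping is a crude stand-in for real HTML parsing, and `snippetAround` only imitates what Xapian::MSet::snippet() would do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Step 2 stand-in (hypothetical): crude tag stripping instead of a real parser.
std::string stripTags(const std::string& html) {
    std::string text;
    bool inTag = false;
    for (char c : html) {
        if (c == '<') inTag = true;
        else if (c == '>') inTag = false;
        else if (!inTag) text += c;
    }
    return text;
}

// Step 3 stand-in (hypothetical) for Xapian::MSet::snippet(): a window of
// `width` characters placed around the first occurrence of `term`, or taken
// from the start of the text if the term does not occur.
std::string snippetAround(const std::string& text, const std::string& term,
                          std::size_t width) {
    assert(width > 0);
    const std::size_t hit = text.find(term);
    const std::size_t center = (hit == std::string::npos) ? 0 : hit;
    const std::size_t begin = center > width / 2 ? center - width / 2 : 0;
    return text.substr(begin, width);
}

// Whole chain for one result: the caller has already read the entry's HTML
// (step 1); we parse it (step 2) and generate the snippet (step 3).
std::string getSnippetSketch(const std::string& entryHtml,
                             const std::string& term, std::size_t width) {
    return snippetAround(stripTags(entryHtml), term, width);
}
```

Each search result pays for all three steps, which is why the cost adds up on a page of results.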

kelson42 commented 3 years ago

@maneeshpm A decision was made years ago not to store snippets in the Xapian index. There is no plan, on my side at least, to revisit this. Given this information, does this ticket still depend on another one? If "yes", which one?

maneeshpm commented 3 years ago

@kelson42 Aah, my bad. I was talking about #148, but we have changed its objective now, so it doesn't concern this ticket. I don't think there are any other dependencies.

mgautierfr commented 3 years ago

> It is not really about AJAX, which I believe requires loading the article into a webview first. It is about being able to have a C++ function to retrieve the Xapian snippet, so that it can be done concurrently on multiple cores.

This is somewhat already the case on the C++ side now. If we had an endpoint to ask for the snippet of a particular search result (and we would need a full REST API to do a search, list results, get individual information about results, keep the search "alive" during requests, ...), then the requests would naturally be handled on different threads by the server. We would have to protect the snippet generation a bit against multithreading race conditions, but we could do it almost fully multithreaded. Getting the article content can be done multithreaded. Generating the snippet is done by Xapian, and the rule there is that we cannot do it multithreaded. But as it is a more or less read-only operation, maybe it is OK; this needs to be checked with the Xapian team.
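
As a rough illustration of "almost multithreaded", the sketch below lets each worker thread fetch its content in parallel and only serializes the snippet step behind a mutex. Everything here is an assumption made for the example: `fetchContent` and `unsafeSnippet` are hypothetical stand-ins, not libkiwix or Xapian functions, and a real server would use its own request threads rather than spawning one thread per result.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for reading and parsing an entry; assumed thread-safe.
std::string fetchContent(std::size_t resultIndex) {
    return "content of result " + std::to_string(resultIndex);
}

// Hypothetical stand-in for the Xapian snippet call; assumed NOT thread-safe.
std::string unsafeSnippet(const std::string& content) {
    assert(!content.empty());
    return content.substr(0, 20);
}

// One worker per result: content is fetched in parallel, while the snippet
// step is serialized by a mutex shared between all workers.
std::vector<std::string> snippetsForResults(std::size_t count) {
    std::vector<std::string> snippets(count);
    std::mutex snippetLock;
    std::vector<std::thread> workers;
    workers.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        workers.emplace_back([i, &snippets, &snippetLock] {
            const std::string content = fetchContent(i);     // runs in parallel
            std::lock_guard<std::mutex> guard(snippetLock);  // one at a time
            snippets[i] = unsafeSnippet(content);
        });
    }
    for (auto& worker : workers) worker.join();
    return snippets;
}
```

If the Xapian team confirms that snippet generation is safe as a read-only operation, the lock could be dropped and the whole worker body would run in parallel.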

maneeshpm commented 3 years ago

> (and we would need a full REST API to do a search, list results, get individual information about results, keep the search "alive" during requests, ...)

@kelson42 IMO a dedicated API endpoint for snippets should be implemented after the caching in #509. Since snippet generation is done by the MSet, which we get from Enquire, caching the search instance is necessary to avoid doing the work twice.
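
To illustrate the "double work" concern, here is a minimal sketch of a cache that keeps a search instance alive between the results request and later snippet requests. It is only an assumption of how such a cache could look, not the design of #509: `Search`, `SearchCache` and `runSearch` are hypothetical names, and the `results` vector merely stands in for the MSet.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a search whose result set is expensive to build.
struct Search {
    std::vector<std::string> results;
};

class SearchCache {
public:
    // Returns the cached search for `query`, running it only on a cache miss.
    std::shared_ptr<Search> get(const std::string& query) {
        assert(!query.empty());
        std::lock_guard<std::mutex> guard(lock_);
        auto it = cache_.find(query);
        if (it != cache_.end()) return it->second;
        auto search = runSearch(query);
        cache_.emplace(query, search);
        return search;
    }

    // Number of searches actually executed (cache misses).
    std::size_t searchesRun() const { return searchesRun_; }

private:
    // Stand-in for the expensive part (Enquire + MSet in Xapian terms).
    std::shared_ptr<Search> runSearch(const std::string& query) {
        ++searchesRun_;
        auto search = std::make_shared<Search>();
        search->results = {query + " #1", query + " #2"};
        return search;
    }

    std::mutex lock_;
    std::map<std::string, std::shared_ptr<Search>> cache_;
    std::size_t searchesRun_ = 0;
};
```

With something like this in place, a snippet request for result N of a query would look up the existing search instead of running the query again. A real cache would also need an eviction policy, which this sketch leaves out.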

kelson42 commented 3 years ago

@maneeshpm Agree and sorry for not having seen this obvious thing earlier.

kelson42 commented 2 years ago

#509 has been implemented now, so we are ready to work on this ticket.

kelson42 commented 2 years ago

After https://github.com/openzim/libzim/pull/697 has been made, this ticket is less urgent. But I would still do it.