openzim / python-libzim

Libzim binding for Python: read/write ZIM files in Python
https://pypi.org/project/libzim/
GNU General Public License v3.0

Get the number of results (from `get_matches_estimated()`) #5

Closed (jdcaballerov closed 4 years ago)

jdcaballerov commented 4 years ago
  • Shouldn't we get a generator from reader.search()? Does libzim return a list as-is? Large ZIM files frequently have thousands of results.

I did the same as node-libzim: get 10 results and pass back a string vector. File.h returns unique pointers to the Search class:

 std::unique_ptr<Search> search(const std::string& query, int start, int end) const;
 std::unique_ptr<Search> suggestions(const std::string& query, int start, int end) const;

Ah OK, I see. That's unfortunate. I didn't realize you had hardcoded that. We should definitely be able to set those. But we'd need the number of results (from get_matches_estimated()) as well. How do you plan to address that?
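To illustrate why configurable start/end offsets also call for a total count, here is a minimal sketch in plain Python. `page_ranges` and `fake_search` are hypothetical helpers written for this example and are not part of python-libzim; the list slice merely stands in for a search call that accepts start and end offsets.

```python
def page_ranges(total, page_size=10):
    """Split `total` results into (start, end) offset pairs of at most `page_size`."""
    return [(start, min(start + page_size, total))
            for start in range(0, total, page_size)]


def fake_search(results, start, end):
    """Hypothetical stand-in for a search call taking start/end offsets."""
    return results[start:end]


# Pretend the match count reported for some query is 25.
results = [f"A/Article_{i}" for i in range(25)]
ranges = page_ranges(len(results))
pages = [fake_search(results, start, end) for start, end in ranges]
```

Without the total, a caller cannot tell how many pages to request. Since the count in question is described as an estimate, a real caller should probably also tolerate a final page that comes back shorter than expected.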

jdcaballerov commented 4 years ago

@rgaudin Can you clarify how the API should expose get_matches_estimated() so we can close this feature?

I can already use the iterator from Cython for both search and suggestions (File.h)

The first number is the result of get_matches_estimated()

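    # Assumes: from cython.operator cimport dereference, preincrement
    #          from libcpp.memory cimport unique_ptr
    def file_search(self, query, start=0, end=10):
        # Run the search through libzim; File.h returns a unique_ptr<Search>.
        cdef unique_ptr[zim.Search] search = self.c_file.search(query.encode('UTF-8'), start, end)
        cdef zim.search_iterator it = dereference(search).begin()

        # Debug output: the estimated number of matches for this query.
        print(dereference(search).get_matches_estimated())

        # Yield each result URL lazily instead of building a full list.
        while it != dereference(search).end():
            yield it.get_url().decode('UTF-8')
            preincrement(it)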

UTF-8 not decoded here but already fixed.

[Screenshot: Screen Shot 2020-04-10 at 16 34 28]

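As a rough pure-Python analogue of the Cython loop above, the sketch below shows what the generator approach buys on large result sets: only the results actually requested are decoded. `iter_urls` and the byte strings are made up for illustration; they stand in for the libzim search iterator and its `get_url()` values.

```python
from itertools import islice


def iter_urls(raw_urls):
    """Yield decoded URLs one at a time, mirroring the yield-per-result loop."""
    for raw in raw_urls:
        yield raw.decode("UTF-8")


# A lazy source of 100,000 fake result URLs (nothing is materialized up front).
raw = (f"A/Article_{i}".encode("UTF-8") for i in range(100_000))

# Take only the first three results; the rest are never produced or decoded.
first_three = list(islice(iter_urls(raw), 3))
```

Because both layers are lazy, the underlying source has advanced by exactly three items after this call, which is the behaviour one would want when a query matches thousands of entries.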
rgaudin commented 4 years ago

Thanks for this.

Otherwise, we'd need better names like get_search_results_count() and get_suggestions_results_count()
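For discussion, here is one way the proposed names could sit next to generator-based search methods. This is only a sketch: `FakeReader`, its matching rules, and the `suggest` method name are assumptions made for the example, not the python-libzim API. Only the two `get_*_results_count` names come from the suggestion above.

```python
class FakeReader:
    """Hypothetical in-memory stand-in for a ZIM reader (not python-libzim)."""

    def __init__(self, titles):
        self._titles = list(titles)

    def _search_matches(self, query):
        # Assumed rule for the sketch: case-insensitive substring match.
        return [t for t in self._titles if query.lower() in t.lower()]

    def _suggestion_matches(self, query):
        # Assumed rule for the sketch: case-insensitive prefix match.
        return [t for t in self._titles if t.lower().startswith(query.lower())]

    def search(self, query, start=0, end=10):
        """Yield search results in the [start, end) window."""
        yield from self._search_matches(query)[start:end]

    def suggest(self, query, start=0, end=10):
        """Yield suggestions in the [start, end) window."""
        yield from self._suggestion_matches(query)[start:end]

    def get_search_results_count(self, query):
        return len(self._search_matches(query))

    def get_suggestions_results_count(self, query):
        return len(self._suggestion_matches(query))


reader = FakeReader(["Python", "Monty Python", "Pythagoras", "Java"])
search_page = list(reader.search("pyth", 0, 2))
search_total = reader.get_search_results_count("pyth")
suggestion_total = reader.get_suggestions_results_count("pyth")
```

Keeping the counts as separate methods leaves the search calls as plain generators, at the cost of a second call per query when the caller wants both.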

jdcaballerov commented 4 years ago

I still have file_search because I haven't deleted the old search version yet; I'm waiting for your approval. The old one works from C++, sending back string vectors of 10 results; this one runs in Cython and uses the libzim search object directly.

If you agree, I'll keep only this version, delete the C++ search code from the wrapper, and add the counts separately.

jdcaballerov commented 4 years ago

Done https://github.com/openzim/python-libzim/commit/054e7638fb398a0942a988e3122172acd1645c61