Closed rgaudin closed 3 years ago
I take it this explains my failure to compile python-libzim:
159 | class Search
| ^~
libzim/wrapper.cpp: At global scope:
libzim/wrapper.cpp:1008:8: error: ‘search_iterator’ in namespace ‘zim’ does not name a type; did you mean ‘SearchIterator’?
1008 | zim::search_iterator __pyx_v_it;
and others
Hi @tim-moody, that's right. I'll fix that sometime but in the mean time, you can use the nightly from the CI or just use the latest release (libzim6). If all compatible nightlies are gone (we rotate files), let me know and I'll upload one somewhere
@rgaudin are you saying there is a nightly of python-libzim based on libzim7? I don't know where that is.
Also, is it true that kiwix-build does not build python-libzim?
http://download.openzim.org/nightly is based on libzim master which is libzim7 and yes, build happens here: https://github.com/openzim/python-libzim/tree/master/.github/workflows
If you want to use pylibzim, the best option is just to pip install
it. It would be libzim6 which is our latest release.
If you want libzim 7 version, follow the instructions from the workflow but as it downloads libzim binary from that nighty folder ; and because we only keep 30 files max, you have to use another version.
Now this ticket exists because libzim7's API has changes this month and it's possible even the oldest available nightly won't work with pylibzim. We are fixing pylibzim's usage of the API now.
@mgautierfr @maneeshpm I'm stuck trying to update the wrapper for the new search API.
here's what I have on Cython side:
wrapper.pyx
def get_estimated_search_results_count(self, query: str) -> int:
cdef wrapper.ZimSearcher searcher = wrapper.ZimSearcher(dereference(self.c_archive))
cdef wrapper.ZimQuery zim_query = wrapper.ZimQuery();
zim_query.setQuery(query.encode('UTF-8'), False);
cdef wrapper.ZimSearch search0 = searcher.search(zim_query);
return search0.getEstimatedMatches()
As you now, every type must be defined for Cython so we have wrapper types for zim ones.
lib.h
:
class ZimSearch: public zim::Search
{
public:
ZimSearch(zim::Search&& s) : zim::Search(s) {}
int getEstimatedMatches() const
{ return zim::Search::getEstimatedMatches(); }
};
wrapper.pxd
cdef extern from "lib.h":
cdef cppclass ZimSearch:
ZimSearch()
int getEstimatedMatches() except +
Now this doesn't compile with
error: call to implicitly-deleted copy constructor of 'zim::Search'
ZimSearch(zim::Search&& s) : zim::Search(s) {}
^ ~
/Users/reg/src/pylibzim/include/zim/search.h:162:9: note: copy constructor is implicitly deleted because 'Search' has a user-declared move constructor
Search(Search&& s);
^
libzim's search.h
has
class Search
{
public:
Search(Search&& s);
Search& operator=(Search&& s);
~Search();
/** Get a set of results for this search.
*
* @param start The begining of the range to get
* (offset of the first result).
* @param end The end of the range to get
* (offset of the result past the end of the range).
*/
const SearchResultSet getResults(int start, int end) const;
/** Get the number of estimated results for this search.
*
* As the name suggest, it is a estimation of the number of results.
*/
int getEstimatedMatches() const;
private: // methods
Search(std::shared_ptr<InternalDataBase> p_internalDb, const Query& query);
Xapian::Enquire& getEnquire() const;
private: // data
std::shared_ptr<InternalDataBase> mp_internalDb;
mutable std::unique_ptr<Xapian::Enquire> mp_enquire;
Query m_query;
friend class Searcher;
};
I am not familiar with this move-constructor thing and don't know how to move on from here.
@rgaudin The explanation of this behavior is more related to the lvalue
and rvalue
and I believe @mgautierfr is more qualified to comment on that.
In order to actually move the parameter we have to call std::move
on it. So please try to change your constructor as :
class ZimSearch: public zim::Search
{
public:
ZimSearch(zim::Search&& s) : zim::Search(std::move(s)) {}
int getEstimatedMatches() const
{ return zim::Search::getEstimatedMatches(); }
};
I hope it will compile fine then.
Thanks, unfortunately, Cython's still not happy
C++ class must have a nullary constructor to be stack allocated
The problem is that we cannot create a "default" zim::Search
.
By definition a zim::Search
is associated to a "database" (zim::Searcher
) and a query (zim::Query
).
The solution is to store a pointer to a zim::Search
instead of a zim::Search
itself.
This is what is made for Entry
as zim::Entry
has not default constructor neither.
Btw, why are you creating a intermediate ZimSearch
? Why not directly wrap zim::Search
?
Thanks, I got it working with
def get_estimated_search_results_count(self, query: str) -> int:
cdef wrapper.ZimSearcher searcher = wrapper.ZimSearcher(dereference(self.c_archive))
cdef wrapper.ZimQuery zim_query = wrapper.ZimQuery();
zim_query.setQuery(query.encode('UTF-8'), False);
return searcher.search(zim_query).getEstimatedMatches();
cdef extern from "zim/search.h" namespace "zim":
cdef cppclass ZimSearch "zim::Search":
int getEstimatedMatches()
Gonna check the tests and continue with the search results.
Be careful that create a new zim::Search
maybe time consuming.
If you will reuse the Search it is better to store it somewhere.
zim::Search* m_search = nullptr;
[...]
m_search = new zim::Search(ZimSearcher.search(query));
OK, in that previous case it was not to be reused but I do need a dedicated instance of the SearchResultSet
to iterate over the results
def suggest(self, query: str, start: int = 0, end: int = 10) -> Generator[str, None, None]:
cdef wrapper.ZimSearcher searcher = wrapper.ZimSearcher(dereference(self.c_archive))
cdef wrapper.ZimQuery zim_query = wrapper.ZimQuery();
zim_query.setQuery(query.encode('UTF-8'), True);
results = searcher.search(zim_query).getResults(start, end);
cdef wrapper.SearchIterator it = results.begin();
while it != results.end():
yield it.getPath().decode('UTF-8')
preincrement(it)
cdef extern from "zim/search_iterator.h" namespace "zim":
cdef cppclass SearchIterator:
SearchIterator()
SearchIterator operator++()
bint operator==(SearchIterator)
bint operator!=(SearchIterator)
string getPath()
string getTitle()
cdef extern from "zim/search.h" namespace "zim":
cdef cppclass SearchResultSet:
ctypedef SearchIterator iterator;
iterator begin()
iterator end()
int size()
cdef extern from "lib.h":
cdef cppclass ZimSearcher:
ZimSearcher()
ZimSearcher(const ZimArchive zimfile)
ZimSearcher(vector[const ZimArchive] zimfiles)
Search search(ZimQuery query)
cdef extern from "zim/search.h" namespace "zim":
cdef cppclass Search "zim::Search":
SearchResultSet getResults(int start, int end)
int getEstimatedMatches()
and I'm back to the same issue.
libzim/wrapper.cpp:13954:37: error: no matching constructor for initialization of 'zim::SearchResultSet'
new((void*)&(p->__pyx_v_results)) zim::SearchResultSet();
^
/Users/reg/src/pylibzim/include/zim/search.h:214:5: note: candidate constructor not viable: requires single argument 'p_internalDb', but no arguments were provided
SearchResultSet(std::shared_ptr<InternalDataBase> p_internalDb);
^
/Users/reg/src/pylibzim/include/zim/search.h:198:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
When I try what you proposed above, it says the object can't be converted to a Python object…
You need to use a child class of zim::Search
to return a pointer to a zim::SearchResultSet
(as for ZimEntry
which returns a pointer to a ZimItem
)
// In lib.h
class ZimSearch : public zim::Search
{
SearchResultSet* getResults(int start, int end)
{ return to_ptr<SearchResultSet>(zim::Search::getResults(start, end)); }
}
(Wrap ZimSearch instead of zim::Search in wrapper.pxd)
def suggest(self, query: str, start: int = 0, end: int = 10) -> Generator[str, None, None]:
cdef wrapper.ZimSearcher searcher = wrapper.ZimSearcher(dereference(self.c_archive))
cdef wrapper.ZimQuery zim_query = wrapper.ZimQuery();
zim_query.setQuery(query.encode('UTF-8'), True);
cdef wrapper.SearchResultSet* results = searcher.search(zim_query).getResults(start, end);
cdef wrapper.SearchIterator it = results.begin();
while it != results.end():
yield it.getPath().decode('UTF-8')
preincrement(it)
delete results # See comment below.
But you have to be careful about memory allocation, deleting the results at the end of the function is not enough as the user can simply stop interate on the generator and so never reach the end of the function.
The best current way to ensure the SearchResultSet is correctly deleted is to wrap it in a python class and delete the pointer in the destructor of the class.
This is what is done for Entry
: the from_entry(wrapper.ZimEntry* entry)
construct a instance on top of the cpp entry pointer and the __dealoc__
delete the given pointer.
Then, all methods returning a Entry
create the pointer and wrap it in the Entry
(as a lot of method of PyArchive
do.)
Here, as you never return the SearchResultSet
, we could use a std::unique_ptr instead.
The global picture is :
Entry
wrap a pointer to wrapper.ZimEntry
[which is a child of zim::Entry
[1] ])ZimArchive
is a child of zim::Archive
)PyArchive
uses ZimArchive
to get a pointer to a wrapper.ZimEntry
, wraps it in a Entry
and return the Entry
)[1] Entry is a both ends :
zim::Entry
has methods returning Entry (and Item with the same issue).So zim::Entry
is wrap twice : by ZimEntry
as a producer of wrapped object, and by Entry
as a produced object.
It's been almost a month since the last sync between libzim and pylibzim.
A number of changes have occurred in the mean time on libzim: clusterSize, search, suggestion, snippets, indexes, etc)
I don't know what's relevant but the current code base was updated last for https://github.com/openzim/libzim/commit/02d451f5ea04c1ae87180e98e67efba324d0401e on April 26th