Closed stijn-uva closed 2 years ago
What's strange imho is that it goes away after a restart. It could make sense that some routes, and some specific calls to get_webentity_pagelinks_network or get_webentity_pages, would take some time in the traph depending on the corpus and its content, but then it should be consistent. And I see you changed the issue to point to get_webentities instead, in which case I can think of another potential source of the problem: collecting DISCOVERED webentities and trying to automatically set a homepage for them (in which case, depending on the fields you need, you can try to bypass this slow operation by using the light or semilight arguments in most get_webentities routes). Could you be more precise about the corresponding calls?
Could you enable DEBUG in your config, set it to 2, and paste the full logs from the query to the answer when you encounter it again?
Thanks, will log for a while and keep an eye on them when this occurs, more news later! :-)
Oh, and the specific call parameters here are:
call = ["store.get_webentities", [], 0, 100, "::page::", False, False, False, self.corpus_id]
Where ::page:: is replaced by an incrementing number until all web entities are collected. So we're calling for the full web entity details indeed, that's something I can also experiment with.
But do you actually need the homepage field? Otherwise switching to False, True, False might help.
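For illustration, here is a minimal sketch of what that change could look like on the caller's side. It simply rebuilds the call list quoted above with the three boolean flags exposed by name. The helper name build_get_webentities_call is hypothetical, and reading the first two booleans as light and semilight (in that order) is an assumption based on this thread, not something checked against the API reference.

```python
from typing import Any, List


def build_get_webentities_call(
    page: int,
    corpus_id: str,
    count: int = 100,
    light: bool = False,
    semilight: bool = True,
) -> List[Any]:
    """Build the call list from this thread with the flags made explicit.

    Hypothetical helper. The positional layout mirrors the call quoted above;
    the assumption is that the first two booleans are `light` and `semilight`.
    The third boolean is left at False, as in the original call.
    """
    return [
        "store.get_webentities",
        [],          # no explicit list of web entity ids
        0,           # kept as in the original call
        count,       # results per page
        page,        # page number
        light,
        semilight,
        False,       # third flag, unchanged from the original call
        corpus_id,
    ]


if __name__ == "__main__":
    # Same call as before, but with the flags switched to False, True, False.
    print(build_get_webentities_call(0, "my-corpus"))
```

If the homepage field turns out to be needed after all, passing semilight=False gives back the original call.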
Also note that your use of pagination is not the intended way to use this API: you should collect a token from the first request, then switch to calling get_webentities_page(token, page_number, False, corpus) instead, as explained in the documentation.
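A rough sketch of that token-based flow, under several assumptions: the rpc argument is a hypothetical transport function that sends a method name with positional parameters and returns the decoded result; the "store." prefix on get_webentities_page is assumed by analogy with store.get_webentities; and the response field names token, next_page and webentities are assumptions to check against the documentation.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical transport: (method name, positional params) -> decoded result.
RpcCall = Callable[[str, List[Any]], Dict[str, Any]]


def iter_webentities(rpc: RpcCall, corpus_id: str, count: int = 100) -> Iterator[Any]:
    """Yield all web entities using the token-based pagination described above."""
    # First request: the regular route, which is assumed to return a token
    # alongside the first page of results.
    first = rpc(
        "store.get_webentities",
        [[], 0, count, 0, False, True, False, corpus_id],
    )
    yield from first.get("webentities", [])

    token = first.get("token")          # assumed field name
    next_page = first.get("next_page")  # assumed field name

    # Following requests: reuse the token instead of re-running the query.
    while token and next_page is not None:
        page = rpc(
            "store.get_webentities_page",
            [token, next_page, False, corpus_id],
        )
        yield from page.get("webentities", [])
        next_page = page.get("next_page")
```

The point of the design is that only the first request runs the full query; later pages are fetched by token, which is presumably why the slow calls disappear with this approach.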
Hey @stijn-uva, I'm closing this one for now, but please reopen it if you still encounter similar problems.
Hey @boogheta, sorry for never following up on this. We're not having this problem anymore since we started using the pagination feature added to the API a while ago, so it seems safe to close it indeed!
Apologies for the vague issue, but we have a problem where sometimes the API takes over a second per call, and sometimes (for the same corpus) we get a response in a fraction of a second. This concerns the store.get_webentities endpoint in particular, but also others. It does not seem to depend on whether Hyphe is crawling anything or not; in fact, as far as I can see, nothing particularly demanding is running in Hyphe when this occurs. It usually goes away after a restart, but that's not ideal... Do you have any ideas about what could be causing this and where we might look to address it?