Closed MikeNGarrett closed 4 years ago
That's a good point. I recently moved another very slow task to an AJAX action – whenever it was needed to be done, an AJAX action is triggered to update it and store the value in an option. A similar approach would probably work here.
You're right, an exact value is not necessary, an approximation will do fine. I'll see what is the best approach to optimize this, there's definitely some optimization possible here. I'll update this issue when I come up with something.
Sounds good. I'd be happy to contribute an idea, if you'd accept a PR.
Yes, PRs are welcome.
Thanks. Couple of points:
1) If the database is so large or the site performs so poorly that doc_count
cannot be counted, Relevanssi doesn't really need to support that. I don't mind if doc_count
is slightly imprecise, but 0? That just makes the search weights meaningless, and I don't like that. In those cases using Relevanssi Light is always a better idea, and I'd rather point people to that, because that's an actual, functional search that performs much, much better and produces accurate results fast even with very large databases.
2) I generally don't like scheduled tasks and most of the time there's no reason to update the doc count every hour; there are many sites where the doc count needs an update once a week, or once a day, thus running the update action hourly just wastes resources. Also, adding the scheduled action makes it necessary to clear the hook on deactivation, which adds unnecessary code overhead (also it's confusing to have two sets of functions called relevanssi_uninstall()
and _relevanssi_uninstall()
that run on different occasions and do different things). So I'll go with my original idea of making the update an async action.
I agree it's stupid to count all the counts every time the indexing tab is loaded, so I'll add those other counts as options and have the async action update those as well.
Understood.
Another issue I was addressing in this PR was the scope of insert_edit
which fires a new document count for anything that triggers an insert or update, even if the index isn't updated. For example, editing a menu calls this for each menu item in the menu.
Thank you very much for reviewing this so quickly. I really appreciate the time you've taken.
I was thinking about the relevanssi_disable_count
filter hook – in a situation where that's necessary, using Relevanssi in the first place doesn't make sense, and Relevanssi Light is a much better tool.
Good point about relevanssi_insert_edit()
– I modified it now so that relevanssi_async_update_doc_count()
only fires if the post is published. (And not when the post is removed, because relevanssi_remove_doc()
already runs it, so no need to run it twice.)
Hello!
Before I say anything else, I want to thank you for publishing such an useful and well-documented plugin. I use Relevanssi free and premium for a number of clients. Thank you.
I've read your comments in a number of places stating that at a certain point Relevanssi should not handle very large data sets. I absolutely agree, but I am always interested in finding ways to increase that limit, if possible.
One of the biggest bottlenecks I've found is the database query:
SELECT COUNT(DISTINCT(doc))
. (code reference)I understand this is important to calculate weights and relevance, but it seems like this can be moderately accurate and still achieve similar results. Is that true?
If so, could this count be deferred to a scheduled task that updates the option? This would increase performance pretty significantly for all uses, but especially for sites with large indexes.