trilbymedia / grav-plugin-tntsearch

Powerful index-based full-text search engine powered by the TNTSearch library
https://trilby.media
MIT License

Scheduler does not index any pages #95

Open torohill opened 4 years ago

torohill commented 4 years ago

When I run Grav\Plugin\TNTSearchPlugin::indexJob using the Grav scheduler, no pages are indexed. As a result, the TNTSearch database is emptied of all its contents and every search returns no results.

The output below shows the issue. Initially there are 137 rows in the wordlist table, but after running indexJob via the Grav scheduler there are 0 rows. If I then run bin/plugin tntsearch index, there are 137 rows again.

```
app@10cd0795a2b8:/app/public$ bin/gpm index -I | grep tntsearch
| 7     | TNT Search               | tntsearch                | v3.1.1  | installed |
app@10cd0795a2b8:/app/public$ sqlite3 user/data/tntsearch/grav.index "select count(*) from wordlist"
137
app@10cd0795a2b8:/app/public$ bin/grav scheduler -v

Running Scheduled Jobs
======================

 [2020-05-19T04:12:13+00:00] Success: Grav\Plugin\TNTSearchPlugin::indexJob
app@10cd0795a2b8:/app/public$ sqlite3 user/data/tntsearch/grav.index "select count(*) from wordlist"
0
app@10cd0795a2b8:/app/public$ bin/plugin tntsearch index

Re-indexing

Added   1 /stories/another-another
...snip...
Added   143 /stories/yet-another-story-12
Total rows 143

Indexed in 0.7s
app@10cd0795a2b8:/app/public$ sqlite3 user/data/tntsearch/grav.index "select count(*) from wordlist"
137
```

I took a look at the code, and it seems this happens because $grav['pages']->init() is never called when the scheduler job runs, so there is no data to index. When using bin/plugin tntsearch index, the pages are initialised by the $this->initializePages() call in IndexerCommand::serve().

I'm not sure whether Grav core should initialise pages when the scheduler runs, or whether the plugin should do it. I also don't know enough about how Grav works to know which call is needed to initialise the pages. It looks like events need to be fired, etc., so it's not just a matter of calling $grav['pages']->init().

My temporary workaround is to run bin/plugin tntsearch index from its own cron job and disable the indexJob scheduled job.
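A minimal sketch of that workaround as a crontab entry, assuming the Grav root is /app/public (as in the session above) and that hourly re-indexing is frequent enough. The schedule, the path and the log file location are all assumptions, not part of the original setup:

```crontab
# Assumed schedule: rebuild the TNTSearch index at the top of every hour.
# cd first, because bin/plugin is invoked relative to the Grav root (assumed /app/public).
# Output is appended to a log under Grav's logs/ folder (assumed location) so failures stay visible.
0 * * * * cd /app/public && bin/plugin tntsearch index >> logs/tntsearch-cron.log 2>&1
```

Running the full CLI command means IndexerCommand::serve() calls $this->initializePages() before indexing, so the index is built from the actual pages instead of an empty set.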

I originally added a comment about this to issue #81, but I think that is actually a separate issue, since here the scheduler job runs fine; it's just not indexing anything.

rhukster commented 4 years ago

What version of Grav is this?

torohill commented 4 years ago

Sorry, should have included that.

```
app@10cd0795a2b8:/app/public$ bin/grav list | head -n 1
Grav CLI Application 1.6.25
```

acondura commented 4 years ago

Same issue on Grav 1.7

rhukster commented 4 years ago

Ok, I dug into this a bit today. Over the past few months, we broke out much of the initialization code so that it can be controlled at a more granular level. That way we don't load things when they aren't needed, which improves overall performance. The CLI code has several built-in initialization methods that can be called independently. For example, initializePages can be called directly, and it in turn calls initializeThemes, which in turn calls initializePlugins, etc.

Because of this, the scheduler does not fully initialize pages, and the init methods that would do so live in ConsoleCommand.

Eventually I would like to move these somewhere more 'accessible' to code other than console commands, so we can more easily enablePages() from anywhere. For now, though, the simplest and most reliable solution is to change the scheduler job to use the CLI command (as you already worked out).

ViliusS commented 3 years ago

With Andy's fix and both fixes mentioned in https://github.com/trilbymedia/grav-plugin-tntsearch/issues/81, it should run fine now. Tested on Grav 1.6.27 and TNTSearch plugin version 3.2.1.