loupe-php / loupe

A full text search engine with tokenization, stemming, typo tolerance, filters and geo support based on only PHP and SQLite.
MIT License
271 stars 8 forks source link

Include state cache always #86

Closed discordier closed 1 week ago

discordier commented 1 week ago

Method initialize was using require_once to read a state file. This caused the cache to be read correctly on the first initialize call but returned a simple true on subsequent usages (See documentation of require_once). This has been changed to use require instead and now always reads the file.

Toflar commented 1 week ago

Hey Chris, nice seeing you here!! I see, we initialize it only once so it's not a problem as we keep the $this->initialized class property. But I guess you are instantiating the StateSet multiple times with the same cache file, is that the problem?

discordier commented 1 week ago

Exactly that's the problem. I do not use them as "Singleton" in my application workflow. I have message handlers that instantiate them "on demand" and then ran into the problem of getting true here.

IMO it does not hurt to read the file there over and over again if we instantiate multiple times.

Toflar commented 1 week ago

Makes sense but with your change, we would use double (or more) memory, wouldn't we? Maybe this should be statically loaded?

discordier commented 1 week ago

I fail to see why this would use double or more memory when the first instance (in my case) is gone - so no problem there (gc run implied).

In your case, the cache would never be re-read, which will open another bug then (Pseudo abbreviated code):

$first = new StateSet($engine);
$first->add(...); // initializes from cache file and adds the state.
$first->persist(); // Writes cache file

// Second instance
$second = new StateSet($engine);
$second->add(...); // BUG!!!!!
$second->add(...);
$second->add(...);
$second->persist();

In the line marked as BUG, we now initializes with:

Maybe you can elaborate where you see the problem of using more memory - I mean, they are using separate InMemoryStateSet instances that are not tied to the storage at all, why should it load data statically for them?

In the end, the base problem is, that it is not possible to instantiate the same index again if needed, which requires to keep an instance around for "just in case I need it again" which feels way more of a memory hog/leak.

Toflar commented 1 week ago

I see 👍 Merging and releasing this 😊 Thanks for contributing!