loupe-php / loupe

A full text search engine with tokenization, stemming, typo tolerance, filters and geo support based on only PHP and SQLite.
MIT License

Add a LoupeManager to manage multiple indexes and save configurations #15

Open alexander-schranz opened 1 year ago

alexander-schranz commented 1 year ago

What is a little different from other search engines is that creating the Loupe instance (which in other engines corresponds to the client) requires the whole configuration: searchable fields, filterable fields, and so on. In other engines I only need to know about this configuration in the SchemaManager, where I call the createIndex method, not when searching or indexing documents.

For this I created this Helper class, which could also be called LoupeManager. It even serializes the configuration into a file. This makes things stricter: as in other engines, the configuration can only be changed via the SchemaManager::createIndex method, which keeps behavior consistent across the whole library, and configuration changes have no effect until that method is called.

I may change it for my use cases when I have more time.

But it could also be part of Loupe. What do you think about it?

A LoupeManager that is not tied to SEAL could look like this:

class LoupeManager
{
    /**
     * @var Loupe[]
     */
    private array $loupes = [];

    public function __construct(
        private readonly LoupeFactory $loupeFactory,
        private string $directory, // expected to end with a directory separator
    ) {
    }

    public function getLoupe(string $index): Loupe
    {
        if (!isset($this->loupes[$index])) {
            $configurationFile = $this->directory . $index . '.loupe';

            if (!\file_exists($configurationFile)) {
                throw new \LogicException('The index needs to be created before accessing its Loupe instance.');
            }

            /** @var string $configurationContent */
            $configurationContent = \file_get_contents($configurationFile);

            /** @var Configuration $configuration */
            $configuration = \unserialize($configurationContent);

            $this->loupes[$index] = $this->loupeFactory->create($this->directory . $index . '.db', $configuration);
        }

        return $this->loupes[$index];
    }

    public function reset(): void
    {
        $this->loupes = [];
    }

    public function existIndex(string $index): bool
    {
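        // An index exists once both its database and its saved configuration are present.
        return \file_exists($this->directory . $index . '.db')
            && \file_exists($this->directory . $index . '.loupe');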
    }

    public function dropIndex(string $index): void
    {
        unset($this->loupes[$index]); // forget a cached instance of the dropped index
        \unlink($this->directory . $index . '.db');
        \unlink($this->directory . $index . '.loupe');
    }

    public function createIndex(string $index, Configuration $configuration): void
    {
        $configurationContent = \serialize($configuration);

        file_put_contents($this->directory . $index . '.loupe', $configurationContent);
        touch($this->directory . $index . '.db');
    }

    public function alias(string $index, string $alias): void
    {
        // Aliases via symlinks for zero-downtime reindexing. symlink() fails if the alias already exists.
        \symlink($this->directory . $index . '.db', $this->directory . $alias . '.db');
        \symlink($this->directory . $index . '.loupe', $this->directory . $alias . '.loupe');
    }

    public function removeAlias(string $alias): void
    {
        unset($this->loupes[$alias]); // a cached instance for this alias would point at the old target
        \unlink($this->directory . $alias . '.db');
        \unlink($this->directory . $alias . '.loupe');
    }

    public function readAlias(string $alias): string
    {
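        $target = \readlink($this->directory . $alias . '.db');

        if (false === $target) {
            throw new \LogicException(\sprintf('"%s" is not an alias.', $alias));
        }

        // Strip the directory prefix and the ".db" suffix to get the index name.
        return \substr($target, \strlen($this->directory), -3);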
    }
}

Saving the configuration lets the search engine keep running with the configuration its data was indexed with. Making configuration changes an explicit method call even allows two instances of an index at the same time: a read instance with the old configuration and a write instance with the new one.

$readLoupe = $loupeManager->getLoupe('blog');

$loupeManager->createIndex('blog_202208050332', $newConfiguration);
$writeLoupe = $loupeManager->getLoupe('blog_202208050332');

$writeLoupe->addDocuments([/* ... */]); // reindex all documents

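$oldIndex = $loupeManager->readAlias('blog');
$loupeManager->removeAlias('blog');                // symlink() cannot overwrite an existing alias
$loupeManager->alias('blog_202208050332', 'blog'); // alias(index, alias)
$loupeManager->reset();                            // drop cached instances that still point at the old index
$loupeManager->dropIndex($oldIndex);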
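Because `symlink()` refuses to overwrite an existing link, switching `blog` to a new index with the class above means removing the old alias first, which leaves a short window where the alias does not exist. A minimal sketch of a tighter swap, assuming a POSIX filesystem and the file layout proposed here (`<index>.db` and `<index>.loupe` in one directory): create the new symlink under a temporary name and `rename()` it over the alias, which replaces it atomically when both paths are on the same filesystem. The function name `swapAlias` is hypothetical, not part of Loupe or SEAL.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: point $alias at $index without ever removing the alias.
 * The symlink is created under a temporary name and then renamed over the
 * existing link; on POSIX, rename() replaces the destination atomically.
 */
function swapAlias(string $directory, string $index, string $alias): void
{
    foreach (['.db', '.loupe'] as $extension) {
        $target = $directory . $index . $extension;
        $link = $directory . $alias . $extension;
        $temporaryLink = $link . '.' . bin2hex(random_bytes(4)) . '.tmp';

        if (!symlink($target, $temporaryLink)) {
            throw new RuntimeException(sprintf('Could not create symlink "%s".', $temporaryLink));
        }

        // rename() moves the link itself (not its target) over the old alias.
        if (!rename($temporaryLink, $link)) {
            unlink($temporaryLink);
            throw new RuntimeException(sprintf('Could not move symlink to "%s".', $link));
        }
    }
}
```

The `.db` and `.loupe` links are still switched one after the other, so a reader could briefly pair the new database with the old configuration; keeping the configuration inside the database would remove that gap. Symlink behavior on Windows differs and is not covered by this sketch.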

Maybe the configuration could also be saved in the database, so that only one file exists per index; that would also make it easier to support the in-memory case the same way.
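A minimal sketch of that idea, assuming the configuration is stored as JSON in a single-row table inside the index's own SQLite file. The table name `loupe_configuration` and the function names are hypothetical and not part of Loupe's schema; the `sqlite::memory:` DSN in the usage shows that the same code would cover the in-memory case.

```php
<?php

declare(strict_types=1);

// Hypothetical single-row table holding the index configuration as JSON.
function ensureConfigurationTable(PDO $pdo): void
{
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS loupe_configuration (
            id INTEGER PRIMARY KEY CHECK (id = 1),
            data TEXT NOT NULL
        )'
    );
}

function saveConfiguration(PDO $pdo, array $configuration): void
{
    ensureConfigurationTable($pdo);

    // INSERT OR REPLACE overwrites the single row (id = 1) on every save.
    $statement = $pdo->prepare('INSERT OR REPLACE INTO loupe_configuration (id, data) VALUES (1, :data)');
    $statement->execute(['data' => json_encode($configuration, JSON_THROW_ON_ERROR)]);
}

function loadConfiguration(PDO $pdo): ?array
{
    ensureConfigurationTable($pdo);

    $data = $pdo->query('SELECT data FROM loupe_configuration WHERE id = 1')->fetchColumn();

    // fetchColumn() returns false when no configuration has been saved yet.
    return false === $data ? null : json_decode($data, true, 512, JSON_THROW_ON_ERROR);
}
```

For example, `$pdo = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);` followed by `saveConfiguration($pdo, [...])` and `loadConfiguration($pdo)` round-trips the configuration without any extra file.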

Overall, I also understand that this could make things too complicated, and I'm fine with keeping the LoupeHelper / LoupeManager in SEAL.

Toflar commented 1 year ago

The problems I see here:

  1. I think we should make the configuration serializable as JSON rather than using serialize(), don't you think? So maybe this is something we should address first.
  2. I don't like the word manager 😄 Is that how you name it in SEAL? Or is it what other engines use?

However, I think it would be a great addition. It could also be the place where you copy and merge two indexes, or do other similar operations. So I'm not necessarily against having it in Loupe itself.
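For the first point, a minimal sketch of a JSON round-trip, using a hypothetical `IndexConfiguration` value object in place of Loupe's actual `Configuration` class (whose internals are not assumed here). The field names mirror the searchable and filterable fields mentioned in the issue and are illustrative only.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical configuration value object that can be stored as JSON
 * instead of with serialize().
 */
final class IndexConfiguration implements JsonSerializable
{
    /**
     * @param list<string> $searchableFields
     * @param list<string> $filterableFields
     */
    public function __construct(
        public readonly array $searchableFields = [],
        public readonly array $filterableFields = [],
    ) {
    }

    // Called by json_encode(); defines the stored format explicitly.
    public function jsonSerialize(): array
    {
        return [
            'searchableFields' => $this->searchableFields,
            'filterableFields' => $this->filterableFields,
        ];
    }

    public static function fromJson(string $json): self
    {
        $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

        return new self(
            $data['searchableFields'] ?? [],
            $data['filterableFields'] ?? [],
        );
    }
}
```

Compared with `unserialize()`, `json_decode()` never instantiates arbitrary classes, and the stored file stays human-readable and independent of the class's private property layout.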

alexander-schranz commented 1 year ago

> I think we should make the configuration serializable as JSON rather than using serialize(), don't you think? So maybe this is something we should address first.

I currently have a similar issue in SEAL. It's still on my list to discuss with Nicolas from Symfony what the best way is to cache such files, which don't change between deployments, possibly making use of OPcache.
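One common way to make such files OPcache-friendly, sketched here as an assumption rather than what SEAL or Symfony actually do: dump the configuration with `var_export()` into a PHP file that returns an array, and load it with `require`. With OPcache enabled, the compiled file, including its constant array, can be served from shared memory instead of being parsed again on every request. The function names are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: write the configuration as "<?php return [...];"
 * so that loading it is a plain include that OPcache can cache.
 */
function dumpConfiguration(string $file, array $configuration): void
{
    $code = '<?php return ' . var_export($configuration, true) . ';' . PHP_EOL;

    // Write to a temporary file and rename it, so readers never include a half-written file.
    $temporaryFile = $file . '.' . bin2hex(random_bytes(4)) . '.tmp';
    if (false === file_put_contents($temporaryFile, $code)) {
        throw new RuntimeException(sprintf('Could not write "%s".', $temporaryFile));
    }
    rename($temporaryFile, $file);

    // Drop a cached compiled version of the previous contents, if OPcache is loaded.
    if (function_exists('opcache_invalidate')) {
        opcache_invalidate($file, true);
    }
}

function loadDumpedConfiguration(string $file): array
{
    return require $file;
}
```

Since the file only changes when an index is (re)created, it fits the "doesn't change between deployments" case; with `opcache.validate_timestamps` disabled in production, the explicit `opcache_invalidate()` call is what makes a rewritten file visible.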

> I don't like the word manager 😄 Is that how you name it in SEAL? Or is it what other engines use?

I don't have a name for it in SEAL yet; I just called it Helper because I could not come up with a better one. Then I thought about Doctrine's EntityManager and arrived at the name LoupeManager. In other engines it is always the Client instance they provide, but I don't think that would be a good name in Loupe's case.

Toflar commented 1 year ago

Okay, I see. Well, I'm totally open to having this in Loupe itself; I think it could be valuable not just for SEAL 😊

alexander-schranz commented 1 year ago

Another naming idea could be:

  - LoupeManager / LoupeHelper -> Loupe
  - Loupe -> Index
  - LoupeFactory -> IndexFactory