loupe-php / loupe

A full text search engine with tokenization, stemming, typo tolerance, filters and geo support based on only PHP and SQLite.
MIT License
302 stars, 11 forks

Add a LoupeManager to manage multiple indexes and save configurations #15

Open alexander-schranz opened 1 year ago

alexander-schranz commented 1 year ago

One thing that is a little different from other search engines is that the whole configuration (searchable fields, filterable fields, ...) is required when creating the Loupe instance, which in other engines would be the client. In other engines I only need to know about the configuration in the SchemaManager, where I call the createIndex method, not when searching or indexing documents.

For this I created a helper class, which could also be called LoupeManager. It even serializes the configuration into a file. This makes it stricter: as in other engines, the configuration can only be changed via the SchemaManager::createIndex method, so it is more consistent across the whole library, and without calling that method, changes have no effect.

For my use cases I may change it when I have more time.

But it could also be part of Loupe, independent of SEAL. What do you think about it?

This could be achieved by a LoupeManager which looks like this:

class LoupeManager
{
    /**
     * @var Loupe[]
     */
    private array $loupes = [];

    public function __construct(
        private readonly LoupeFactory $loupeFactory,
        private string $directory,
    ) {
    }

    public function getLoupe(string $index): Loupe
    {
        if (!isset($this->loupes[$index])) {
            $configurationFile = $this->directory . $index . '.loupe';

            if (!\file_exists($configurationFile)) {
                throw new \LogicException('Index needs to be created before accessing the Loupe instance.');
            }

            /** @var string $configurationContent */
            $configurationContent = \file_get_contents($configurationFile);

            /** @var Configuration $configuration */
            $configuration = \unserialize($configurationContent);

            $this->loupes[$index] = $this->loupeFactory->create($this->directory . $index . '.db', $configuration);
        }

        return $this->loupes[$index];
    }

    public function reset(): void
    {
        $this->loupes = [];
    }

    public function existIndex(string $index): bool
    {
        return \file_exists($this->directory . $index . '.db');
    }

    public function dropIndex(string $index): void
    {
        // forget the cached instance so it is not reused after its files are gone
        unset($this->loupes[$index]);

        \unlink($this->directory . $index . '.db');
        \unlink($this->directory . $index . '.loupe');
    }

    public function createIndex(string $index, Configuration $configuration): void
    {
        $configurationContent = \serialize($configuration);

        file_put_contents($this->directory . $index . '.loupe', $configurationContent);
        touch($this->directory . $index . '.db');
    }

    public function alias(string $index, string $alias): void
    {
        // aliases via symlinks for zero-downtime reindexing
        \symlink($this->directory . $index . '.db', $this->directory . $alias . '.db');
        \symlink($this->directory . $index . '.loupe', $this->directory . $alias . '.loupe');
    }

    public function removeAlias(string $alias): void
    {
         unlink($this->directory . $alias . '.db');
         unlink($this->directory . $alias . '.loupe');
    }

    public function readAlias(string $alias): string
    {
         $target = readlink($this->directory . $alias . '.db');

         return substr($target, strlen($this->directory), -3);
    }
}

Saving the configuration lets the search engine run with the configuration the index was created with. Making a configuration change an explicit method call even allows having two instances of an index: a read instance with the old configuration and a write instance with the new configuration.

$readLoupe = $loupeManager->getLoupe('blog');

$loupeManager->createIndex('blog_202208050332', $newConfiguration);
$writeLoupe = $loupeManager->getLoupe('blog_202208050332');

$writeLoupe->addDocuments([/* ... */]); // reindex all documents

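$oldIndex = $loupeManager->readAlias('blog');
$loupeManager->removeAlias('blog'); // symlink() fails if the alias already exists
$loupeManager->alias('blog_202208050332', 'blog'); // alias(index, alias)
$loupeManager->dropIndex($oldIndex);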
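
One caveat with swapping an alias: PHP's symlink() fails if the link already exists, so the old alias has to be removed first, which leaves a short window without any alias. A minimal sketch of an atomic swap, assuming a POSIX filesystem where rename() replaces an existing symlink in a single step. The function name swapAlias is hypothetical and not part of Loupe or SEAL:

```php
<?php

declare(strict_types=1);

/**
 * Points $link at $target, replacing an existing symlink in one step.
 * Hypothetical helper, not part of Loupe or SEAL.
 */
function swapAlias(string $target, string $link): void
{
    // Create the new symlink under a temporary name next to the final one.
    $temporaryLink = $link . '.' . bin2hex(random_bytes(4)) . '.tmp';

    if (!symlink($target, $temporaryLink)) {
        throw new RuntimeException(sprintf('Could not create symlink "%s".', $temporaryLink));
    }

    // Assumption: on POSIX systems rename() replaces the existing link atomically,
    // so readers resolve either the old target or the new one.
    if (!rename($temporaryLink, $link)) {
        unlink($temporaryLink);

        throw new RuntimeException(sprintf('Could not replace alias "%s".', $link));
    }
}
```

In the manager above this would be called once for the .db file and once for the .loupe file, so the two links are still switched one after the other rather than together.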

Maybe the configuration could also be saved in the database, so that only one file exists and the in-memory case is easier to support in the same way.

Overall, I also understand that this could make things too complicated, and I'm fine with keeping the LoupeHelper / LoupeManager in SEAL.
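
The single-file idea could be sketched with plain PDO roughly as follows. The table name loupe_settings, both function names and the array keys are hypothetical. This is not how Loupe stores its data, and a plain array stands in for the Configuration object:

```php
<?php

declare(strict_types=1);

/** Stores the settings as a JSON string inside the index database itself. */
function saveSettings(PDO $db, array $settings): void
{
    // A single-row table: the CHECK constraint only allows id = 1.
    $db->exec('CREATE TABLE IF NOT EXISTS loupe_settings (id INTEGER PRIMARY KEY CHECK (id = 1), payload TEXT NOT NULL)');

    $statement = $db->prepare('INSERT OR REPLACE INTO loupe_settings (id, payload) VALUES (1, :payload)');
    $statement->execute(['payload' => json_encode($settings, JSON_THROW_ON_ERROR)]);
}

/** Reads the settings back and fails if the index was never created. */
function loadSettings(PDO $db): array
{
    $tableExists = $db
        ->query("SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'loupe_settings'")
        ->fetchColumn();

    $payload = $tableExists
        ? $db->query('SELECT payload FROM loupe_settings WHERE id = 1')->fetchColumn()
        : false;

    if ($payload === false) {
        throw new LogicException('Index needs to be created before accessing it.');
    }

    return json_decode((string) $payload, true, 512, JSON_THROW_ON_ERROR);
}

// Works the same way for a file-based and an in-memory database.
$db = new PDO('sqlite::memory:');
saveSettings($db, ['searchableFields' => ['title'], 'filterableFields' => ['tags']]);
$settings = loadSettings($db);
```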

Toflar commented 1 year ago

The problems I see here:

  1. I think we should make the configuration serializable as JSON rather than using serialize(), don't you think? So maybe this is something we should address first.
  2. I don't like the word manager 😄 Is that how you name it in SEAL? Or is it what other engines use?

However, I think it would be a great addition. It could also be the place where you could copy and merge two indexes or other stuff. So I'm not necessarily against having it in Loupe itself.
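
The JSON idea from point 1 could be sketched like this. IndexSettings is a hypothetical stand-in with two illustrative fields; it is not Loupe's Configuration class, whose actual properties are not shown in this thread:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a configuration object, not Loupe's Configuration class.
 */
final class IndexSettings implements JsonSerializable
{
    /**
     * @param string[] $searchableFields
     * @param string[] $filterableFields
     */
    public function __construct(
        public readonly array $searchableFields = [],
        public readonly array $filterableFields = [],
    ) {
    }

    /** Called by json_encode(); exposes only plain, portable data. */
    public function jsonSerialize(): array
    {
        return [
            'searchableFields' => $this->searchableFields,
            'filterableFields' => $this->filterableFields,
        ];
    }

    /** Rebuilds the object from JSON without instantiating arbitrary classes. */
    public static function fromJson(string $json): self
    {
        $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

        return new self(
            array_values($data['searchableFields'] ?? []),
            array_values($data['filterableFields'] ?? []),
        );
    }
}

$settings = new IndexSettings(['title', 'body'], ['tags']);
$json = json_encode($settings, JSON_THROW_ON_ERROR);
$restored = IndexSettings::fromJson($json);
```

Compared to serialize(), the stored file is readable by other tools and decoding it cannot create objects of unexpected classes.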

alexander-schranz commented 1 year ago

> I think we should make the configuration serializable as JSON rather than using serialize(), don't you think? So maybe this is something we should address first.

I currently have a similar issue in SEAL, where it is still on my list to discuss with Nicolas from Symfony what the best way is to cache such files, which don't change between deployments, and maybe make use of OPcache.
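
One common way to get such a file into OPcache is to dump the data as a PHP file that returns an array. A minimal sketch under that assumption; the function names are hypothetical and a plain array stands in for the configuration:

```php
<?php

declare(strict_types=1);

/** Writes the settings as a PHP file that returns an array. */
function writeSettingsCache(string $file, array $settings): void
{
    $code = '<?php return ' . var_export($settings, true) . ';' . PHP_EOL;

    // Write to a temporary file first so readers never see a half-written file.
    $temporaryFile = $file . '.' . bin2hex(random_bytes(4)) . '.tmp';
    file_put_contents($temporaryFile, $code);
    rename($temporaryFile, $file);

    // If OPcache is loaded, drop a possibly cached older version of the file.
    if (function_exists('opcache_invalidate')) {
        opcache_invalidate($file, true);
    }
}

/** Loads the settings; with OPcache enabled the compiled file can be served from shared memory. */
function readSettingsCache(string $file): array
{
    return require $file;
}

$cacheFile = sys_get_temp_dir() . '/blog_' . bin2hex(random_bytes(4)) . '.settings.php';
writeSettingsCache($cacheFile, ['searchableFields' => ['title'], 'filterableFields' => ['tags']]);
$cachedSettings = readSettingsCache($cacheFile);
```

This only works for data that var_export() can represent, which is another argument for making the configuration convertible to a plain array first.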

> I don't like the word manager 😄 Is that how you name it in SEAL? Or is it what other engines use?

I don't have a name for it in SEAL yet; currently I just named it Helper because I could not come up with a better name. Then I thought about Doctrine's EntityManager and so arrived at the LoupeManager name. In the other engines it is always the Client instance that they provide, but I think that would not be a good name in the case of Loupe.

Toflar commented 1 year ago

Okay, I see. Well, I'm totally open to having this in Loupe itself - I think it could be valuable not just for SEAL 😊

alexander-schranz commented 1 year ago

Another naming idea could be:

LoupeManager / LoupeHelper -> Loupe
Loupe -> Index
LoupeFactory -> IndexFactory

daun commented 1 week ago

If anyone's looking for a very much reduced version of an index manager that creates, drops, and initialises multiple indexes, this is what I'm currently using in Laravel/Statamic land.

<?php

use Illuminate\Support\Facades\File;
use Loupe\Loupe\Configuration;
use Loupe\Loupe\Loupe;
use Loupe\Loupe\LoupeFactory;
use Statamic\Facades\Path;

class Manager
{
    /**
     * @var Loupe[]
     */
    protected array $clients = [];

    public function __construct(
        protected readonly LoupeFactory $factory,
        protected string $path,
    ) {
        if (! File::isDirectory($this->path)) {
            File::makeDirectory($this->path, recursive: true);
        }
    }

    public function get(string $index, Configuration $configuration): Loupe
    {
        return ($this->clients[$index] ??= $this->make($index, $configuration));
    }

    public function make(string $index, Configuration $configuration): Loupe
    {
        $this->createIndex($index);

        return $this->factory->create($this->indexDirectory($index), $configuration);
    }

    public function indexDirectory(string $index): string
    {
        return Path::resolve("{$this->path}/{$index}");
    }

    public function indexPath(string $index): string
    {
        return Path::resolve("{$this->path}/{$index}/loupe.db");
    }

    public function indexExists(string $index): bool
    {
        return File::exists($this->indexPath($index));
    }

    public function createIndex(string $index): void
    {
        if (! File::isDirectory($dir = $this->indexDirectory($index))) {
            File::makeDirectory($dir, recursive: true);
        }
        if (! File::exists($db = $this->indexPath($index))) {
            File::put($db, '');
        }
    }

    public function dropIndex(string $index): void
    {
        File::deleteDirectory($this->indexDirectory($index));
    }
}

Toflar commented 1 week ago

Nice! Guys, feel free to PR your solutions - I think it would be nice having such a manager. The only requirement from my side: No new dependencies for file handling if possible - if not, then symfony/filesystem because I'm already using this for require-dev.

daun commented 1 week ago

I think Laravel uses Symfony Filesystem under the hood, but let me double check.

zonky2 commented 3 days ago

Is this issue fixed with https://github.com/loupe-php/loupe/pull/108?

daun commented 3 days ago

@zonky2 No, we closed that PR as it didn't prove to be very thought through :) There's an upcoming feature for deleting all documents, i.e. clearing the index, that might serve your purpose.