k-samuel / faceted-search

PHP Faceted search library
MIT License
210 stars 15 forks source link

Laravel / Eloquent #5

Closed chrisvidal closed 2 years ago

chrisvidal commented 2 years ago

Hi, would that be possible to use a DB as an index instead of a json file? Or I am misunderstanding something. My tech stack is MySQL 5.7.x, OctoberCMS/Laravel with Eloquent

k-samuel commented 2 years ago

Hi, of course. Json file used only for example. Moreover, it is even better to cache in Redis/etc. Index file data is an associative array. You may choose the storage option at your discretion.

The only thing to understand. Index data must be fully loaded into script memory before starting the search. $searchIndex->setData($indexData);

chrisvidal commented 2 years ago

Thanks for your reply.

Let me see if I understand properly:

well, obviously, I have no clue how to use this lib, and would love some more doc for a real case like mine.

thanks

k-samuel commented 2 years ago

Working with faceted search consists of 2 stages.

Stage 1. Building the index. We need to collect the faceted index in the background (similar to how Sphinx does it).

Fetch all rows from the database that need to be indexed and pass them to a faceted index.

 $index->add(...)

Then save the resulting index to the database.

$data = $index->getData();
$yourDbStorage->saveIndexData($data);

You need to re-index periodically, depending on how often your data changes. Sometimes it is enough to do this once a day, there are times that you need to re-index in the background every 10 minutes.

Stage 2. Search When a user comes to the site and makes a request, we fetch the index data from the DB/cache and do

$index->setData($yourDbStorage->loadIndexData());

Then we search

$search->find(...);

Faceted search itself is not full-text and cannot search for incomplete matches. It only searches for a complete match of the attribute value.

It is designed to filter by attributes values.

How to be with fultext-search: https://github.com/k-samuel/faceted-search/issues/3

chrisvidal commented 2 years ago

Thanks a lot @k-samuel for this.

I totally get the concept of it all. thanks.

It is the concrete implementation of the index that I have difficulty to envion.

How does this look like in reality $yourDbStorage->saveIndexData($data); ? $data is an array. How can I save this array?

I was thinking to sync the index whenever a record is updated but it does not seem to be the principle at use here?

k-samuel commented 2 years ago

The easiest option in pseudocode

$sql = ' INSERT INTO my_facet_table (`category`,`index_data`) VALUES ("my_index_name", "'.json_encode($index->getData()).'")';

$query->execute($sql);

where index_data is LONGTEXT or JSON column

At this moment realtime index option is not implemented. Only full indexing.

k-samuel commented 2 years ago

Workaround. You can create update queue for records and launch indexer after update event. For example after uploading pricelist or batch update of product records.

k-samuel commented 2 years ago

Thanks for your reply.

Let me see if I understand properly:

* I need to maintain the DB index table: do I need to use the `$searchIndex->addRecord()` method or?

* * if I need to handle my own Index table in the DB, what rules and structure does it need to follow?

* when a search occurs on the frontend, not sure what is supposed to be done here on backend:

  * do myself a `select * from index` then loop through all the results to build an array and do the `->setData($indexData)`?

* I can see the use of the filters in the code example; I am not clear if we can search for a string as well in some fields like `name` and `description` for instance?

well, obviously, I have no clue how to use this lib, and would love some more doc for a real case like mine.

thanks

The index data is a black box by design. Think of it as the binary data of the index itself. After collecting them, you need to save them somewhere in order to later reuse.

Try running the example https://github.com/k-samuel/faceted-search/tree/master/examples and see how the code works.

Attribute values ​​are passed as filters to the search, which can be found from $search->findAcceptableFiltersCount / findAcceptableFilters

These are exact matches of values. If you pass the "green" color to the filter, then the attribute should contain exactly that value.

Full-text search by part of the title or phrase is not provided, this is a slightly different direction.

I hope this helps.

chrisvidal commented 2 years ago

yes it helps Thanks @k-samuel

I can spot 3 use cases in my app:

  1. search for a term
  2. search for a term with additional filters
  3. only filters

For 1./ I am thinking to create the index in real time after executing the SQL query to retrieve the records that match the searched term, then create the index because I'd like to use the findAcceptableFiltersCount for frontend

For 2./ similar to 1./ I guess

For 3./ I will use an existing index maintained and refresh every XX minutes.

k-samuel commented 2 years ago

Sounds good and Interesting. Before that, I did not think that the index can be built on the basis of the query results.

For the first option, there is an alternative way - to get a list of identifiers that you found when searching for a term and pass them to findAcceptableFiltersCount as the second parameter. In this case, you do not have to rebuild the index every time.

$filters = $search->findAcceptableFiltersCount([],$foundIds);
chrisvidal commented 2 years ago

thanks, it sounds great so far.

I am trying your idea

$foundIds is a flat array of Ids. $filters is an empty array.

$data = $search->findAcceptableFiltersCount($filters, $foundIds); returns an empty array;

any idea why? probably the way I build the index?

chrisvidal commented 2 years ago

...

    $companies = $query->get();

    foreach ($companies as $item) {
           $recordId = $item->id;
            $foundIds[] = $recordId;
            $itemData = [
                'capabilities' => $item['capabilities']->map(function ($value) use ($item) {
                    return $value->name;
                })->toArray(),
                'certificates' => $item['certificates']->map(function ($value) use ($item) {
                    return $value->name;
                })->toArray(),
                'tacs' => $item['tacs']->map(function ($value) use ($item) {
                    return $value->name;
                })->toArray(),
                'industries' => $item['industries']->map(function ($value) use ($item) {
                    return $value->name;
                })->toArray(),
                'enterprise_type' => $item->enterprise_type ? $item->enterprise_type->name : null,
                'country' => $item->country ? $item->country->name : null,
            ];
            $searchIndex->addRecord($recordId, $itemData);
        }

    $search = new Search($searchIndex);

   $result = [
       'filters' => $this->findFilters($search, $filters, $foundIds),
       'results' => ['data' => $results, 'count' => count($results), 'limit' => $pageLimit],
   ];
k-samuel commented 2 years ago

It looks like a bug. Here is a fix for this case (rare use case). https://github.com/k-samuel/faceted-search/commit/f52b904d61a0f54c06af1b5b7bb5940859da6284

I am planing to release of the new version of the library with fix today at 20:00 Moscow time.

Thank you for finding the problem and helping to improve the project.

chrisvidal commented 2 years ago

awesome! i updated to 1.3.4 and the acceptable filters are now available. Thanks @k-samuel

Is the v2.0 ready for use? What does it bring?

k-samuel commented 2 years ago

It changes index structure (incompatible change) https://github.com/k-samuel/faceted-search/blob/master/changelog.md Yes, ready to go. We can start a new thread for discussion. If the question according to the database is closed.