meilisearch / integration-guides

Central reference for Meilisearch integrations.
https://meilisearch.com
MIT License
136 stars 15 forks source link

MeiliSearch as PostgreSQL extension #71

Closed pythoneer closed 1 year ago

pythoneer commented 3 years ago

Is your feature request related to a problem? Please describe. For most of the projects i need the feature set MeiliSearch provides i have almost always a regular database as my main data store – and this is almost always prostgreSQL. And in those cases data is often duplicated or at least linked with ids. Say i have a movie database with a long description for each movie i want to do a text search in. Those two datasets needs to be synced between the two data stores and it is often the case that i need to do multiple network/database round trips across that two data stores.

Say i have MeiliSearch to search for "Wizzard, Magik" and it returns some ids (123, 432, 666) that i use to query data in postgres that returns the full movie entry (Harry Potter, Persey Jackson, The Sorcerer’s Apprentice) so i can query the costomers that have watched that movies the last month on my streaming platform. You get the idea.

There is some effort behind the synchronization process and in handling the data in code and doing round trips and in modeling the data itself.

Describe the solution you'd like I would like to have a PostgreSQL extension that is able to handle most of those complications. To make things short and don't waste time with an unsatisfying explanation from my side the idea is to have something similar to ZomboDB. ZomboDB is "linking" Elasticsearch and PostgreSQL in such a way that i only have to interact with PostgreSQL alone as a proxy to Elasticsearch. As Elasticsearch is way to excessive for almost all my use cases (i don't have petabytes of data to search through) and MeiliSearch covers all of previous usecases and is way easier to setup, maintain, configure and use this would ease up the process to "just quickly" add a text search to your project by just.

CREATE TABLE movies (
    id BIGINT NOT NULL PRIMARY KEY,
    name text NOT NULL,
    keywords varchar(64)[],
    short_summary text,
    long_description meili.fulltext, 
    price bigint,
    inventory_count integer,
    discontinued boolean default false,
    availability_date date
);

CREATE INDEX idx_movies 
          ON movies 
       USING meili ((movies.*)) 
        WITH (url='localhost:7700/');

SELECT * 
  FROM movies 
 WHERE movies ==> 'botman+robin&limit=2';
curquiza commented 3 years ago

Hello @pythoneer Thanks for this suggestion, transferring this issue to the integration-guides repository! This is definitely a good plugin.

pythoneer commented 3 years ago

@curquiza Thanks for transferring the issue to the right place! Out of curiosity i have started a little experiment myself. I don't think it will reach a production ready state and acts more as a learning project for myself. If it is ok i can post some updates if there is anything worth of sharing, maybe it can be useful for others.

It is out of my time budget to get anywhere close to what ZomboDB is doing for Elasticsearch but for a little experimental approach i am planning to have something along the lines of:

CREATE TABLE movies (
    id BIGINT NOT NULL PRIMARY KEY,
    name text NOT NULL,
    short_summary text,
    long_description text,
    -- and more
);

-- creates an index and updates documents on every INSERT/UPDATE/DELETE on the movies table via REST API
SELECT meili.watch("http://localhost:7700/", movies); 

SELECT *
  FROM movies
 WHERE id IN (SELECT meili.query("harry potter"));
curquiza commented 3 years ago

Thanks for sharing your updates!! This is really helpful! 😁

israrwz commented 2 years ago

+1

curquiza commented 2 years ago

Hello @israrwz Welcome to the MeiliSearch community 🙂 if you have the time, what is the minimal feature you would expect in a Postgres plug-in?

Same question for everyone wanting the integration here of course!

rrjanbiah commented 2 years ago

@curquiza My expectation is something like @zombodb Reason: Currently, syncing the data is a pain. Or, if this can be sorted through something else, it is OK too. For example, I'm following up https://github.com/apache/nifi/pull/4065#issuecomment-1002862247 (especially, https://github.com/apache/nifi/pull/4065#issuecomment-774507893) too

Edit: Better link to the PR discussion

pythoneer commented 2 years ago

@israrwz thanks for allowing me to revisit this topic. @curquiza i haven't finished a minimal test that i would show others back then. Project still exists on my hard drive i may find some time over the next days to release something however bad – its just a minimal test with the pgx crate (from the author of zombodb) and would look pretty roughly like what i proposed in https://github.com/meilisearch/integration-guides/issues/71#issuecomment-752432762

EDIT: if i never get to a better state – i may just copy paste the important stuff from the project that i think is the current iteration here

EDIT2: I also found some of the relevant SQL scripts to use it (i think :P) haven't tested it yet since back then and my memory is vague

 --drop extension meili_extension; --create extension meili_extension;

 drop table movies;

create table movies (
    movie_id BIGSERIAL primary key,
    movie_name text not null,
    short_summary text,
    long_description text
);

insert into movies(movie_name, short_summary, long_description) values 
    ('Harry Potter', 'magic and stuff', 'There is a lot of magic stuff happening in this movie'),
    ('Terminator', 'action stuff', 'There is a cyborg sent back to the past to safe cyborgkind from mankind'),
    ('Forest Gump', 'Drama about a guy', 'A Drama about a guy that can do everything that he wants');

drop extension meili_extension; 
create extension meili_extension;

-- create trigger movies_update_trigger before update on movies for each row execute procedure meili.update_trigger();

create trigger movies_insert_trigger after insert on movies for each row execute procedure meili.update_trigger();

insert into movies(movie_name, short_summary, long_description) values ('movie x1', 'short', 'long');

select * from movies as m where m.movie_id in (select meili.query('http://localhost:7700', 'movies', 'x1'));
rrjanbiah commented 2 years ago

@pythoneer FWIW, I personally see a very good future for it, as a separate project. It may very well be piggybacked on the hype of Hasura too. Good luck!

curquiza commented 2 years ago

Hello @rrjanbiah and @pythoneer! Thanks for your feedback! And for your involvement!

The Postgres integration is definitely an integration we (the integration team) want to build and maintain. Our current human resources and the time we have made us delay the project many times, but be sure this is something we keep in mind :)

The integration team is currently growing which will help, and the Postgres integration is one of our priorities in 2022. We may work around or with what the community built first. So, thanks for this!

DeoLeung commented 2 years ago

I would vote for this integration.

also I would suggest a more deep integration like pgroonga, which we could use it more sql-like, but it's a bit trouble on scaling out.

bidoubiwa commented 2 years ago

Hello @DeoLeung ! Could you upvote on the roadmap?

jorroll commented 1 year ago

Hello @DeoLeung ! Could you upvote on the roadmap?

I just went and upvoted that issue @bidoubiwa but, FYI, that issue doesn't appear to be the same as this one. That issue talks about SQL integration. This issue is asking for meilisearch to be exposed as a Postgres extension (i.e. this issue is specific to Postgres). The solution to this issue (I expect) would not be compatible with other SQL databases (unless they implemented the Postgres API as some do).

I'm not sure what a generic SQL integration would look like, but I suspect it would not address this issue (at least for me).

eeeebbbbrrrr commented 1 year ago

Hi. Eric Ridge here. Primary developer behind ZomboDB and pgx.

Up above in https://github.com/meilisearch/integration-guides/issues/71#issuecomment-1008656726, @curquiza said:

Our current human resources and the time we have made us delay the project many times...

I'm not here to sell anything, but it's possible my employer would be willing to develop a MeiliSearch Postgres extension. It wouldn't be cheap, but it'd be yours to do with whatever you want.

For what it's worth, we have no intentions of doing this on our own. We are, in fact, working on our own solution -- yet another Rust text search library, but with Postgres as the first library consumer. But I don't see why that'd preclude us from doing a similar MeiliSearch integration, if a business arrangement could be made.

Anyways, good luck. MeiliSearch is super cool and the Postgres world can definitely benefit from better text search solutions! Feel free to reach out to me at eebbrr(at)gmail.com or DM me on twitter (https://twitter.com/zombodb).

brunoocasali commented 1 year ago

Hello, everyone watching this issue!

I'm closing this issue since it's been a while since there is no news regarding this topic, and it is not in our plans to work on such a feature. In any case, it would be nice if someone from the community could work on it.

meotimdihia commented 1 year ago

@brunoocasali This is an important feature. I followed Meilisearch for a long time, but synchronizing Melisearch with PostgreSQL is not easy => I always ended up using search full text + pg_trgm instead of. And it is the same for any search engine. If you can do it, I think you will win the market.

curquiza commented 1 year ago

Hello @meotimdihia Thanks for this feedback To let you know, we are currently working on finding a way to sync the primary database of the users with our Cloud offer. As you can guess it's not easy because multiple use cases and DB exist. We plan first to release guides to help people manage it, and we plan to provide a more adapted solution in the future. Stay tuned! 🤟

PzaThief commented 11 months ago

For someone who find yet the way to sync MeiliSearch with DB, I suggest to try meilisync before any PostgreSQL extension released. It use CDC method not extension method. but has advantage of easy support of variety DB.