kaleidos / grails-postgresql-extensions

Grails plugin to use postgresql native elements such as arrays, hstores,...
Apache License 2.0
78 stars 63 forks source link

Full text search: a criteria for tsquery/tsvector would be gorgeous #83

Open rawi2 opened 8 years ago

rawi2 commented 8 years ago

A groovy plug in for a great database. Many thanks for it.

Would it be very difficult to implement functions for tsquery/tsvector also?

I think, after that, postgresql would become the most used database with grails :)

Thank you

ilopmar commented 8 years ago

I haven't used full text search with postgres, that's one of the reason we haven't support for it yet.

Could you please send me an small example in pseudo-code of how would you like to use these functions and the equivalent postgres native sql? With that I could see how difficult is the task.

Thanks!

rawi2 commented 8 years ago

Thank you for looking into this!

CREATE TABLE orpha.disorder_name
(
    -- id etc
    disorder_name character varying(250),
    disorder_name_fts tsvector,
    -- constraints etc
);

disorder_name_fts is the disorder_name already prepared as full-text search-able format (tsvector) - per trigger function on update or insert.

The two fields are looking like (SELECT disorder_name, disorder_name_fts....):

"Autosomal dominant distal spinal muscular atrophy" | "'atrophi':6 'autosom':1 'distal':3 'domin':2 'muscular':5 'spinal':4"

Possible SQLs: If I don't have a disorder_name_fts field I need to make disorder_name searchable with the function to_tsvector('language', field-name)

Like the following, I'll search for records with the words "spinal" AND (words beginning with) "musc" AND "atrophy"

SELECT *
FROM orpha.disorder_name
WHERE to_tsvector('english', disorder_name) @@ to_tsquery('english', 'spinal & musc:* & atrophy')

If I do not mention the language, the locale of the database will be used, like

WHERE to_tsvector(disorder_name) @@ to_tsquery('spinal & musc:* & atrophy')

Using the already calculated tsvector from the field disorder_name_fts the SQL will become

SELECT *
FROM orpha.disorder_name
WHERE disorder_name_fts @@ to_tsquery('english', 'spinal & musc:* & atrophy')

or using the locale

WHERE disorder_name_fts @@ to_tsquery('spinal & musc:* & atrophy')

So, the example of a function would be (found long ago somewhere on the internet):

import org.hibernate.dialect.PostgreSQLDialect;
import org.hibernate.dialect.function.SQLFunctionTemplate;
import org.hibernate.type.StandardBasicTypes;
import org.hibernate.dialect.function.VarArgsSQLFunction;

public class PostgreSQLDialectExtended extends PostgreSQLDialect {
    public PostgreSQLDialectExtended() {
        super();
        registerFunction(
            "pgFullTextSearch", 
            new VarArgsSQLFunction(
                StandardBasicTypes.BOOLEAN, 
                "to_tsvector(", // + hier comes the 1st argument on call
                ") @@ to_tsquery(", // + hier comes the 2nd argument on call
                ")"
            )
        );
    }

    public boolean supportsInsertSelectIdentity() { return true; }

    public String appendIdentitySelectToInsert(String insertString) { 
       return insertString + " RETURNING id"; 
    }
}

Calling the function in criteria:

pgFullTextSearch(
    "'english', disorder_name" , 
    "'english', 'spinal & musc:* & atrophy'"
)

But we need a version for the case that I have already the tsvector in disorder_name_fts

    registerFunction(
        "pgFullTextSearch2", 
        new VarArgsSQLFunction(
            StandardBasicTypes.BOOLEAN, 
            "", // hier comes the 1st argument on call
            " @@ to_tsquery(", // hier comes the 2nd argument on call
            ")"
        )
    );

Calling it:

pgFullTextSearch2(
    "disorder_name_fts" , 
    "'english', 'spinal & musc:* & atrophy'"
)

And the absolute minimum:

    registerFunction(
        "pgFullTextSearch3", 
        new VarArgsSQLFunction(
            StandardBasicTypes.BOOLEAN, 
            "", // hier comes the 1st argument on call
            " @@ ", // hier comes the 2nd argument on call
            ""
        )
    );

... and the people have to build the arguments with the postgresql function names in them:

Calling it:

pgFullTextSearch3(
    "disorder_name_fts" , 
    "to_tsquery('english', 'spinal & musc:* & atrophy')"
)

...or

pgFullTextSearch3(
    "to_tsvector('english', disorder_name)" , 
    "to_tsquery('english', 'spinal & musc:* & atrophy')"
)

There is a more complicated method found here: https://forum.hibernate.org/viewtopic.php?p=2447426

There I would adapt the List of arguments to one more, to transport the information, if the database field needs the function to_tsvector() or not

Thank you very much!

rawi2 commented 8 years ago

sorry, I didn't want to close it

rawi2 commented 8 years ago

... oh I can reopen it...

ilopmar commented 8 years ago

Thank you very much for such a detailed explanation. When I have some time time I'll look at it and let you know how it goes.

Are you interested in adding this feature in Grails 2 or Grails 3?

rawi2 commented 8 years ago

I thank you very much in advance!

Embarrassing to say, but I'm an old admin, I scripted a bit Groovy and this is my third attempt to find the time to learn Grails (3) - hands on a project. I'm able to read java, but I'm not prolific in it. Sorry to disappoint that much...

loicgeo commented 7 years ago

This a nice example and i used it thank you. But if i want to extract the to_tsquery call, how to declare it in dialect functions (which org.hibernate.type i have to use)?

I was trying to do as this example https://www.postgresql.org/docs/9.5/static/textsearch-controls.html when they use to_tsquery in FROM part of the SQL query.

rawi2 commented 7 years ago

I couldn't follow my former project - because other tasks - so I didn't have the opportunity to deepen this matters any more.

But well, it's an interesting question, concerning postgresql and may be this plugin :)

A shot into the blue... never done: For the FROM part of a SQL you'd have to declare a non persistent domain (static mapWith = "none") or a CommandObject or a POGO with a property 'query'; create on its basis an object and with value 'neutrino|(dark & matter)' as query and use it in your hibernate-criteria-query.

Or declare another POGO like the SQL-Result you are awaiting, use plain SQL and cast its result as an array of this POGO (?)

Give it a try and then give a feed back, how it worked ;)