linuxlewis / djorm-ext-pgfulltext

PostgreSQL full-text search integration with django orm.
Other
250 stars 84 forks source link

The default index on VectorField is btree not gin #18

Open wolever opened 10 years ago

wolever commented 10 years ago

At least on Postgres 9.2, the VectorField is created with a btree index, which can't be (as far as I can tell) used for optimizing full text search queries.

It doesn't look like this can be addressed at the Django level (their function for creating field indexes looks pretty hard coded), so maybe it would be good to have a method like SearchManager.create_search_field_index()? And bonus points for something sensible vis-a-vi South integration?

wolever commented 10 years ago

Here's what I'm using for now:


    def create_search_field_index(self, using=None):
        using = using or self.db
        connection = connections[using]
        qn = connection.ops.quote_name
        field = self.search_field
        index_type = 'gin'

        tablespace_sql = ''
        tablespace = self.model._meta.db_tablespace
        if tablespace:
            tablespace_sql = connection.ops.tablespace_sql(tablespace)
            if tablespace_sql:
                tablespace_sql = ' ' + tablespace_sql

        self._execute_sql(connection,
            """
                CREATE INDEX {name} ON {table}
                USING {type}({column}){tablespace}
            """,
            name=qn(self._get_unquoted_search_index_name()),
            table=qn(self.model._meta.db_table),
            column=qn(field),
            type=index_type,
            tablespace=tablespace_sql,
        )

    def drop_search_field_index(self, using=None):
        using = using or self.db
        connection = connections[using]
        qn = connection.ops.quote_name
        self._execute_sql(connection,
            "DROP INDEX {name} ON {table}",
            name=qn(self._get_unquoted_search_index_name()),
            table=qn(self.model._meta.db_table),
        )

    def _execute_sql(self, connection, sql, **kwargs):
        cur = connection.cursor()
        cur.execute(sql.format(**kwargs))

    def _get_unquoted_search_index_name(self):
        return "%s_%s" %(
            self.model._meta.db_table,
            self.search_field,
        )
rodo commented 9 years ago

As describe in PsotgreSQL doc http://www.postgresql.org/docs/9.4/static/textsearch-indexes.html a Gist/Gin index must be used to index ts_vector. The btree has antoher disavantage to can't contain huge size of text.