linuxlewis / djorm-ext-pgfulltext

PostgreSQL full-text search integration with django orm.
Other
250 stars 84 forks source link

VectorField has db_index=True sometimes resulting in "index row size XXXX exceeds maximum 2712" #45

Open cpbotha opened 9 years ago

cpbotha commented 9 years ago

This is definitely related to https://github.com/djangonauts/djorm-ext-pgfulltext/issues/18

In testing with a bunch of sample documents and converting their contents with to_tsvector() into a VectorField I saw the following error:

OperationalError: index row size 2872 exceeds maximum 2712 for index "blabla_tsv"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

db_index=True in the VectorField class results in Django trying to create a "normal" index on the tsvector, instead of GIN or GIST.

Suggestions for work-arounds are most welcome!

jhgg commented 9 years ago

I'm experiencing the same issue. What I've done is implemented my own "VectorField" with db_index set to False (within the init), and manually creating the GIN/GiST index (usually I add this to the operations array that makemigrations creates when creating the search field.).

from django.db import models, migrations
class Migration(migrations.Migration):

    dependencies = []

    operations = [
        migrations.RunSQL(
            "CREATE INDEX tablename_searchidx ON tablename USING gin(search_index);",
        ),
    ]
cpbotha commented 9 years ago

Here's a pull request that simply does NOT set db_index, leaving it to the user to decide: https://github.com/djangonauts/djorm-ext-pgfulltext/pull/46

linuxlewis commented 9 years ago

Hey folks, new maintainer here. So we've gone ahead updated the VectorField to support db_index=False values. That should prevent the ORM from creating the incorrect index. https://github.com/linuxlewis/djorm-ext-pgfulltext/commit/db8c220f0476bb6823204b9074909b33e844d9a4

We'll definately want to update the code so when db_index=True the correct index is set

ChiChou commented 8 years ago

@linuxlewis Why the 0.10 on pypi has brought kwargs['db_index'] = True back and make this problem occurs again?

daniel5gh commented 8 years ago

Regarding the 0.10 on pypi, please note 0.10 is older than 0.9.X. It should probably have been 0.1.0::

tag 0.10   Latest commit 111a5ac  on May 22, 2014
tag 0.9.3  Latest commit 2ffd9c9  on Mar 31, 2015

Also note that since Django 1.10 there is this which may be helpful: https://docs.djangoproject.com/en/1.10/ref/contrib/postgres/search/#django.contrib.postgres.search.SearchVectorField