the-paperless-project / paperless

Scan, index, and archive all of your paper documents
GNU General Public License v3.0

postgresql: django.db.utils.ProgrammingError: relation "documents_document" does not exist #397

Open apiontek opened 6 years ago

apiontek commented 6 years ago

Paperless version: 2.2.1

Hi, I had paperless working fine with sqlite, but I'd prefer to use postgresql or mariadb, both of which I have installed. I saw a previous issue with someone trying to use mariadb, so I figured I'd try postgresql.

I created a user, and a database with CREATE DATABASE paperless OWNER paperless;

I set, in paperless.conf:

#### DATABASE
PAPERLESS_DBUSER=paperless
PAPERLESS_DBPASS=sup3rs3cr3t

When I run manage.py migrate, I get an auth error. I can fix the auth error by editing settings.py and adding 'HOST': '127.0.0.1' to the DATABASES["default"] dictionary.
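For reference, a minimal sketch of a DATABASES["default"] entry with that HOST key added. It assumes settings.py switches from SQLite to PostgreSQL only when PAPERLESS_DBUSER is set; the helper build_databases and the PAPERLESS_DBNAME variable are hypothetical, and this is not paperless's actual settings.py.

```python
import os
from typing import Mapping


def build_databases(env: Mapping[str, str], data_dir: str = "/tmp/paperless-data") -> dict:
    """Build a Django DATABASES setting from PAPERLESS_* variables.

    Hypothetical helper: it illustrates switching from SQLite to PostgreSQL
    when PAPERLESS_DBUSER is set; it is not paperless's own code.
    """
    databases = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": os.path.join(data_dir, "db.sqlite3"),
        }
    }
    if env.get("PAPERLESS_DBUSER"):
        databases["default"] = {
            "ENGINE": "django.db.backends.postgresql",
            # PAPERLESS_DBNAME is an assumed variable; the default matches
            # CREATE DATABASE paperless OWNER paperless;
            "NAME": env.get("PAPERLESS_DBNAME", "paperless"),
            "USER": env["PAPERLESS_DBUSER"],
            "PASSWORD": env.get("PAPERLESS_DBPASS", ""),
            # Connecting over TCP means PostgreSQL matches the "host" lines
            # of pg_hba.conf instead of the "local" (Unix socket) lines.
            "HOST": "127.0.0.1",
        }
    return databases
```

With PAPERLESS_DBUSER and PAPERLESS_DBPASS set as in the paperless.conf above, build_databases(os.environ) would yield the PostgreSQL entry with HOST pointing at 127.0.0.1.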

However, then I get the following error running manage.py migrate:

/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
Traceback (most recent call last):
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
psycopg2.ProgrammingError: relation "documents_document" does not exist
LINE 1: ..."storage_type", "documents_document"."added" FROM "documents...
                                                             ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/core/management/base.py", line 332, in execute
    self.check()
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/core/management/base.py", line 364, in check
    include_deployment_checks=include_deployment_checks,
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/core/management/commands/migrate.py", line 58, in _run_checks
    issues.extend(super()._run_checks(**kwargs))
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/core/management/base.py", line 351, in _run_checks
    return checks.run_checks(**kwargs)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/core/checks/registry.py", line 73, in run_checks
    new_errors = check(app_configs=app_configs)
  File "/tank/srvdata/paperless/tmp/paperless/src/documents/checks.py", line 16, in changed_password_check
    storage_type=Document.STORAGE_TYPE_GPG).first()
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/models/query.py", line 604, in first
    for obj in (self if self.ordered else self.order_by('pk'))[:1]:
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/models/query.py", line 272, in __iter__
    self._fetch_all()
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/models/query.py", line 1179, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/models/query.py", line 53, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1068, in execute_sql
    cursor.execute(sql, params)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 100, in execute
    return super().execute(sql, params)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/tank/srvdata/paperless/tmp/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: relation "documents_document" does not exist
LINE 1: ..."storage_type", "documents_document"."added" FROM "documents...
                                                             ^

I'll go back to working with sqlite for now, but is this supposed to work? I'm on psycopg2==2.7.5; I also tried removing it and installing psycopg2-binary instead, but I get the same error.

danielquinn commented 6 years ago

It sounds like you're doing everything right, though if your PostgreSQL server is configured to allow connections on a Unix socket, you shouldn't need to set the HOST value.

I think your error might be related to #396, so now that that's been fixed, perhaps that'll make things work for you as well. Just git pull for the update.

If that doesn't do it, I'd suggest checking that the database isn't already half-created as part of this experimentation. Drop the database and re-create it, and then try to re-run migrate.

apiontek commented 6 years ago

No time right now to investigate the Unix socket possibility, though I do see a socket in my /run/postgresql (I'm on Ubuntu 18.04). I have a couple other services using the same postgresql (mastodon, pleroma, synapse), but I don't know if they use a socket.
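On the socket question: with Django's PostgreSQL backend, leaving HOST empty lets libpq connect through its default Unix-domain socket, and a HOST that starts with "/" is treated by libpq as the directory containing the socket rather than as a hostname. A minimal sketch of choosing between the /run/postgresql socket mentioned above and TCP; the helper choose_db_host is hypothetical, not part of paperless or Django.

```python
import os


def choose_db_host(socket_dir: str = "/run/postgresql", port: int = 5432) -> str:
    """Return a value for DATABASES["default"]["HOST"] (hypothetical helper).

    libpq treats a host beginning with "/" as the directory holding the
    Unix-domain socket, so returning socket_dir keeps the connection local.
    Falls back to TCP on 127.0.0.1 when no socket file is found.
    """
    # PostgreSQL names its socket file ".s.PGSQL.<port>" inside that directory.
    socket_file = os.path.join(socket_dir, f".s.PGSQL.{port}")
    if os.path.exists(socket_file):
        return socket_dir
    return "127.0.0.1"
```

One thing worth checking: socket connections are matched against the "local" lines of pg_hba.conf, while 127.0.0.1 matches "host" lines. If the local lines use peer authentication (common in Debian/Ubuntu packages), the connecting OS user has to match the database user, which could explain the auth error seen without HOST.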

In any case, adding the HOST value gets things started. After updating to the latest commit (tag 2.3.0) I'm still stuck, but the error is different and migrate gets further:

Operations to perform:
  Apply all migrations: admin, auth, contenttypes, documents, reminders, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying documents.0001_initial... OK
  Applying documents.0002_auto_20151226_1316... OK
  Applying documents.0003_sender... OK
  Applying documents.0004_auto_20160114_1844... OK
  Applying documents.0005_auto_20160123_0313... OK
  Applying documents.0006_auto_20160123_0430... OK
  Applying documents.0007_auto_20160126_2114... OK
  Applying documents.0008_document_file_type... OK
  Applying documents.0009_auto_20160214_0040... OK
  Applying documents.0010_log... OK
  Applying documents.0011_auto_20160303_1929... OK
  Applying documents.0012_auto_20160305_0040... OK
  Applying documents.0013_auto_20160325_2111... OK
  Applying documents.0014_document_checksum...Traceback (most recent call last):
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
psycopg2.ProgrammingError: relation "documents_document_checksum_75209391_like" already exists

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "src/manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/core/management/commands/migrate.py", line 200, in handle
    fake_initial=fake_initial,
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/migrations/executor.py", line 117, in migrate
    state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards
    state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/migrations/executor.py", line 244, in apply_migration
    state = migration.apply(state, schema_editor)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/backends/base/schema.py", line 106, in __exit__
    self.execute(sql)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/backends/base/schema.py", line 133, in execute
    cursor.execute(sql, params)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 100, in execute
    return super().execute(sql, params)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/tank/srvdata/paperless/.venv/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: relation "documents_document_checksum_75209391_like" already exists
danielquinn commented 6 years ago

Interesting. Looking at that file, there's only one weird thing we're doing, and I'm curious about what would happen if you were to skip that part.

  1. We create a new column in the documents table called checksum with a default value of "-".
  2. Then we generate a new checksum for all docs in the system and populate that field with the checksum.
  3. Finally we modify the column to have no default, NOT NULL and a unique constraint.
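To make the three steps concrete, here is a minimal sketch of the same add-column / backfill / constrain pattern using Python's built-in sqlite3 module rather than a Django migration. The table layout and the MD5 of the content column are assumptions for illustration, not how paperless computes its checksums. SQLite cannot drop a default or add a UNIQUE constraint to an existing column, so step 3 is approximated with a unique index.

```python
import hashlib
import sqlite3


def add_checksum_column(conn: sqlite3.Connection) -> None:
    """Add, backfill, and constrain a checksum column (illustrative only)."""
    cur = conn.cursor()
    # Step 1: new column with a placeholder default of "-".
    cur.execute(
        "ALTER TABLE documents_document "
        "ADD COLUMN checksum TEXT NOT NULL DEFAULT '-'"
    )
    # Step 2: compute a checksum for every existing row and store it.
    rows = cur.execute("SELECT id, content FROM documents_document").fetchall()
    for pk, content in rows:
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        cur.execute(
            "UPDATE documents_document SET checksum = ? WHERE id = ?",
            (digest, pk),
        )
    # Step 3: enforce uniqueness. A unique index stands in for the UNIQUE
    # constraint because SQLite cannot add one to an existing column.
    cur.execute(
        "CREATE UNIQUE INDEX documents_document_checksum_uniq "
        "ON documents_document (checksum)"
    )
    conn.commit()
```

Note that step 3 fails if two rows end up with the same checksum, which is the point of backfilling before adding the constraint.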

I'm guessing the problem is on step 3, which is done in these few lines. If you were to remove those lines from the file, does the migration work?

apiontek commented 6 years ago

Yes, if I remove lines 161-165, migrate works.

However, I then tried to import my small corpus of 27 PDFs, and got the error and traceback below:

(.ppvenv) root@paperless-testing:/opt/paperless# ./src/manage.py document_importer /opt/paperless/export_20180922
Traceback (most recent call last):
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
psycopg2.OperationalError: index row size 5400 exceeds maximum 2712 for index "documents_document_content_aa150741"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./src/manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/management/__init__.py", line 371, in execute_from_command_line
    utility.execute()
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/management/__init__.py", line 365, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/management/base.py", line 288, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/opt/paperless/src/documents/management/commands/document_importer.py", line 51, in handle
    call_command("loaddata", manifest_path)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/management/__init__.py", line 141, in call_command
    return command.execute(*args, **defaults)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/management/base.py", line 335, in execute
    output = self.handle(*args, **options)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/management/commands/loaddata.py", line 72, in handle
    self.loaddata(fixture_labels)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/management/commands/loaddata.py", line 113, in loaddata
    self.load_label(fixture_label)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/management/commands/loaddata.py", line 177, in load_label
    obj.save(using=self.using)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/core/serializers/base.py", line 205, in save
    models.Model.save_base(self.object, using=using, raw=True, **kwargs)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/models/base.py", line 759, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/models/base.py", line 842, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/models/base.py", line 880, in _do_insert
    using=using, raw=raw)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/models/query.py", line 1125, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1285, in execute_sql
    cursor.execute(sql, params)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/backends/utils.py", line 100, in execute
    return super().execute(sql, params)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/.ppvenv/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.OperationalError: Problem installing fixture '/opt/paperless/export_20180922/manifest.json': Could not load documents.Document(pk=10): index row size 5400 exceeds maximum 2712 for index "documents_document_content_aa150741"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

In this testing container, I then commented out the db user & pass in paperless.conf, did migrate and createsuperuser, and tried the same import with a fresh sqlite database. It worked fine with sqlite.

I'm happy to share the export manifest.json I used if it helps! I had just done the export today from my working sqlite instance.
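The HINT in that traceback says a B-tree index entry is capped at roughly a third of a page (2712 bytes here). As a rough way to see which exported documents might trip that limit before importing, a minimal sketch that scans the export's manifest.json, assuming it is a standard Django JSON fixture (a list of objects with "model", "pk" and "fields" keys, which is what loaddata consumes). It measures the uncompressed UTF-8 length of content; PostgreSQL may compress large values before indexing them, so treat hits as candidates, not certain failures. The function name is hypothetical.

```python
import json
from pathlib import Path

# Maximum B-tree index row size reported in the error above.
MAX_INDEX_ROW = 2712


def oversized_documents(manifest_path: str, limit: int = MAX_INDEX_ROW) -> list:
    """Return (pk, byte_length) for documents whose content may be too big to index.

    Uncompressed UTF-8 length is only a rough estimate: PostgreSQL can
    compress a value before it is stored in the index.
    """
    records = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    hits = []
    for record in records:
        if record.get("model") != "documents.document":
            continue
        content = record.get("fields", {}).get("content") or ""
        size = len(content.encode("utf-8"))
        if size > limit:
            hits.append((record["pk"], size))
    return hits
```

For the export above this would be called as oversized_documents("/opt/paperless/export_20180922/manifest.json"); pk=10, the document named in the error, would be expected among the hits.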

danielquinn commented 6 years ago

Ok, I've solved it! I took some time today to try this out on my own local Postgres instance and sure enough, I had the same error come up.

So, first things first: the error you get during the initial migrate shouldn't happen any more, as it was directly related to the way Django runs migrations in a transaction. I moved that last AlterModel block into the following migration and now everything runs just fine.

As for that second traceback, it's a little tougher. The problem is that the table is created with db_index=True on the content column. That works just fine for Sqlite, but Postgres refuses to put values that large into a regular index and suggests an MD5 hash index or full-text indexing instead. Honestly, I don't know how to do that in a way that works with both Sqlite & Postgres, so here's what you have to do (until we come up with a more elegant solution):

  1. Create your database
  2. Run manage.py migrate to generate the tables
  3. Run manage.py dbshell to get a Postgres shell and run these two commands:
    DROP INDEX documents_document_content_aa150741_like;
    DROP INDEX documents_document_content_aa150741;
  4. Run your import as usual

This will drop the indexes, making searches slower, but so long as you're not dealing with a crazy-big database you probably won't even notice. Also note that I'm not 100% confident that the index names will be the same on every system. If they aren't, you'll have to do \d documents_document and look for two indexes matching the pattern documents_document_content_* and just nuke those.
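Since the index names may differ between systems, a minimal sketch that builds the DROP INDEX statements from whatever names are actually present. It assumes you have already listed the table's indexes, for example with SELECT indexname FROM pg_indexes WHERE tablename = 'documents_document'; inside manage.py dbshell. The helper names are hypothetical; review the generated statements before running them.

```python
from fnmatch import fnmatchcase

# Indexes on the content column follow this naming pattern.
CONTENT_INDEX_PATTERN = "documents_document_content_*"


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def drop_content_index_sql(index_names, pattern=CONTENT_INDEX_PATTERN):
    """Return DROP INDEX statements for the indexes whose names match pattern."""
    return [
        f"DROP INDEX IF EXISTS {quote_ident(name)};"
        for name in sorted(index_names)
        if fnmatchcase(name, pattern)
    ]
```

Given the names from this thread, only the two content indexes are selected; the checksum index and the primary key are left alone.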

wiwie commented 5 years ago

Having the same issue. The workaround works for now.

danielquinn commented 5 years ago

This may be related to #477.