kanishka-linux / reminiscence

Self-Hosted Bookmark And Archive Manager
GNU Affero General Public License v3.0
1.74k stars 85 forks

Issues using MySQL backend #53

Open Leaguesman opened 3 years ago

Leaguesman commented 3 years ago

Hi,

I'm attempting some tinkering with Reminiscence, specifically switching the database backend to MySQL in settings.py, as I primarily use a MariaDB instance for a number of applications hosted on the same machine. However, there are some issues around character sets.

With MySQL's default latin1 character set, the migrations run and everything is mostly functional. However, saving a URL occasionally fails with a Unicode character error like this:

django.db.utils.OperationalError: (1366, "Incorrect string value: '\\xE2\\x80\\x90 A ...' for column `reminiscence`.`pages_library`.`summary` at row 1")
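
The bytes quoted in that error are the UTF-8 encoding of a single character. A minimal Python sketch, independent of Reminiscence itself, shows why a latin1 column cannot hold it. Treating MySQL's latin1 as roughly Python's `cp1252` codec is an assumption made for illustration, and `fits` is a hypothetical helper:

```python
import unicodedata

# Bytes quoted in the 1366 error message.
raw = b"\xe2\x80\x90"
char = raw.decode("utf-8")

# Show which code point the bytes represent.
print(f"U+{ord(char):04X}", unicodedata.name(char))


def fits(text, codec):
    """Return True if text can be encoded with the given Python codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Neither single-byte codec has a slot for this character.
print("cp1252:", fits(char, "cp1252"))
print("latin-1:", fits(char, "latin-1"))
```

Any page whose summary contains such a character would trigger the error, which would fit with it only happening on occasion.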

Dropping that database and trying again with the utf8mb4 character set (MySQL's implementation of standard UTF-8) is worse: the migrations fail with a row size too large error (1118). Looking through them, I can see that some of your VARCHAR fields are quite large (4096 or more), but after some tinkering I can't quite pinpoint the problem.
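
A rough worked example of how wide VARCHAR columns can run into error 1118 once the character set changes. MySQL caps the combined size of a row's columns at 65,535 bytes and reserves up to 4 bytes per character for utf8mb4. The column list below is hypothetical, not the real Reminiscence schema, and InnoDB applies further page-size limits that this sketch ignores:

```python
# Documented MySQL limit on the combined size of a row's columns
# (BLOB/TEXT columns contribute only a few bytes each).
MAX_ROW_BYTES = 65535

# Maximum bytes reserved per character in each character set.
BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def varchar_bytes(length, charset):
    """Worst-case storage for VARCHAR(length): data plus length prefix."""
    data = length * BYTES_PER_CHAR[charset]
    prefix = 1 if data <= 255 else 2
    return data + prefix


def row_bytes(lengths, charset):
    """Worst-case row size for a table made only of VARCHAR columns."""
    return sum(varchar_bytes(n, charset) for n in lengths)


# Hypothetical table with five VARCHAR columns (assumed sizes).
columns = [4096, 4096, 4096, 4096, 2048]

for charset in BYTES_PER_CHAR:
    total = row_bytes(columns, charset)
    verdict = "fits" if total <= MAX_ROW_BYTES else "too large"
    print(charset, total, verdict)
```

Under these assumed sizes the same table fits in latin1 but not in utf8mb4, which would match the migrations working with one character set and failing with the other.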

Do you have any insight on the issue?

Leaguesman commented 3 years ago

Well in case anyone is interested in doing something similar, I've managed to massage down the schema from my existing Reminiscence database to be MySQL compliant in the utf8mb4 character set. This was done by reducing some of the larger field sizes that (in most use cases) wouldn't need to be that large.

This means the application is now functional, and the Unicode errors I was seeing earlier are no longer occurring. I haven't changed the actual migration though, so this still wouldn't work for a fresh install.

Leaguesman commented 3 years ago

Hmm, I'm actually still getting character errors, though far fewer of them. It now seems to happen only for characters with longer (4-byte) encodings, like so:

django.db.utils.OperationalError: (1366, "Incorrect string value: '\\xF0\\x9F\\xA4\\x94\\xF0\\x9F...' for column `reminiscence`.`pages_library`.`summary` at row 1")
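
The first four bytes in that message decode to one code point outside the Basic Multilingual Plane. MySQL's older 3-byte utf8 character set cannot store such characters, while utf8mb4 can. A small sketch for spotting them in a string (`needs_utf8mb4` is a hypothetical helper name, not part of Django or Reminiscence):

```python
# First character quoted in the error message.
raw = b"\xf0\x9f\xa4\x94"
char = raw.decode("utf-8")
print(f"U+{ord(char):04X}", len(raw), "bytes in UTF-8")


def needs_utf8mb4(text):
    """Return the characters in text that take 4 bytes in UTF-8.

    These are exactly the code points above U+FFFF.
    """
    return [c for c in text if ord(c) > 0xFFFF]


# Accented letters and U+2010 fit in 3 bytes; the emoji does not.
sample = "caf\u00e9 \u2010 \U0001F914"
print(needs_utf8mb4(sample))
```

If the error only appears for strings where this list is non-empty, the column or the connection is probably still using a 3-byte character set.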

I've tried setting the connection and collation settings in my Django database settings like so:

'OPTIONS': {
    'charset': 'utf8mb4',
    'use_unicode': True,
    'init_command': 'SET '
                    'storage_engine=INNODB,'
                    'character_set_connection=utf8mb4,'
                    'collation_connection=utf8_unicode_ci'
},

This hasn't resolved the issues though. Perhaps I'm looking in the wrong place, and the issue is in the summary extraction...
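
One detail worth checking in those settings: utf8_unicode_ci is a collation of MySQL's 3-byte utf8 character set rather than of utf8mb4, so it may be working against the charset option. Below is a hedged sketch of the same options with a matching collation. utf8mb4_unicode_ci is a standard MySQL collation, but whether this change alone fixes the error here is untested. The storage_engine assignment is left out only to keep the example short, and `collation_matches_charset` is a hypothetical helper:

```python
# Sketch only: charset and collation both point at utf8mb4.
MYSQL_OPTIONS = {
    'charset': 'utf8mb4',
    'use_unicode': True,
    'init_command': "SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'",
}


def collation_matches_charset(charset, collation):
    """Check the usual MySQL naming rule: collations start with '<charset>_'."""
    return collation.startswith(charset + '_')


print(collation_matches_charset('utf8mb4', 'utf8_unicode_ci'))
print(collation_matches_charset('utf8mb4', 'utf8mb4_unicode_ci'))
```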

kanishka-linux commented 3 years ago

I don't have enough experience with MySQL, but something like

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'my_db',
        'USER': 'my_user',
        'PASSWORD': 'my_pass',
        'HOST': 'my.host',
        'OPTIONS': {
            'charset': 'utf8mb4'
        }
    }
}

in settings.py should work. The charset of the entire database, table, or column may also need to be set to utf8mb4.
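
For the database and table side, a rough sketch that only builds the SQL statements so they can be reviewed before being run by hand. The database and table names are taken from the error messages earlier in the thread; the helper name and the default collation are assumptions:

```python
def utf8mb4_statements(database, tables, collation="utf8mb4_unicode_ci"):
    """Build SQL that switches a database and its tables to utf8mb4."""
    stmts = [
        f"ALTER DATABASE `{database}` CHARACTER SET utf8mb4 COLLATE {collation};"
    ]
    for table in tables:
        # CONVERT TO changes the table default and its existing text columns.
        stmts.append(
            f"ALTER TABLE `{database}`.`{table}` "
            f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return stmts


# Names as they appear in the 1366 error messages.
for stmt in utf8mb4_statements("reminiscence", ["pages_library"]):
    print(stmt)
```

Converting a table with very wide VARCHAR columns may itself fail with the row size error, so it is worth taking a backup first.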

thprice commented 3 years ago

It also depends heavily on the exact version of MariaDB/MySQL you're using. Try upgrading to MariaDB 10.6.