Open GoogleCodeExporter opened 9 years ago
This sounds like it will be a giant pain to fix. Although (as of last night)
all database servers are MySQL 5.5.3+, it looks like we will have to alter
every last CHAR, VARCHAR, and TEXT column in the database, as well as change
the sizes to compensate? In addition, we need to ensure MySQL always connects
to its database server using the right connection charset (WTF, MySQL? This
shouldn't be that hard!)
So, at the very minimum, we need:
1. A migration to move the database, every table, and every [var]char|text
column to a mb4 encoding and collation. (what's the difference between the
utf8mb4_unicode_ci, utf8mb4_general_ci utf8mb4_bin collations?
https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html is somewhat
less than clear)
2. Updates to the model files (possibly? no collation is listed in them?) at
http://code.google.com/p/grinnellplans/source/browse/trunk/db/models/generated
to maybe fix size problems?
3. A new version of the DB schema at
http://code.google.com/p/grinnellplans/source/browse/trunk/documents/db-schema
, to aid anyone keeping track at home.
4. Something in the database connection init to enable mb4.
Am I missing something?
Original comment by a...@alexcohn.com
on 7 Jan 2013 at 8:55
Update on my #4: Database connections are created in at least four different
places in the code:
1. When Doctrine is initialized, in bootstrap.php line 13
2. in inc/Database.php, which itself is only used in inc/User.php,
anonymous.php (secrets), and kommand/style-stats.php. Let's refactor these uses
to Doctrine and delete inc/Database entirely.
3. in dbfunctions.php's db_connect() function, which is used all over the
place.
4. in functions-database.php's db_connect() function. As far as I can tell,
functions-database.php is included absolutely nowhere. I deleted it on my test
server, and nothing broke. I am pretty sure this is orphaned code.
Original comment by a...@alexcohn.com
on 7 Jan 2013 at 9:39
Really nice find, Tom.
Alex, you don't necessarily need to alter every string column in the DB - only
columns that accept user input. And I imagine really it would be good enough to
fix only the `plans.edit_text` and `plans.plan`, and maybe `accounts.pseudo`
for good measure.
If the upgrade procedure proves too hideous, seems like we could also come up
with something like
http://stackoverflow.com/questions/3220031/how-to-filter-or-replace-unicode-char
acters-that-would-take-more-than-3-bytes?lq=1 to filter these characters before
they hit the DB - easy enough to do through Doctrine. Not quite as nice as
fully supporting these characters, but it would prevent data loss, which is the
most important part.
Original comment by ian.gree...@gmail.com
on 7 Jan 2013 at 9:51
This issue was updated by revision r621.
Original comment by a...@alexcohn.com
on 16 Jan 2013 at 2:13
This appears to be fixed on the production server (which is awesome), but it
still doesn't work for me on my local copy. Are the changes in r621 all you
did, Alex, or did you also run some conversion queries on the production
database?
Original comment by umhecbaa
on 13 Sep 2013 at 5:39
edit_text and plan (along with a couple other columns) are now utf8mb4, not
utf8, on prod. I'll post a sanitized database dump this PM.
Original comment by a...@alexcohn.com
on 13 Sep 2013 at 5:45
Here's a tables-only database dump (don't restore from this - it's missing
things like the data for interfaces and system).
(Also, all tables are InnoDB. We're not using transactions or any of the other
InnoDB cool stuff, but it's purportedly more crashsafe, which on EC2 is a good
thing)
Original comment by a...@alexcohn.com
on 13 Sep 2013 at 5:55
Attachments:
Thanks for the database dump, Alex. By diffing against a structure-only dump
from my development database I managed to figure out how to convert it myself.
Attached are the queries I made. The first script moves to InnoDB, the second
converts the 3 most important columns to utf8mb4.
I suppose these could be rewritten as doctrine migrations but since the queries
(or something similar) have already been run manually on production I did not
want to create confusion.
4-byte unicode characters now work in plans, but not yet for plan names.
Production plans seems to store the character ok (when I return to the 'change
name' form, I can retrieve the text I entered) but it displays as mojibake in
read.php (check my current plan name for an example).
I was hoping to work on this tonight now that my schema matches production, but
my local server is failing in a different mode. It seems to store the character
as '????' in the database. Did I miss something while converting?
Original comment by umhecbaa
on 23 Sep 2013 at 3:20
Attachments:
Original issue reported on code.google.com by
umhecbaa
on 7 Jan 2013 at 4:31