langcog / childes-db

A SQL interface for the CHILDES child language corpora
13 stars 5 forks source link

pushing childes-db from dev environment to EC2 -- schema issues & workaround #44

Open smeylan opened 5 years ago

smeylan commented 5 years ago

Since upgrading childes-db Django version / Python 3 updates, we get errors when trying to do a direct mysqldump from the data processing machine (Chompsky) to the remote (production/ EC2) server, e.g.,

mysqldump -u <local_mysql_user> -p<pass> --databases childes-db-dev | ssh -I <pem> <ec2_user>@<ec2_hostname> 'cat - | mysql -u <remote_mysql_user> -p<remote_mysql_password>' childes-db-dev

yields

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(6) NOT NULL, PRIMARY KEY (id)

After some digging, this is because of 1) a mismatch in MySQL database versions -- Chompsky uses 5.7 and the EC2 instance runs 5.5 and 2) that newer versions of Django track migrations at higher temporal resolutions than supported in 5.5, such that the version mismatch, while not a problem in creating the 2018 version, is now a problem in 2019 after upgrading Django. Specifically, Django now tries to use fractional seconds: https://stackoverflow.com/questions/52387080/datetime3-type-is-giving-syntax-error-in-mysql-5-5

The correct longer term solution is to upgrade the EC2 to MySQL 5.7. But this is nontrivial because we can't upgrade MySQL easily from 5.5 to 5.7 under RHEL because of systemd changes (i.e. churn in how Linux manages daemons). Starting with a new image --- and replicating our read-only permissions system and MySQL configuration tuning --- is probably easier, but I am holding off on that for now.

The easy way to get the data up is to dump the sql statements that create the tables and do the table population separately, following https://serverfault.com/questions/971730/mysqldump-5-7-to-5-5-compatibility. While there is a --compatibility flag for mysqldump, it is focused on the problem of migrating between versions 3, 4, and 5 of MySQL, and does not address this specific problem within different 5.x versions (lol@failed semantic versioning?)

So the procedure is instead:

1) On the data processing machine, output the table creation commands (--no-data means issue table creation commands, but don't send the table contents) mysqldump -u<local_mysql_user> -p<local_mysql_password> --databases childes-db-dev --no-data > ~/childes_db_schema.sql

2) In a text editor, open ~/childes_db_schema.sql and change datetime(6) to datetime

3) Copy the fixed table creation commands to the remote server scp -I <pem> ~/childes_db_schema.sql <ec2 user>@<ec2 hostname>:~/childes_db_schema.sql

3) Log in to the remote machine and create the tables mysql -u<remote user> -p<password> < ~/childes_db_schema.sql

4) Push over all of the data (--no-create-info means ignore table creation) mysqldump -u <local_mysql_user> -p<pass> --databases childes-db-dev | ssh -I <pem> <ec2_user>@<ec2_hostname> 'cat - | mysql -u <remote_mysql_user> -p<remote_mysql_password>' childes-db-dev

smeylan commented 5 years ago

Or, to make it a one liner

mysqldump -u <local_mysql_user> -p<pass> --databases childes-db-dev | sed -e "s/datetime(6)/datetime/" | ssh -I <pem> <ec2_user>@<ec2_hostname> ‘cat - | mysql -u <remote_mysql_user> -p<remote_mysql_password>' childes-db-dev