openedx / edx-platform

The Open edX LMS & Studio, powering education sites around the world!
https://openedx.org
GNU Affero General Public License v3.0
7.35k stars 3.85k forks source link

Trying deploying on Windows Azure and use SQL Azure instead of MySQL #3099

Closed louyihua closed 10 years ago

louyihua commented 10 years ago

Our team in Beihang University is trying to deploy edx platform on Windows Azure. As we have already successfully deployed the production stack on Windows Azure, it is only a single all-in-one virtual machine that does not use any other Azure's services. Now we are trying to use SQL Azure instead of MySQL. We first install the django-pyodbc-azure, freeTDS and unixODBC, then we try to use it to connect to SQL Azure. Then we find two issues with the pyodbc package:

  1. DSN must be used for successful connection. If we use the method listed on the package's website by direct assigning connection parameters in auth.json, the connection will fail because unixODBC will report no driver found. [This is not true and it only needs a right configuration, like that:]
"ENGINE": "sql_server.pyodbc",
"HOST": "servername.database.windows.net",
"NAME": "edxapp",
"PASSWORD": "password_here",
"PORT": "1433",
"USER": "azureuser@servername",
"OPTIONS": {
    "driver": "FreeTDS",
    "host_is_server": true,
    "extra_params": "TDS_Version=7.1"
}
  1. The pyodbc.py uses SQL Server's compatible views in the _find_indexes_for_column function and compatible names in the drop_column_default_sql function, which are not supported by SQL Azure. So we chage the SQL statement to the supported ones, as the following:
SELECT si.name, si.object_id, sik.column_id, sc.name
FROM sys.indexes SI WITH (NOLOCK)
INNER JOIN sys.index_columns SIK WITH (NOLOCK)
    ON  SIK.object_id = Si.object_id
    AND SIK.index_id = SI.index_id
INNER JOIN sys.columns SC WITH (NOLOCK)
    ON  SI.object_id = SC.object_id
    AND SIK.column_id = SC.column_id
WHERE SI.index_id !=0
    AND Si.object_id = OBJECT_ID('%s')
    AND SC.name = '%s'

and

SELECT object_name(default_object_id)
FROM sys.columns
WHERE object_id = object_id('%s')
AND name = '%s'

After fix the bugs here, we are trying to syncdb & migrate databases, and we encounter the following problems in the lms's migration process: (1). SQL Azure does not support the MySQL's syntax "CREATE TABLE ... AS SELECT ..." which is hard-coded in 0030_auto__chg_field_anonymoususerid_anonymous_user_id.py. As it does not support SQL Server's syntax 'SELECT INTO' either, we now use seperate hard-coded "CREATE TABLE" and "INSERT INTO" statements to solve this problem temporarily, as the following (the function LENGTH() is replaced by its SQL Server's equivalence LEN()):

db.execute("""
    CREATE TABLE student_anonymoususerid_temp_archive(
        id int NOT NULL PRIMARY KEY,
        user_id int NOT NULL,
        anonymous_user_id varchar(32) NOT NULL,
        course_id varchar(255) NOT NULL)
""")
db.execute("""
    INSERT INTO student_anonymoususerid_temp_archive(id, user_id, anonymous_user_id, course_id)
    SELECT id, user_id, anonymous_user_id, course_id FROM student_anonymoususerid WHERE LEN(anonymous_user_id) = 16
""")
db.execute("""
    DELETE FROM student_anonymoususerid
    WHERE LEN(anonymous_user_id) = 16
""")

[Submitted, pull request #3120 ]

(2). The migration process seems to put the django_notify step after the wiki step, but the wiki step actually requires tables created in the django_notify step. So we now can only first maunally run the django_notify step, then go to the wiki step. [Submitted, pull request 5 in edx/django-wiki]

(3). We then stucked at the step wallfe. The wallfe step requires a "alter column" clause for column "created" on tables "waffle_sample", "waffle_flag" and "waffle_switch". But SQL Azure reports that as there are indexes dependent on these columns, the column cannot be altered. Even more strange, when we put the same SQL statement into SQL Azure's management portal, it executes without any error! And our solution to this issue is first manually remove the three indexes when miragtion first fails, and re-added them after the migration process. [Submitted, pull request 1 in edx/django-waffle]

(4). Except the above, all the other steps seems to be fine with SQL Azure. The studio's migration process works fine too.

So, as it is more than just contributing code to make edx compatible with SQL Azure, how can we make our contribution useful to edx's users? @singingwolfboy How do you think?

cpennington commented 10 years ago

Hi. We added this particular migration to handle an issue where we had data truncation (because the anonymous_user_id column was too short). Those hand-coded queries were put it to preserve the data that was already in the tables. At this point, I think we can simply remove them from that migration.

jarv commented 10 years ago

@louyihua I'm not very familiar with Windows Azure, are you using their linux VM for running the app and if so are you using our configuration scripts in the configuration repo?

louyihua commented 10 years ago

@jarv Yes. At first I just move the production stack's VM image into Windows Azure, and it runs fine. So I wonder if the production stack can use the Windows Azure's SQL Database rather than MySQL in the VM. Then I use the configuration scripts in the configuration repo to perform a manual db sync and migration, and find the issues as above.

jarv commented 10 years ago

@louyihua I see, I was going to point out that there is a var you can unset if you want the plays to skip the migration step. It seems like you already figured that part out?

symbolist commented 10 years ago

If you do not want to fix all the old migrations, one possible way is:

  1. Remove south app from settings.
  2. Do syncdb. This will create the latest structure of the tables.
  3. Put back south app in settings.
  4. Run fake migrations for all the apps.

But to continue to run edx-platform on Azure SQL will require maintaining parallel versions of queries for all future migrations. Raw sql queries are also used in some places in the codebase so they will likely not work too.

louyihua commented 10 years ago

@jarv Yes, in fact I've already successfully maked a developer stack using SQL Azure and I have found solutions for all the issues I encountered in the migration step. I just think that if some of the issues which are caused by edx's code can be fixed so that in the future others can do minimal manual fix to make edx using SQL Azure.

singingwolfboy commented 10 years ago

@louyihua based on @cpennington's comment, it sounds like you may want to make a branch that removes that migration from the repository, correctly orders django_notify before wiki, and make any other changes you need to do to get the code to work on Azure. If you can get it to work, we'd love a pull request!

singingwolfboy commented 10 years ago

edX isn't prepared to support our code running on Microsoft Azure right now, so I'm closing this issue. You're free to maintain an Azure-compatible fork of the codebase if you want, though!