webcompat / webcompat-metrics-server

Server in charge of delivering different data to the webcompat-metrics-client
Mozilla Public License 2.0
6 stars 5 forks source link

[db] Create migration scripts for DB #69

Open karlcow opened 6 years ago

karlcow commented 6 years ago

Soon or later, migrating the DB will be required if we do not want to loose any data.

Alembic is a module using SQLAlchemy that we already use to manage migration. https://alembic.zzzcomputing.com/en/latest/tutorial.html

karlcow commented 6 years ago

Important things to remember.

fyi @laghee

karlcow commented 6 years ago

Good run down of things to do for migration including remote migration on Heroku

karlcow commented 6 years ago

Also Setting up flask app in heroku with a database

laghee commented 5 years ago

I've been looking at my weekly count data backup to see what the best migration plan might be. Since I have a ghbackup script running daily to update all new issues filed, I have all the web-bugs issues stored on my server. Another script runs to update the weekly count json response for the endpoint. I wonder if we really need to do a full-blown migration at this point?

It looks like I can pretty easily set up a postgres DB on pythonanywhere and run another script to load that up from the stored issue data. (My current data is a weekly count updated daily -- in a flagrantly terrible redundant way... I'm basically recounting all the issues every time 🙄.) Then I could dump that locally, run the migration, and push it to the heroku db.

But I wonder if it might be simpler to write a temp script that makes a request to the existing apis for the older data and directly adds to the database? Obviously, we'll need to set up Alembic anyway in order to manage any migrations going forward, but I'm not sure it's necessary for this first step.

@karlcow How are you currently storing the needsdiagnosis timeline on your api? Is it a json file that your script appends new data to, or do you have some kind of DB set up?

karlcow commented 5 years ago

@karlcow How are you currently storing the needsdiagnosis timeline on your api? Is it a json file that your script appends new data to, or do you have some kind of DB set up?

no DB a json file. The core part looks like this:

def main():
    """Core program."""
    # Extract data from GitHub
    url = urljoin(URL_REPO, NEEDSDIAGNOSIS)
    json_response = get_remote_file(url)
    # take only what we need
    open_issues = extract_open_issues(json_response)
    now = newtime(datetime.datetime.now().isoformat(timespec='seconds'))
    # create an individual record
    data = '{now} {open_issues}'.format(now=now, open_issues=open_issues)
    # save it in a file
    with open(FILEPATH, 'a') as f:
        f.write('{data}\n'.format(data=data))
    # Convert Data to JSON.
    converted = convert(txt_data_path)
    timeline = {'about': 'Hourly NeedsDiagnosis issues count', 'date_format': 'w3c'}
    timeline['timeline'] = converted
    # save the full file.
    with open(json_data_path, 'w') as f:
        f.write(json.dumps(timeline, sort_keys=True, indent=2))
   # backup dance
    copy2(json_timeline, json_timeline_bkp)
    copy2(json_data_path, json_timeline)
karlcow commented 5 years ago

which is not optimum at all.

But I wonder if it might be simpler to write a temp script that makes a request to the existing apis for the older data and directly adds to the database?

Probably.

karlcow commented 5 years ago

So to remember here. Every tutorial about flask + DB + Heroku says the same thing.

Run LOCALLY

Then commit the results to the local git repo.

Run on HEROKU

heroku run flask db upgrade

The reason is that there is no filesystem on heroku.

karlcow commented 5 years ago

This should be useful too. https://devcenter.heroku.com/articles/heroku-postgres-import-export

laghee commented 5 years ago

Heeeeeeelllllllllp. I decided that it was scandalous that I let this issue sit here so long, so I decided to jump in and tackle it, but it's driving me nuts.

What I thought was going to be a simple "add a manage.py and voila!" exercise has devolved into me watching my brains drip out of my ear. I suspect that something has gone wonky with Flask-Script because when I follow the instructions in the two main tutorials (https://realpython.com/flask-by-example-part-2-postgres-sqlalchemy-and-alembic/ -- 5 yrs. old and https://gist.github.com/mayukh18/2223bc8fc152631205abd7cbf1efdd41/ -- 2 yrs. old), like so:

# manage.py

"""Management of database migrations through Alembic/Flask-Migrate."""

from flask_script import Manager
from flask_migrate import Migrate, MigrateCommand

import ochazuke
from ochazuke import db

migrate = Migrate(ochazuke, db)
manager = Manager(ochazuke)

manager.add_command("db", MigrateCommand)

if __name__ == "__main__":
    manager.run()

...and then try to run the init command, I get:

Screenshot 2019-11-17 20 25 16

Hmmm. It occurs to me that maybe this is another app factory problem... Instead of ochazuke, I need to be using create_app... let's see ...

from ochazuke import create_app
from ochazuke import db

application = create_app()
migrate = Migrate(application, db)
manager = Manager(application)

manager.add_command("db", MigrateCommand)

if __name__ == "__main__":
    manager.run()

OK, slightly better (I guess?) error: Screenshot 2019-11-18 00 34 19

So since we're just running the one file with no app context, I need to give it a config_name. Not sure what's best to use. Could be, I should be pushing an app context here directly or something, but for now let's try "default" just to see if it works...

Screenshot 2019-11-18 00 37 34

Well, hot damn. 🎉

OK, this still needs some tweaks before pushing anything to Heroku. It should definitely have an app context that works in production, too -- "default" defaults to "development," which obviously isn't going to work live.

From what I understand, I can either add the app context into manage.py or use the newer method with the Flask Click cli rather than Flask-Script as shown here (and also here), and just add the migration object into ochazuke/__init__.py (might be simpler, actually).

Thoughts, @karlcow?

karlcow commented 5 years ago

@laghee understood. I put that on the todo list for wednesday. Thanks for the heads up.

laghee commented 5 years ago

@karlcow Sleeping on it helped, and I was able to go back and successfully redo a migration init using the Flask Click cli setup, which seems cleaner and won't leave us relying on an unmaintained tool into the future.

I'm pushing the changes to __init.py__, and once I have the best way to access the current config (instead of using a dummy "development" config), I can rerun the migrations locally and then on Heroku.

karlcow commented 5 years ago

That's super good news. :)